Hacking trampolining CPS


I spent some quality time today trying to hack together a continuation passing style system in Ruby, to clarify some of my thinking. I ended up with something that is more or less a very small interpreter for S-expressions, using a trampolining CPS evaluator. The language is not in any way complete: there is no assignment, there is only one global scope, and so on. So the continuations in this system are not really useful for anything except hacking on them to gain understanding.

As such, I thought people might find it a bit interesting. I wish I’d seen something like this 5 or 10 years ago… Note that this code is extremely hacky and incomplete and bad and whatnot. Be warned. =)

OK, first you need to “gem install sexp”. This provides dead easy parsing of S-expressions. Since parsing wasn’t the main purpose of this code, using a gem was easier.

The first part of the code we need is the requires, and structures to represent continuations:

require 'rubygems'
require 'sexpressions'

class Cont
  def initialize(k)
    @k = k
  end
end

class BottomCont < Cont
  def initialize(k, &block)
    super(k)
    @f = block
  end

  def resume(v)
    @f.call(v)
  end
end

class IfCont < Cont
  def initialize(k, et, ef, r)
    super(k)
    @et, @ef, @r = et, ef, r
  end

  def resume(v)
    evaluate((v ? @et : @ef), @r, @k)
  end
end

class CallCont < Cont
  def initialize(k, r)
    super(k)
    @r = r
  end

  def resume(v)
    evaluate(v, @r, @k)
  end
end

class ContCont < Cont
  def initialize(k, v, r)
    super(k)
    @r, @v = r, v
  end

  def resume(v)
    evaluate(@v, @r, v)
  end
end

class NextCont < Cont
  def initialize(k, ne, r)
    super(k)
    @ne, @r = ne, r
  end

  def resume(v)
    evaluate(@ne, @r, @k)
  end
end

BottomCont is what we use to do something at the end of the program. We could print something, or anything else. IfCont is used to implement a conditional. It’s quite easy: once we resume, we check the truth value and evaluate the next part based on the result. CallCont will invoke some existing S-expression stored in a variable. It just takes the value and evaluates that. ContCont is a bit trickier. It takes a value, and when asked to resume it assumes that the parameter to resume is a continuation, and invokes that continuation with the value it got earlier. Finally, NextCont is used to implement basic sequencing. It basically just throws away the earlier value and uses the next one instead.

The actual code for evaluate and a helper function looks like this:

def evaluate_sexp(sexp)
  # The bottom continuation: when resumed, it returns the final value out of
  # evaluate_sexp. The block is only ever called from the loop below, so the
  # non-local return is still legal at that point.
  cont = BottomCont.new(nil) do |val|
    return val
  end

  env = {
    :haha => proc{|x| puts "calling proc"; 43 },
    :print => proc{|x| puts "printing" },
    :save_cont => proc{|x| puts "saving cont"; env[:saved] = x; true },
    :foo => 42,
    :bar => 33,
    :flux => "(call flux)".parse_sexp.first
  }

  c = evaluate(sexp, env, cont)

  # The trampoline: evaluate returns a thunk, and calling a thunk returns the
  # next thunk. Keep bouncing until the bottom continuation returns for us.
  while true
    c = c.call
  end
end

def evaluate(e, r, k)
  if e.is_a?(Array)
    case e.first
    when :if
      evaluate(e[1], r, IfCont.new(k,e[2],e[3],r))
    when :call
      evaluate(e[1], r, CallCont.new(k, r))
    when :continue
      p [:calling, :continue, e[1]]
      evaluate(e[1], r, ContCont.new(k, e[2], r))
    when :prog2
      evaluate(e[1], r, NextCont.new(k, e[2], r))
    end
  else
    case e
    when :true
      proc { k.resume(true) }
    when :nil
      proc { k.resume(nil) }
    when Symbol
      proc {
        if r[e].is_a?(Proc)
          k.resume(r[e].call(k))
        else
          k.resume(r[e])
        end
      }
    else
      proc { k.resume(e) }
    end
  end
end

Here evaluate_sexp is the entry point to the code. We first create a BottomCont that will just return the value. We then create an environment that includes simple values, an expression (flux) that calls itself when invoked, and some procs that do different things. Finally evaluate is called, and then we repeatedly call the thunks it returns. Since we know that the bottom continuation will eventually return out of the method, we can keep calling thunks indefinitely. That is the actual trampolining part, right there.

The evaluate function checks whether it got an array, and in that case it switches on the first entry, creating an IfCont, CallCont, ContCont or NextCont accordingly. If it’s a primitive value we do something different. As you can see, we first check if the value is one of a few special ones, and then, if it’s a symbol, we look it up in the environment. If the value from the environment is a proc we invoke it with the current continuation, which means the proc can do funky stuff with it. The common thing for all of these branches is that they wrap everything they do in a thunk, and inside that thunk call resume on the continuation with the value produced.
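For reference, the pattern matching in evaluate assumes that the sexp gem turns identifiers into Ruby symbols, numbers into Ruby numbers, and parenthesized forms into nested arrays. Roughly like this (illustrative; I haven’t checked the exact output format of the gem):

"(if quux 13 (if true 444 555))".parse_sexp.first
# => something like [:if, :quux, 13, [:if, :true, 444, 555]]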

Finally we can try it out a bit:

p evaluate_sexp("123".parse_sexp.first) # 123
p evaluate_sexp("bar".parse_sexp.first) # 33
p evaluate_sexp("nil".parse_sexp.first) # nil

p evaluate_sexp("(if quux 13 (if true (if nil 444 555)))".parse_sexp.first) # 555
p evaluate_sexp("(if quux 13 (if true (if nil 444 haha)))".parse_sexp.first)

Here you can see that simple things work as expected.

What about calling the flux function, which invokes itself?

p evaluate_sexp("(call flux)".parse_sexp.first)

This will actually loop endlessly. When we add trampolining to a CPS interpreter, we get a stackless interpreter, and with it we get tail calls for free.
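To see the trampolining part in isolation, here is a minimal plain-Ruby sketch, separate from the interpreter above: two mutually recursive methods that would normally blow the stack instead return thunks, and a small driver keeps calling thunks until it gets a real value.

# Minimal trampolining sketch (not part of the interpreter above).
# Instead of calling each other directly, even_p and odd_p return a thunk
# describing the next step. The driver loop keeps calling thunks until it
# gets something that isn't a Proc, so the call stack never grows.
def even_p(n)
  n.zero? ? true : proc { odd_p(n - 1) }
end

def odd_p(n)
  n.zero? ? false : proc { even_p(n - 1) }
end

def trampoline(result)
  result = result.call while result.is_a?(Proc)
  result
end

p trampoline(even_p(100_000))  # => true, without a SystemStackError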

Finally, what about the actual continuation stuff? Another way of creating an eternal loop is to do something like this:

p evaluate_sexp("(prog2 save_cont (prog2 print (continue saved 33333)))".parse_sexp.first)

This interesting piece of code will actually loop forever. How? Well, first the prog2 will run the proc in save_cont. This saves the current continuation, and then returns true from the proc. Then the next prog2 is entered, running the print proc. Finally, the last part evaluates the continue form, which takes the continuation in saved and invokes it with the value 33333. This in effect jumps back to the first prog2, returns 33333 from the call to save_cont and goes into the next prog2 again. Looping…

If you instead use an if statement, return nil from the inner call to the continuation, and add some printing to IfCont#resume, you can see that that point is only reached twice:

p evaluate_sexp("(if save_cont (prog2 print (continue saved nil)) 321)".parse_sexp.first)

This will generate:

[:running, :if, :statement]
printing
[:calling, :continue, :saved]
[:running, :if, :statement]
321

Here it’s obvious that the if statement runs twice, and that the second time around the condition evaluates to false, which makes the final continuation return 321.

I hope this little excursion into CPS land was interesting for someone. It’s a quite useful technique to know about, once you wrap your head around it.



The Maintenance myth


Update: I’ve used the words “static” and “dynamic” a bit loosely with regard to languages and typing in this post. If this is something that upsets you, feel free to read “static” as “Java-like” and “dynamic” as “Ruby-like” in this post. And yes, I know that this is not entirely correct, but just as mangling the language to remove all gender bias makes it highly inconvenient to write, I find it easier to write this way when the post is aimed at people in these camps.

Being a language geek, I tend to get into lots of discussions about the differences between languages, what’s good and what’s bad. And being a Ruby guy that hangs out in Java crowds, I end up having the static-vs-dynamic conversation way too often. And it’s interesting, the number one question everyone from the static “camp” has, the one thing that worries them the most is maintenance.

The question is basically – not having types at compile time, won’t it be really hard to maintain your system when it grows to a few million lines of code? Don’t you need the static type hierarchy to organize your project? Don’t you need an IDE that can use the static information to give you intellisense? All of these questions, and many more, boil down to the same basic idea: that dynamic languages aren’t as maintainable as static ones.

And what’s even more curious, in these kinds of discussions I find that people in the dynamic camp generally agree that yes, maintenance can be a problem. I’ve found myself doing the same thing, because it’s such a well established fact that maintenance suffers in a dynamic system. Or wait… Is it that well established?

I’ve asked some people about this lately, and most of the answers invariably begin with “but obviously it’s harder to maintain a dynamic system”. Things that are “obvious” like that really worry me.

Now, Java systems can be hard to maintain. We know that. There is lots of documentation and talk about hard-to-maintain systems with millions of lines of code. But I really can’t come up with anything I’ve read from people using dynamic languages about what a maintenance nightmare their projects are. I know several people who are responsible for quite large code bases written in Ruby and Python (a very large code base in these languages is 50K-100K lines of code). And they are not talking about how they wish they had static typing. Not at all. Of course, this is totally anecdotal, and maybe these guys are above your average developer. But in that case, shouldn’t we hear these rumblings from all those Java developers who switched to Ruby? I haven’t heard anyone say they wish they had static typing in Ruby. And not all of those who migrated could have been better than average.

So where does that leave us? With a big “I don’t know”. Thinking about this issue some more, I came up with two examples where I’ve heard about someone leaving a dynamic language because of issues like this. I’m not sure how closely tied they are to maintenance problems, but they were the only ones I came up with: Reddit and CDBaby. Reddit switched from Lisp to Python, and CDBaby switched from Ruby to PHP. Funny, they switched away from a dynamic language – but not to a static language. Instead they switched to another dynamic language, so the problem was probably not something static typing would have solved (at least not in the eyes of the teams responsible for these switches).

I’m not saying I know this is true, because I have no real, hard evidence one way or another, but to me the “obvious” claim that dynamic languages are harder to maintain smells a bit fishy. I’m going to work under the hypothesis that this claim is mostly myth. And if it’s not a myth, it’s still a red herring – it takes the focus away from more important concerns with regard to the difference between static and dynamic typing.

I did a quick round of shouted questions to some of my colleagues at ThoughtWorks whom I know and respect – and who were online on IM at the time. The general message was that it depends on the team. The people writing the code, and how they are writing it, are much more important than static or dynamic typing. If you assume that the team is good and the code is treated well from day 0, static or dynamic typing doesn’t make a difference for maintainability.

Rebecca Parsons, our CTO, said this:

I think right now the tooling is still better in static languages. I think the code is shorter generally speaking in dynamic languages which makes it easier to support.

I think maintenance is improved when the cognitive distance between the language and the app is reduced, which is often easier in dynamic languages.

In the end, I’m just worried that everyone seems to take the maintainability story as fact. Has there been any research done in this area? Smalltalk and Lisp have been around forever; there should be something out there about how good or bad maintenance of these systems has been. I can think of three possible reasons why I haven’t seen it:

  • It’s out there, but I haven’t looked in the right places.
  • There are maintenance problems in all of these languages, but people using dynamic languages aren’t as keen on whining as Java developers.
  • There are no real maintenance problems with dynamic languages.

There is a distinct possibility I’ll get lots of anecdotal evidence in the comments on this post. I would definitely prefer facts, if there are any to be had.



Language revolution


JAOO was interesting this year. A collection of very diverse subjects, and many focusing on programming languages – we had presentations about functional programming, JavaScript, Fortress and JRuby. Guy Steele and Richard Gabriel did their 50 in 50 presentation, which was amazing. I’ve also managed to get quite a lot of work done on Ioke. The result of all this is that my head has been swimming with thoughts about programming languages. I’ve also had the good fortune of spending time talking about languages with such people as Bill Venners, Lars Bak, Neal Ford, Martin Fowler, Guy Steele, Richard Gabriel, Dave Thomas, Erik Meijer, Jim des Rivieres, Josh Holmes and many others.

It is obvious that we live in interesting times for programming languages. But are they interesting enough? What are the current trends in cutting edge programming languages? I can see these:

  • Better implementation techniques. V8 is an example of this, and so is Hotspot. V8 employs new techniques to drive innovation further, while Hotspot’s engineers continuously add both old and new techniques to their tool box.
  • DSLs. The focus by some people on domain specific languages seems to be part of the larger focus on languages as an important tool.
  • Functional semantics. Erik Meijer’s keynote was the largest push in this direction, although many languages keep adding features that make it easier to work in a functional style. Clojure is one of the new languages that come from this point, and so is Scala. The focus on concurrency generally leads people to the conclusion that a more functional style is necessary. From the concurrency aspect we get the recent focus on Erlang. Fortress also seems to be mostly in this category.
  • Static typing. Scala and Haskell are probably the most representative of this approach, in trying to stretch static typing as far as possible to improve the programmer experience, the semantics and performance.

Is this really it? You can quibble about the specific categories and where the borders are. I’m not entirely satisfied with where I put Fortress, for example, but all in all it feels like this is what’s going on.

Seeing 50 in 50 reminded me of how many languages we have seen, and how different they all are. It feels like most of the innovation happened in the past. So why is the current state of programming languages so poor? Is it because other things overshadow the language itself? I really don’t believe that. I think a good enough language would enable better tools, more productivity and more successful projects. So why isn’t it happening? We seem to be stuck in a rut. Anders Hejlsberg said in his opening keynote that the last 10-15 years have been an anomaly. I really do hope so.

What is apparent from the list compiled above is that everything that currently happens is very much evolutionary in approach. Innovation is happening, but it’s mostly small innovation.

We need a language revolution. We need totally new ways of looking at programming languages. We need new innovation, unfettered by the failures and successes of times past. We need more language implementors. We need more people thinking about these things.

I don’t know what the new approaches need to be, but the way I see it the last 10 years have been quite disappointing. If programming languages really are important tools, why haven’t we seen the same kind of innovation in that field as we have in IDEs and tools? Why haven’t we seen totally new ideas crop up? Is it because language development is always evolutionary? Does it have to be? Or is everyone interested in the field already convinced that we are at the peak right now? Or that Lisp or Smalltalk was the peak?

What needs to be rethought? I’ve read Jonathan Edwards recently, and he writes a lot about revisiting basic ideas and conclusions. I don’t agree with everything he says, but in this matter he’s totally right. We need to revisit all assumptions. We need to figure out better ways of doing things. Programming languages are just too important. We shouldn’t be satisfied with the current approaches just because we don’t know anything better.

We need a revolution.



Clojure


I know I’ve mentioned Clojure now and again in this blog, but I haven’t actually talked that much about it. I feel it’s time to change that right now – Clojure is in the air and it’s looking really interesting. More and more people are talking about it, and after the great presentation Rich gave at the JVM language summit I feel that there might be some more converts in the world.

So what is it? Well, a new Lisp dialect for the JVM. It was originally targeting both the JVM and .NET, but Rich ended up not going through with that (a decision I can understand after seeing the effort Fan has to expend to continue providing this feature).

It’s specifically not an implementation of either Common Lisp or Scheme, but instead a totally new language with some interesting features. The most striking one is the way it embraces functional programming. In comparison to Common Lisp, which I would characterize as a multiparadigm language, Clojure has a heavy bent towards functional programming. This includes a focus on immutable data structures and support for good concurrency models. Rich has even got an implementation of STM in there, which is really cool.

So what do I think about it? First of all, it’s definitely a very interesting language. It has also taken the ideas of Lisp and twisted them a bit, adding some new ideas and refining some old ones. If I wanted to do concurrent programming for the JVM I would probably lean more towards Clojure than Scala, for example.

All that said, I am in two minds about the language. It is definitely extremely cool and it looks very useful. The libraries in particular have a lot going for them. But the other side of it for me comes from the point of Lisp purity. One of the things I really like about Lisps is that they are very simple. The syntax is extremely small, and in most cases everything is just lists or atoms and nothing else. Common Lisp can handle other syntax with reader macros – which still produce results that are only lists and atoms. This is extremely powerful. Clojure has this to a degree, but adds several basic composite data structures that are not lists, such as sets, vectors and maps. From a pragmatic standpoint I can understand that, but the fact that they are basic syntax instead of reader macros means that if I want to process Clojure code I will end up having to work with several kinds of composite data structures instead of just one.

This might seem like a small thing, and it’s definitely not something that would stop me from using the language. But the Lisp lover in me cringes a bit at this decision.

All in all Clojure is really cool and I recommend that people take a look at it. It’s getting lots of attention and people are writing about it. Stu Halloway is currently in the process of porting Practical Common Lisp to Clojure, and I recently saw a blog post about someone porting On Lisp to Clojure, so there is definitely interest in it. The question is how this will continue. As I’ve started saying more and more: these are interesting times for language geeks.



JVM Language Summit – last day


The final day of the language summit sported loads of interesting presentations, just like the first two days. I can’t overstress how well prepared these three days have been – especially with regard to the schedule. A huge thanks to Brian Goetz, John Rose and Charles Nutter for being the main instigators and coordinators of this effort. Very nice work indeed.

The third day also marked a departure from being mostly JVM centered. The first presentation was Mads Torgersen from Microsoft, talking about LINQ. I hadn’t actually realized what a thin layer over regular closure syntax the LINQ infrastructure is. Quite impressive, although it still feels like one of those ideas that really makes sense in some cases, but not in nearly as many as originally envisioned.

Per Bothner described his Kawa toolkit. If you didn’t know, Kawa is one of the older alternative language implementations for the JVM – but it’s not only an implementation of the Scheme language. It also contains partial implementations of Emacs Lisp, XQuery and Common Lisp. As it turns out, Kawa has ended up being more of a general toolkit for building dynamic languages on top of the JVM. Very nice, and I’m seriously considering stealing parts of it for a few of my language projects. It was also a quite astute presentation with no unnecessary frills. Per ended his presentation with some thoughts about language design – and why so many languages seem to be dynamic just for the sake of being dynamic. I totally agree with his sentiment, even though Ioke will be about as dynamic a language as I can imagine.

Erik Meijer talked about Fundamentalist Functional Programming. Erik is always a hilarious speaker, and even if you don’t agree with everything he says, it’s great entertainment with loads of great quotes and soundbites. Erik’s main point is that what is currently called functional programming is really nothing of the sort. The only way you can be really pure in a functional programming language is by specifying which parts of the implementation have side effects. If these are specified, the side effects can be isolated, and you can keep the rest of your system free from their taint. He showed several compelling examples of how side effects totally mess up calculations that look like they should have been correct and functional in approach. One thing he said that I really liked was when he talked about “types that lie”: situations in many common languages where you have static types, but the types don’t actually tell you the whole truth. In that case Erik feels it’s better to just be dynamic, since dynamic languages at least don’t have dishonest types. Of course, the thrust of the presentation was Haskell, and how Erik is trying to sneak the benefits of Haskell into Visual Basic. He’s got his own group for fooling around with things like this, and it does sound extremely interesting. Oh, and he’s looking for people for his team. In another life, if I’d been more convinced about static typing, I might have applied.

I ended up spending most of the lunch/open space time working on my Antlr-ELisp system. It’s actually coming along really well. Simple lexing works and I have most of the DFA handling implemented too. It has ended up being a really fun project, although the lack of lexical closures bites me over and over again. I’m just too used to them from Scheme and Common Lisp, and not having them makes my brain hurt sometimes.

Part of the open space time was also devoted to a quick introduction by Cliff Click to the Azul Systems VM. Every time I see the kind of analysis he can do with that VM I’m astonished. It’s just such cool stuff. Everyone should have an account there. Really.

After lunch, Rene Jansen from IBM talked about NetRexx – one of those JVM languages that aren’t as well known, but live a really successful life inside company boundaries. I really wonder how many of those languages there are.

Paul Phillips gave a presentation about Scalify – one of those funny, halfway crazy projects. It aims to translate Java code to Scala, to make adoption of Scala easier. At first glance this sounds quite easy, but there are several complications to the approach, and Paul introduced it all in a candid and interesting way.

After that it was Neal Gafter’s turn (who is nowadays at Microsoft. Will the world ever start making sense? =). From the title of the presentation I got the impression it would mostly be about closure proposals, but instead he talked a lot about the impedance mismatch between different languages on the JVM, how you can handle that, and what needs to be done to the core language to make it possible to interoperate better between languages. Closures are one of the things that Java really doesn’t have, and it’s very obvious from the design of the standard libraries. A very good and thoughtful presentation.

Cliff Click did one of his typical romps about the JVM and how well some of the alternative languages translate to bytecode. Extremely entertaining and full of small factoids about the JVM (like the usefulness, or lack thereof, of the Integer.valueOf cache in some cases).

After that, I did a short presentation about Jatha, and tried to mention some generalities about the languages that haven’t succeeded so far, and what kind of support they might want to have from the JVM in the future.

The last talk of the day was about the Parrot VM. In this case there was a bit too much introductory material about dynamic languages, and not enough meat about how the Parrot VM actually works. I would have loved to get much deeper content here than I actually did.

All in all, Friday was an extremely strong day with regard to presentations. Loads of technical content, but also some more high level musings that contained real insight about the current state of programming language implementations. I’m very happy with the whole JVM language summit, and it seems like it will happen again next year. If you have any interest in this area I recommend that you set aside the time to go. It will be worth your while.

And now I’m off to JAOO, which also seems to have lots of delicious content for a language geek like myself. We do live in very interesting times.



JVM Language Summit – Second day


I’m sitting here during the third day of the JVM language summit, and thought I’d summarize the second day a bit. Hopefully I’ll soon be able to write about this day too as soon as it’s over.

The second day started out with Gosling talking about some of his history and how that influenced the design and implementation of Java. Not extremely interesting, but a few funny soundbites. Best was probably the quote from Guy Steele: “Lisp is like a black hole”, meaning that if you design a language close enough to Lisp, Lisp ends up dragging it in and the language becomes Lisp.

After that, Tom Ball talked about JavaFX Script and javac. This was quite interesting; compiling something like JavaFX Script with the javac compiler does seem to be fraught with problems.

Charles Nutter gave a good talk about the internals and interesting parts of the JRuby implementation.

After that it was lunch and some open spaces talk about language interoperability. This is really a large problem, and it was obvious that we don’t really know how to do this well between different languages on the JVM. The one solution is to always go through Java types, but the problem there is that Java’s types are quite poor compared to those of some of the other languages.

Eugene Kuleshov gave an introduction to the ASM bytecode generation framework, which was very helpful – as it turned out at least half the people in the room were already using it.

Rob Nicholson gave an intro to IBM’s work on PHP for the JVM, which seems to have many of the same problems as we have faced with JRuby.

Attila talked about his MOP and the directions it will take in the future, and this looks really nice. I’m looking forward to having some time to play with it.

Rémi Forax talked about his backport of JSR-292. I’m totally in awe about this. I wouldn’t even know where to begin to implement it. Very cool.

Rich Hickey did a very inspired talk about Clojure. Definitely one of the best talks – it included loads of information, introduced the language in a good way and was generally very cool. I do have some opinions about the language itself, but I’ll save those for another blog post.

The last two talks of the day were about Python. The first one was about gradual typing for Python – this is interesting work, and long term it will be interesting to see how it turns out. The Jython internals talk by Frank was also very nice and gave at least me some new insight into how their implementation actually works.

All in all a very interesting day.



New projects


OK, I’ve hinted in several contexts that I have some projects going on that haven’t been announced. So today I’m going to mention two of them (so that they will have to be released at some point). (And no, ioke won’t be on this list).

Xample

This one I have real hopes for. It’s already functioning to a degree. So what does it do? Well, I call it example-driven DSLs. Instead of writing a parser or regexps for handling external DSLs, Xample will take some examples and then derive code to do something based on those examples. It’s one of those things that won’t do everything, but it might solve 80% of all the simple DSLs: those cases where the overhead of creating a DSL framework wouldn’t otherwise be worth it. Xample will help in those cases by making it really easy to create quite sophisticated DSLs. It’s in Ruby, and I have plans that include a generic workbench for handling these DSLs. Since you have examples of how they should look, you can use the information in all kinds of cool ways.

I have just made the Xample repository public on github.
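To make the idea of example-driven DSLs a bit more concrete, here is a purely hypothetical sketch of the approach. This is not Xample’s actual API, just an illustration of the principle: you hand over a literal example line together with the code it should map to, and the library derives a matcher that handles similar lines with other values.

# Hypothetical illustration of example-driven DSLs -- not Xample's real API.
class ExampleDsl
  def initialize
    @rules = []
  end

  # Register one example line and the handler it maps to. The numbers in the
  # example are treated as the variable parts; everything else must match
  # literally.
  def example(line, &handler)
    pattern = Regexp.escape(line).gsub(/\d+/) { '(\d+)' }
    @rules << [Regexp.new("\\A#{pattern}\\z"), handler]
  end

  # Run an input line against the registered examples.
  def run(line)
    @rules.each do |regexp, handler|
      if (m = regexp.match(line))
        return handler.call(*m.captures)
      end
    end
    raise "no example matches: #{line.inspect}"
  end
end

dsl = ExampleDsl.new
dsl.example('add 5 items to stock') { |n| puts "adding #{n} items" }
dsl.run('add 12 items to stock')    # prints "adding 12 items"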

Antlr-ELisp

This is probably one of the worst cases of yak shaving I’ve ever started on. But it’s a good and worthwhile project. And now that I’ve mentioned it to Terence I guess there’s no way of getting away from it… =)

So, simply put, Antlr-ELisp is a backend for Antlr that allows you to create parsers in Emacs Lisp. This should make it quite easy to get Emacs modes written for a language, without having to resort to all the awful hacks Emacs generally uses to handle different languages. It’s really in the infant stage right now, but it looks like it shouldn’t be too hard. Of course I’m not sure what performance will be like, but I’m trying to use as many macros as possible so the low level operations can still be efficient without looking ugly.

This project is also on github right now.

So, now I’m stuck – talking about these things on my blog means I will actually have to get both of them to a release, really soon now!



JVM Language Summit – first day


Just came back from the first day of the JVM language summit, and it’s been a very interesting day indeed. I made some bad morning choices – and spent some time fighting Notes – so I ended up arriving ten minutes into the first presentation.

The JVM language summit is a three day event organized by Sun, and the collection of people in the room is quite impressive. There are about 80 people all in all, and several huge names among them. Very fun.

So, the first talk was a quick intro to the Hotspot engine, what kind of features it sports and what we can expect from it in the future. (They’re adding a new GC algorithm, among other things.)

After that John Rose talked about the DaVinci machine, and what specifically is part of the JSR292 work (invokedynamic and method handles mostly), but he also talked about other language features that might be nice to have, such as continuations, tail calls, value types and other things. During this talk Mark Reinhold said that invoke dynamic will be a part of Java 7, as I posted earlier.

Bernd Mathiske talked about the Maxine VM, which was quite interesting although I’ve seen more or less the same talk before.

After that there was time for lunch and open spaces discussions. I ended up in the same room as Terence Parr and some other people talking about Antlr. I made the bad decision to quickly tell them about a project I’m working on, and as a result I now have to actually finish it and publish it. Why can’t I just shut up? (Announcement will be posted shortly)

We got a quick intro to the Fan language, covering some of the issues involved in supporting both the JVM and .NET from the same language. One of the large implications is that Java interop won’t really happen in such a language. Everything you use needs to be implemented in the Fan standard library – at least that’s the impression I got.

Scott Davies did a classic introduction to Groovy. It was mostly geared towards Java developers and as such maybe wasn’t a perfect match for the audience. He did make some good points from a perspective that language designers and implementors don’t generally spend much time on.

Finally, Iulian Dragos talked about some of the ways Scala is optimized, how closures are compiled and what kinds of compiler optimizations are done. This was really interesting, although I didn’t get the chance to ask about structural types.

The talk about Fortress was really interesting. If I was in the target audience I would be totally drooling, and as a language implementor it sure seems cool too. Implicit parallelism is hard to get right, but it sure seems like Fortress does it.

During the JVM multiple dispatch talk I sadly zoned out and worked on the project I’d mentioned to Terence. It seemed to be quite interesting, although I’m quite skeptical about the benefits of multiple dispatch in a language like Java. It doesn’t feel like methods should belong to classes in such a system.

Finally, Stuart Halloway held a lightning talk about how different features of a language contribute to making it easy to work in an agile way. Of course, calling it a lightning talk was a bit funny, since it ran to 30-35 minutes…

Looking forward to several sessions tomorrow. Gosling’s keynote might be interesting, Attila’s talk will be fun, and the talk about gradual typing in Python looks cool too.



Invoke dynamic in JDK 7


First post from the JVM Language Summit. Mark Reinhold just stated that invokedynamic definitely will be in Java 7. This is obviously great news for anyone who cares about dynamic languages.



ObjectSpace: to have or not to have


Among all the features of Ruby that JRuby supports, I would say that two things take the number one place as being really inconvenient. Threads are one; making the native threading of Java match the green threading semantics of Ruby is not fun, and it’s not even possible for all edge cases. But that argument has been made several times by both me and Charles.

ObjectSpace, now, that is another story. The problems with OS are many. But first, let’s take a quick look at the most common usage of OS: iterating over classes:

ObjectSpace::each_object(Class) do |c|
  p c if c < Test::Unit::TestCase
end

This code is totally obvious; we iterate over all instances of Class in the system, and print an inspected version of them if the class is a subclass of Test::Unit::TestCase.

Before we take a closer look at this example, let’s talk quickly about how MRI and JRuby implement this functionality. In fact, having this functionality in MRI is dead easy, and there is no performance cost when it isn’t used. The trick is that MRI just walks the heap when iterating over ObjectSpace. Since MRI can inspect the heap and stack without problems, nothing special needs to be done to support this behavior. (Note that this can never be safe when using a real threading system.)

So, the other side of the story: how does JRuby implement it? Well, JRuby can’t inspect the heap, of course. So we need to keep a WeakReference to each instance of RubyObject ever created in the system. This is gross. We pay a huge penalty for managing all this stuff. Many of the larger performance improvements we have found in the last year have revolved around having internal objects be smarter and not put themselves into ObjectSpace until necessary. One of my latest optimizations of regexp matching was simply to make MatchData lazy, so it only goes into OS when someone actually uses it. RDoc runs about 40% faster when ObjectSpace is turned off in JRuby.
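To make the cost concrete, here is a rough Ruby sketch of what an ObjectSpace has to look like when you can’t walk the heap (this is my own illustration, not JRuby’s actual code): every object gets registered in a weak collection when it is created, and each_object becomes a linear scan over that collection.

require 'weakref'

# Illustration only -- not JRuby's implementation. The point is the extra
# bookkeeping on every single allocation, plus the scan in each_object.
class NaiveObjectSpace
  def initialize
    @refs = []
  end

  # Called for every object created; this is the overhead JRuby pays.
  def register(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  def each_object(kind)
    @refs.each do |ref|
      begin
        obj = ref.__getobj__
        yield obj if obj.is_a?(kind)
      rescue WeakRef::RefError
        # the referent has already been garbage collected; skip it
      end
    end
  end
end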

So, is it worth it? In real life, when do you need the functionality of ObjectSpace? I’ve seen two places that use it in code I use every day. First, Rails uses it to find generators, and second, Test::Unit uses it to find instances of TestCase. But the fun thing is this: the above code is almost exactly what they do; they iterate over all classes in the system and check whether they inherit from a specific base class. Isn’t that a quite gross implementation? Shouldn’t it be possible to do something better? Euhm, yes:

module SubclassTracking
  def self.extended(klazz)
    (class << klazz; self; end).send :attr_accessor, :subclasses
    (class << klazz; self; end).send :define_method, :inherited do |clzz|
      klazz.subclasses << clzz
      super(clzz)
    end
    klazz.subclasses = []
  end
end

# Where Test::Unit::TestCase is defined:
Test::Unit::TestCase.extend SubclassTracking

# Load all other classes.

# To find all subclasses and test them:
Test::Unit::TestCase.subclasses

I would say that this code solves the problem more elegantly and usefully than ObjectSpace. There is no performance degradation from it, and it only affects subclasses of the class you are interested in. What’s the best benefit of this? You can use the -O flag when running JRuby, and your tests and the rest of the code will run much faster and use less memory.
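For instance, with a couple of made-up classes, it works like this:

# Illustrative usage with made-up class names:
class Base
  extend SubclassTracking
end

class Child < Base; end
class GrandChild < Child; end

p Base.subclasses   # => [Child, GrandChild]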

As a sidenote: I’m putting together a patch based on this for both Test::Unit and Rails. ObjectSpace is unnecessary for real code, and the vision for JRuby is that you will explicitly have to turn it on to use it, instead of the other way around.

Anyone have any real world examples of things you need to do with ObjectSpace?