Rubinius is important


I predict that parts of this blog post will make certain people uncomfortable, annoyed and possibly foamy mouthed. If you feel that you’re of this disposition, please don’t read any further.

As I’m working on JRuby, I obviously think that JRuby is the best solution for many problems I perceive in the MRI implementation currently. I have been quite careful to never say anything along the lines that JRuby is better than anything else, though. I will continue with that stance. However, I won’t restrict myself in the same way regarding Rubinius.

In fact, I’m getting more and more convinced that for the people who don’t need the things Java infrastructure can give you, Rubinius is the most important project around in Ruby-land. More than that, Rubinius is MRI done right. If nothing substantial changes in the current timeline and plans for Ruby 1.9.1, I predict that Rubinius will be the CRuby implementation of choice within 6 months. Rubinius is an implementation done the way MRI should have been. Of course, Matz has always focused on the language, not the implementation. I’m very happy about that, since it means that we have an outstanding language.

But still. Rubinius will win over MRI and YARV. I’ve had this thought for a while, and I’m finally more or less convinced that it’s true. Of course, there are a few preconditions. The first and most important one is that Rubinius delivers 1.0 as planned, by the end of the year, and that it doesn’t have abysmal performance. And if YARV happened to be totally finished and perfectly usable in the same time frame, things might take a different turn.

Why is Rubinius so good, compared to the existing C implementations? There are a number of good reasons for this:

  • It is byte code based. This means it’s easier to handle performance.
  • It has a pluggable, very clean architecture, meaning that for example garbage collection/object memory can be switched out to use another algorithm.
  • It is designed to be thread safe (though this is not really true yet), and Multi-VM capable.
  • It works with existing MRI extensions.
  • Most of the code is written in Ruby.
  • It gives you access to all the innards, directly from your Ruby code (stuff like MethodContexts/BlockContexts, etc).
  • The project uses Valgrind to ensure that the C code written is bulletproof.

Anyway. I put my money on Rubinius. Of course, that doesn’t mean I don’t think JRuby has a place to fill in the ecosystem. In fact, the really interesting question is what will happen when both Rubinius and JRuby have become more mature. I’d personally love to see more cooperation and sharing between the projects. Not a merging, since the goals are too separate, but it would be wonderful if JRuby could use the same Ruby code for all the primitive operations as Rubinius does.

Right now we have a simple Rubinius engine in JRuby that can interpret and run some of the simpler byte codes from Rubinius.

JRuby and Rubinius are both extremely important. Right now I believe JRuby is more important, since it opens up a totally different market for Ruby, and gives the benefits of Java to Ruby. Rubinius has another place to fill.

Of course, being who I am, I have also looked into what would be required to port Rubinius to Java, using the same approach directly instead of going through JRuby. If you decide to use Java’s Garbage Collector, Java Threads, and reuse the JRuby parser you would end up with about 40 files of C code to port. Most of these are extremely easy, and none is really that hard. And what you would end up with is something that would run the same things Rubinius does, but with the possibility of invoking Java code at the same time. (Of course, I hope that Evan reserves a block of about 8-16 bytecodes that can be implementation dependent – these Jubinius would use to interop with Java code).



Closing over ZSuper


One of the features of Ruby which I sometimes like and sometimes hate is ZSuper. (So called because it differs from regular super in the AST.) ZSuper is the keyword super without arguments or parentheses, which will call the super method with the same arguments as the current invocation got. Of course, that’s not all. For example, if you change the arguments, the changes will propagate to the super implementation. Not only if you mutate the object, but also if you rebind the reference, which I found unintuitive the first time I ran into it.
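To make the point about changing the reference concrete, here is a minimal sketch. Parent, Child and greet are names made up for this illustration; they are not part of the example further down.

```ruby
# Hypothetical classes, named only for this illustration.
class Parent
  def greet(name)
    "hello, #{name}"
  end
end

class Child < Parent
  def greet(name)
    before = super          # ZSuper: passes the current value of name
    name = "someone else"   # rebind the local variable; the caller's object is untouched
    after = super           # ZSuper again: now passes the rebound value
    [before, after]
  end
end

p Child.new.greet("Ola")
# => ["hello, Ola", "hello, someone else"]
```

The second super call sees the rebound local variable, not the object the caller originally passed in.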

That’s all well and good. The interesting thing happens when you close over the super call and return it as a Proc. I haven’t seen anyone doing this, which I guess is why there seems to be a bug in the implementation. Look at this code and tell me what it prints:

class Base
  def foo(*args)
    p [:Base, :foo, *args]
  end
end

class Sub < Base
  def foo(first, *args)
    super
    first = "changed"
    super
    proc { |*args| super }
  end
end

Sub.new.foo("initial", "try", :four).call("args", "to", "block")

Notice that Base#foo gets called three times during this code. In Sub#foo we rebind the first argument to the new string “changed”, so, as described above, the second super call receives “changed” as its first argument. But what happens after that? We create a block that uses ZSuper, send it to proc, which reifies the block into an instance of Proc, and return that. Directly after the Proc is returned, we call it with some arguments. Now, the way I expect this to work (and incidentally, that’s the way JRuby works) is that the output should be something like this:

[:Base, :foo, "initial", "try", :four]
[:Base, :foo, "changed", "try", :four]
[:Base, :foo, "changed", "try", :four]

We see that the first argument changed from “initial” to “changed”, but otherwise the result is the same; the closure is a real closure over everything in the frame and scope. I guess you’ve realized that the same isn’t true for MRI. Without further ado, this is the output from MRI 1.8.6:

[:Base, :foo, "initial", "try", :four]
[:Base, :foo, "changed", "try", :four]
[:Base, :foo, "changed", ["args", "to", "block"], false]

The first time I saw this, the words WTF passed through my mind. In fact, that still happens sometimes. What is happening here? Well, it seems as if passing arguments to the block somehow clobbers the place where MRI saves away the closure over the passed arguments. I have no idea whatsoever where the false value comes from. Hmm. Now that I think about it (and this is just a guess), I believe it stands for the fact that the arguments should be splatted into one argument (the one called args in the block). If it had been true, they would refer to different variables. I think there is some trickery like that involved in the splatting logic in MRI.

Anyway. Is this a bug or a feature? I can’t see any obvious use for it, and it runs counter to being understandable and unsurprising. Can anyone give me a good example of where this behavior is useful?
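In the meantime, if you need the Proc to see the method’s arguments no matter which implementation runs it, one option is to skip ZSuper inside the block and pass the arguments explicitly. This is only a sketch of a workaround, not a fix for the behavior above. It assumes a variant of Base#foo that returns its array instead of printing it, and block_args is a made-up name chosen so the block parameters don’t collide with args:

```ruby
class Base
  def foo(*args)
    [:Base, :foo, *args]   # returned instead of printed, so the result is easy to inspect
  end
end

class Sub < Base
  def foo(first, *args)
    first = "changed"
    # Explicit super: the arguments are spelled out, so nothing depends on
    # how the implementation resolves ZSuper inside a block.
    # block_args is a hypothetical name; the block's own arguments are ignored.
    proc { |*block_args| super(first, *args) }
  end
end

p Sub.new.foo("initial", "try", :four).call("args", "to", "block")
# => [:Base, :foo, "changed", "try", :four]
```

With the arguments written out, the Proc closes over first and args like any other local variables, and whatever is passed to call never reaches Base#foo.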