Ruby is hard


I have been doing Ruby programming for several years now, and been closely involved with the JRuby implementation of it for about 20 months. I think I know the language pretty well, and I think I have for a long time. This has blinded me to something which I’ve just recently started to recognize. Ruby is really hard. It’s hard in the same way LISP is hard. But there are also lots of gotchas and small things that can trip you up.

Ruby is a really nice language. It’s very easy to learn and get started in, and you will be productive extremely quickly, but to actually master the language takes much time. I think this is underestimated in many circles. Ruby uses lots of language constructs which are extremely powerful, but they require you to shift your thinking – especially when coming from Java, which many current Ruby converts do.

As a small example of a gotcha, what does this code do:

class Foo
attr_accessor :bar
def initialize(value)
bar = value
end
end

puts Foo.new(42).bar

If your answer is that it prints 42, you’re wrong. It will print nil. What’s the necessary change to fix this? Introduce self:

class Foo
attr_accessor :bar
def initialize(value)
self.bar = value
end
end

puts Foo.new(42).bar

You could argue that this behavior is annoying. That it’s bad. That it’s wrong. But that’s the way the language works, and I’ve seen this problem in many code bases – including my own – so it’s worth keeping an eye open for it.

Why does it happen? Well, the problem is this, when the Ruby parser finds a word that begins with an underscore or a lower case letter, and where that word is followed by an equals sign, there is no way for the parser to know if this is a method call (the method foo=) or a variable assignment. The fall back of the parser is to assume it is a variable assignment and let you specify explicitly when it should be a VCall. So what happens in the first example is that a new local variable called “bar” will be introduced and assigned to.



Languages of the future


Martin Fowler writes about one language, and neatly encapsulates what I think about the subject in the way I would have written it, if I were actually a good writer. Go read.



Really Radical Ruby


I had a very good time at FOSCON III with the Portland Ruby Brigade yesterday. There were lots of entertaining talks too. I would say that my “lightning talk” wasn’t really a lightning talk at all. Unless you count the speed of my talking… I managed to race through all my 22 slides – all of them shock full with information – in the time alloted to me. Hopefully people learned something from it.

Chad Wathington demonstrated Mingle which also was very nice.

John Lam showed us a taste of IronRuby, and also talked some about the implementation particulars that made certain things faster in IronRuby than on MRI. Interesting stuff, but I’m looking forward to the full talk tomorrow.

Alan McKean from GemStone showcased GemStone/JRuby – still a work in progress though. For those of you who don’t know what GemStone is, think extremely powerful object persistence. And they’re building a new version for Ruby, on top of JRuby. Very cool.

I realized that there are a few points about JRuby that haven’t been emphasized enough, though, so here is there executive summary bullet points:

  • JRuby is totally Java compliant and runs on any Java.
  • JRuby is 1.0
  • JRuby supports Rails
  • ThoughtWorks offers commercial support for JRuby
  • JRuby performance is on par with the C implementation, on average.


Will JavaScript be the next big language?


I guess most of you have read the latest happenings in the whole NBL-story. Steve Yegge have created a Rhino on Rails; a JavaScript version of Rails. I have meant to write about this for a few weeks, but I felt I wanted to think it through before writing about it.

Of course, it’s really cool that Rhino on Rails exists, and may someday be open sourced. On the other hand, the current version have no counterpart to ActiveRecord – instead it works against internal Google data systems. This may make very good sense from the pov of Google, but will make it hard to gain traction outside of Google when/if the open sourcing happens. At that point, a ActiveRecord version written using JDBC and Rails should happen. I’m personally looking forward to it, because such an undertaking would be able to take more advantage of JDBC than for example ActiveRecord-JDBC can do. ActiveRecord done “right” so to speak, but in JavaScript.

But is JavaScript the language of the future? What does it have that separates it from other current languages – most notably Ruby? Well, at first glance the big difference is in the amount of implementations. JavaScript have a multitude of implementations, while Ruby has three (Ruby, YARV? and JRuby), with many more in the works. JavaScript though, have a incredible amount of implementations, most of them bad.

It’s got a standard. It’s got full closures and a few more neat futures. And it has prototype object orientation; a very different object model compared to the standard of inheritance based object systems like Java, Ruby and Smalltalk.

On closer inspection, the prototype based object system is not really such a big deal. In fact, you already have it in Ruby. Just use Object.new and Kernel.clone. Not very practical in Ruby, but you can do it without any problem. Remember that the difference between dup and clone is that clone will actually copy the metaclass of the object, while dup just makes the duplicate use the same real class.

I see some advantages that JavaScript has over Ruby, that can actually count as real advantages. The spec is arguably one. Several good implementations is another. It’s easier to make JavaScript have good performance. And maybe most important: most developers already know it to some degree or other.

But will it be the NBL in some form or another? No, I don’t think so, for the simple reason that to many developers hate it. Not because of the language itself, but due to the connection with bad browser implementations. This is a debt that it will be very hard for JS to get away from, and I think it will tip the point against JavaScript compared any other language poised for the position of being a big language.

Further, I really don’t believe in the idea that there will be a next big language. Call me strange, but just because we’ve had this cycle two or three times doesn’t mean it will continue like that. I believe development is qualitively different now compared to 15 years ago. The challenges we meet are different and demand different tools. One single language will not cut it. I think the future lies in layered languages with different properties.



Closing over ZSuper


One of the features of Ruby which I sometimes like and sometimes hate, is ZSuper. (So called, because it differs from regular super in the AST.) ZSuper is the keyword super, with arguments and parenthesis, which will call the super method with the same arguments as the current invocation got. Of course, that’s not all. For example, if you change the arguments, the changes will propagate to the super implementation. Not only if you change the object, but if you change the reference, which I found non intuitive the first time I found it.

That’s all and well. The interesting thing happens when you close over the super call and return it as a Proc. I haven’t seen anyone doing this, which I guess is why there seems to be a bug in the implementation. Look at this code and tell me what it prints:

class Base
def foo(*args)
p [:Base, :foo, *args]
end
end

class Sub < Base
def foo(first, *args)
super
first = "changed"
super
proc { |*args| super }
end
end

Sub.new.foo("initial", "try", :four).call("args","to","block")

Notice that Base#foo will get called three times during this code. In Sub#foo we are changing the first argument to the new string “changed”. As I told you before, the second super call will actually get “changed” as the first argument the second time. But what will happen after that? We first create a block that uses ZSuper. We send the block to proc, reifying the block into an instance of Proc, and returning that. Directly after returning the block, we call it with some arguments. Now, the way I expect this to work (and incidentally, that’s the way JRuby works) is that the output should be something like this:

[:Base, :foo, "initial", "try", :four]
[:Base, :foo, "changed", "try", :four]
[:Base, :foo, "changed", "try", :four]

We see that the first argument changed from “initial” to “changed”, but otherwise the result is the same; the closure is a real closure over everything in the frame and scope. I guess you’ve realized that the same isn’t true for Ruby. Without further ado, this is the output from MRI 1.8.6:

[:Base, :foo, "initial", "try", :four]
[:Base, :foo, "changed", "try", :four]
[:Base, :foo, "changed", ["args", "to", "block"], false]

The first time I saw this, the words WTF passed through my mind. In fact, that still happens sometimes. What is happening here? Well, obviously, it seems as if the passing of arguments to the block somehow clobbers the part where MRI saves away the closure over passed arguments. I have no idea whatsoever what the false value comes from. Hmm. But now that I think about it (this is just a guess), but I believe it stands for the fact that the arguments should be splatted into one argument. (That’s the one called args in the block). If it had been true, they should refer to different variables. I think there is some trickery like that involved in the splatting logic in MRI.

Anyway. Is this a bug or a feature? I can’t see any way it could be used in an obvious way, and it runs counter to being understandable and unsurprising. Anyone who can give me a good example of where this is useful behavior?



ObjectSpace: to have or not to have


Among all the features of Ruby that JRuby supports, I would say that two things take the number one place as being really inconvenient. Threads are one; making the native threading of Java match the green threading semantics of Ruby is not fun, and it’s not even possible for all edge cases. But that argument have been made several times by both me and Charles.

ObjectSpace now, that is another story. The problems with OS are many. But first, let’s take a quick look at the most common usage of OS; iterating over classes:

ObjectSpace::each_object(Class) do |c|
  p c if c < Test::Unit::TestCase
end

This code is totally obvious; we iterate over all instances of Class in the system, and print an inspected version of them if the class is a subclass of Test::Unit::TestCase.

Before we take a closer look at this example, let’s talk quickly about how MRI and JRuby implements this functionality. In fact, having this functionality in MRI is dead easy. It’s actually very simple, and there are no performance problems of having it when it’s not used. The trick is that MRI just walks the heap when iterating over ObjectSpace. Since MRI can inspect the heap and stack without problems, this means that nothing special needs to be done to support this behavior. (Note that this can never be safe when using a real threading system).

So, the other side of the story: how does JRuby implement it? Well, JRuby can’t inspect the heap of course. So we need to keep a WeakReference to each instance of RubyObject ever created in the system. This is gross. We pay a huge penalty for managing all this stuff. Many of the larger performance benefits we have found the last year have revolved around having internal objects be smarter and not put themselves into ObjectSpace until necessary. One of my latest optimizations of regexp matching was simple to make MatchData lazy, so it only goes into OS when someone actually uses it. RDoc runs about 40% faster when ObjectSpace is turned off for JRuby.

So, is it worth it? In real life, when do you need the functionality of ObjectSpace? I’ve seen two places that use it in code I use every day. First, Rails uses it to find generators, and secondly, Test::Unit uses it to find instances of TestCase. But the fun thing is this; the above code is almost exactly what they do; they iterate over all classes in the system and checking if they inherit from a specific base class. Isn’t that a quite gross implementation? Shouldn’t it be possible to do something better? Euhm, yes:

module SubclassTracking
  def self.extended(klazz)
    (class <<klazz; self; end).send :attr_accessor,
                                     :subclasses
    (class <<klazz; self; end).send :define_method,
                                 :inherited do |clzz|
      klazz.subclasses << clzz
      super
    end
    klazz.subclasses = []  end
end

# Where Test::Unit::TestCase is defined
Test::Unit::TestCase.extend SubclassTracking

# Load all other classes# To find all subclasses and test them:
Test::Unit::TestCase.subclasses

I would say that this code solves the problem more elegantly and useful than ObjectSpace. There are no performance degradation due to it, and it will only effect subclasses of the class you are interested in. What’s the best benefit of this? You can use the -O flag when running JRuby, and your tests and rest of the code will run much faster and use less memory.

As a sidenote: I’m putting together a patch based on this to both Test::Unit and Rails. ObjectSpace is unnecessary for real code and the vision of JRuby is that you will explicitly have to turn it on to use it, instead of the other way around.

Anyone have any real world examples of things you need to do with ObjectSpace?



What’s wrong with this code?


Today I will introduce to you a method from ActiveRecord. The method takes a parameter called type and that value can bu for example :primary_key, :string or :integer. Now, in the first line there is a call to native_database_types. Generally, that call returns a structure that looks somewhat like this:

def native_database_types #:nodoc:
{
:primary_key => "int(11) DEFAULT NULL auto_increment PRIMARY KEY",
:string => { :name => "varchar", :limit => 255 },
:text => { :name => "text" },
:integer => { :name => "int", :limit => 11 },
:float => { :name => "float" },
:decimal => { :name => "decimal" },
:datetime => { :name => "datetime" },
:timestamp => { :name => "datetime" },
:time => { :name => "time" },
:date => { :name => "date" },
:binary => { :name => "blob" },
:boolean => { :name => "tinyint", :limit => 1 }
}
end

The method itself looks like this.

def type_to_sql(type, limit = nil, precision = nil, scale = nil) #:nodoc:
native = native_database_types[type]
column_type_sql = native.is_a?(Hash) ? native[:name] : native
if type == :decimal # ignore limit, use precison and scale
precision ||= native[:precision]
scale ||= native[:scale]
if precision
if scale
column_type_sql << "(#{precision},#{scale})"
else
column_type_sql << "(#{precision})"
end
else
raise ArgumentError, "Error adding decimal column: precision cannot be empty if scale if specified" if scale
end
column_type_sql
else
limit ||= native[:limit]
column_type_sql << "(#{limit})" if limit
column_type_sql
end
end

There is something very wrong with this implementation. Of course, there could exist many errors here, but what I’m thinking about right now is a violation of the usual way methods should work. And in effect, that problem with this method have caused ActiveRecord-JDBC to implement some very inefficient code to handle this method. And it gets called a lot in ActiveRecord. I’ll get back later today with a pointer to what’s wrong here, and I will also discuss some of what I’ve done in AR-JDBC to handle this situation. I hope for many suggestions here! =)



There can be only one, a tale about Ruby, IronRuby, MS and whatnot


(Updated: added a quote from John Lam about not being able to look at the MRI source code)

After RailsConf in Portland, there has flared up a discussion centered around IronRuby and Microsoft. We discussed many of these points in depth at the conference, and I’ll elaborate some on my views on the issues in a bit.

But first I would like to talk some about the multitude of Ruby implementations springing up. I firmly believe that a language evolves in phases. The first phase, germination, is the period where a language needs one consistent implementation (or a spec). It’s during this phase when most “alpha geek adoption” happens. Many important libraries are written, but most applications are not in the main economic center. Ruby have been in this phase for a long time, but the fact that new implementations are springing up left and right is a sure sign that Ruby is entering phase 2: implementation. For adoption to happen, there need to exist several competing implementations, all of them good. This is the evolutionary stage, where it’s decided what kind of features an implementation should provide. Should we have green or native threads? Are all the features of the original implementation really that necessary? (Continuations, ObjectSpace). Is there cruft in the standard library that needs to be weeded out? (timeout.rb). All of these questions get answered when other people implement the language. The last phase, which I guess could be called adoption, is when the language have several working implementations, all good enough to deliver high end applications on, when many applications are written in the language, and there exists a plethora of libraries, systems and support for the language.

What this means is that for a language to be successful, there needs to exist competing implementations. They need to implement their features in different ways and make different choices during development. Otherwise, the language will die. (This is obviously not enough, since Smalltalk fulfilled this admirably and still never got widespread adoption.). But I still believe it’s incredibly important for a language to evolve with many implementations, which is why I find Rubinius, JRuby, YARV and IronRuby to be extremely important projects for the welfare of Ruby. I want Ruby to be successful. I want Ruby to be the next major language for several reasons. But most importantly: I want Ruby to be a better language tomorrow, than it is today. The only way that’s going to happen is by having lots of people implement the language.

So, that’s enough of the introductory flame bait. This describes one half of why IronRuby is an important project, and why we can’t let it fail. The other side of the coin is the same reason JRuby is important. .NET as a platform have some wildly useful features. There are many developers who swear by .NET for good reason. And what’s more important, there are lots of large enterprises with such a vested interest in .NET, that they will never choose anything else. Now, for the welfare of all programmers in the world, I personally believe the world would be a better place if those .NET-environments also used Ruby. So that’s the other coin of why IronRuby is important.

The most well read blog about the current Microsoft/Ruby controversy is Martin Fowlers article RubyMicrosoft. Go read it now, and then I’ll just highlight the points I find most important.

First: John Lam is committed to creating a “compliant” Ruby implementation. I have no doubts that he can do it. But there are a few problems lurking.

For example, what is a compliant Ruby implementation? Since there exists no spec, and no comprehensive test suite, the only way to measure compliance is to check how close the behavior matches MRI. But this have some problems too. How do you check behavior? Well, you need to run applications. But how do you get so far as you can run applications?
What JRuby did was that we looked at MRI source code.

John Lam can not look at MRI source code. He cannot look at JRuby source code. He cannot look at Rubinius source code. If he does, he will be terminated.

So, the next best alternative: accepting patches from the community, which can look at Ruby source? Nope, no cigar. Microsoft is not really about Open Source yet. Their license allows us to look at their source code, and also to fork it and do what we want with it. But that’s only half of what open source is about. The more important part is that you should be able to contribute back code without having to fork. You can’t do that with IronRuby, since Microsoft is too scared about being sued for copyright infringement.

There was some doubt about Lam actually being banned from looking at MRI source code. This is the first quote that said it is so. It’s from the discussion “Virtual classes and ‘real’ classes — why?” on Ruby-core, this quote posted at 29/03/07:

Is this how things are actually implemented? (BTW I'm not lazy here - we cannot look for ourselves).

I am going to make a bold statement here. Under the current circumstances, I don’t believe it’s possible for John Lam and his team to create a Ruby implementation that runs Rails within at least 18 months. And frankly, that’s not soon enough.

As I said above, I have all confidence that John can do great stuff if he has the right resources. But creating a Ruby implementation is hard enough while having all the benefits of the open source community.

The two points I want to make with this point is this: The Ruby community must damned well get serious about creating a good, complete specification and test suite. It’s time to do it right now, and we need it. It’s not a one-man job. The community needs to do it. (And yes, the two SoC projects are a very good start. But you still need to be able to run RSpec to take full advantage of them; and let’s face it, the RSpec implementation uses many nice Ruby tricks.)

The second point is simpler: Microsoft needs to completely change how they handle Open Source. Their current strategy of trying to grow it into the organization will not work (at least not good enough). They need to turn around completely, reinvent themselves and make some really bold moves to be able to meet the new world. If they can’t do this, they are as dead as Paul Graham claims.



Executing YARV (in JRuby)


I guess the title says it all. Oh, it doesn’t? You want what? A tutorial on how to play with this? Well, alright then.

What you will need to get started is first and foremost JRuby trunk. Download and build that. You will also need either a very recent version of YARV or Ruby trunk. Download either and build that too. I have the source in $YARV_SRC and $RUBY_TRUNK_SRC. I’ve also installed them with suffixes, so I have ruby-yarv and ruby-trunk globally available. That’s a quite good setup.

Before compiling some Ruby, we need to change the change the compile-script slightly. So open up the file in $YARV_SRC/tool/compile.rb and disable the peephole_optimization and inline_const_cache, since JRuby doesn’t really support these features yet. If you use Ruby trunk, you also need to do a global search-and-replace in this file, from YARVCore to VM.

Now you’re finished to compile something. The goal for this execution have been to get an iterative fibonacci sequence running, so save this code in a file called fib.rb:

def fib_iter_ruby(n)
i = 0
j = 1
cur = 1
while cur <= n
k = i
i = j
j = k + j
cur = cur + 1
end
i
end

puts fib_iter_ruby(5000)

Next, compile this file:

ruby-yarv $YARV_SRC/tool/compile.rb -o fib.rbc fib.rb

Modify as needed if you use Ruby trunk instead. Now, how to run this? With the y-flag to JRuby:

jruby -y fib.rbc

As you can see, simple files like this just work. Of course, there are much missing yet, but basic interpretation works really nice. Some more additions to the YARV machine will come quite soon, probably. But the next step will be to implement a compiler for YARV too. From that point it should be possible to execute Ruby-files in JRuby in several different ways; by regular interpretation, YARV compilation, YARV interpretation, Java bytecode AOT compiling, Java bytecode JIT compiling and also possibly Rubinius interpretation. The future is, as always, interesting.



Rubinius and Lisp


Yesterday on the #rubinius-channel, me, Evan and MenTaLguY had the beginnings of a discussion about a Lisp dialect for the Rubinius VM. Since this sound exactly like the kind of thing I’ve been thinking about for some time, I got excited. But anyway, Pat Eyler covered the issue much better and did a small interview with us that can be read here. Ruby is wonderful. Rubinius is sweet. With Lisp added to the mix, it will be better.

I’ll probably follow this up with some more information and ideas, as soon it’s fleshed out.