ObjectSpace: to have or not to have


Among all the features of Ruby that JRuby supports, I would say that two things take the number one place as being really inconvenient. Threads are one; making the native threading of Java match the green threading semantics of Ruby is not fun, and it’s not even possible for all edge cases. But that argument have been made several times by both me and Charles.

ObjectSpace now, that is another story. The problems with OS are many. But first, let’s take a quick look at the most common usage of OS; iterating over classes:

ObjectSpace::each_object(Class) do |c|
  p c if c < Test::Unit::TestCase
end

This code is totally obvious; we iterate over all instances of Class in the system, and print an inspected version of them if the class is a subclass of Test::Unit::TestCase.

Before we take a closer look at this example, let’s talk quickly about how MRI and JRuby implements this functionality. In fact, having this functionality in MRI is dead easy. It’s actually very simple, and there are no performance problems of having it when it’s not used. The trick is that MRI just walks the heap when iterating over ObjectSpace. Since MRI can inspect the heap and stack without problems, this means that nothing special needs to be done to support this behavior. (Note that this can never be safe when using a real threading system).

So, the other side of the story: how does JRuby implement it? Well, JRuby can’t inspect the heap of course. So we need to keep a WeakReference to each instance of RubyObject ever created in the system. This is gross. We pay a huge penalty for managing all this stuff. Many of the larger performance benefits we have found the last year have revolved around having internal objects be smarter and not put themselves into ObjectSpace until necessary. One of my latest optimizations of regexp matching was simple to make MatchData lazy, so it only goes into OS when someone actually uses it. RDoc runs about 40% faster when ObjectSpace is turned off for JRuby.

So, is it worth it? In real life, when do you need the functionality of ObjectSpace? I’ve seen two places that use it in code I use every day. First, Rails uses it to find generators, and secondly, Test::Unit uses it to find instances of TestCase. But the fun thing is this; the above code is almost exactly what they do; they iterate over all classes in the system and checking if they inherit from a specific base class. Isn’t that a quite gross implementation? Shouldn’t it be possible to do something better? Euhm, yes:

module SubclassTracking
  def self.extended(klazz)
    (class <<klazz; self; end).send :attr_accessor,
                                     :subclasses
    (class <<klazz; self; end).send :define_method,
                                 :inherited do |clzz|
      klazz.subclasses << clzz
      super
    end
    klazz.subclasses = []  end
end

# Where Test::Unit::TestCase is defined
Test::Unit::TestCase.extend SubclassTracking

# Load all other classes# To find all subclasses and test them:
Test::Unit::TestCase.subclasses

I would say that this code solves the problem more elegantly and useful than ObjectSpace. There are no performance degradation due to it, and it will only effect subclasses of the class you are interested in. What’s the best benefit of this? You can use the -O flag when running JRuby, and your tests and rest of the code will run much faster and use less memory.

As a sidenote: I’m putting together a patch based on this to both Test::Unit and Rails. ObjectSpace is unnecessary for real code and the vision of JRuby is that you will explicitly have to turn it on to use it, instead of the other way around.

Anyone have any real world examples of things you need to do with ObjectSpace?