July 16th, 2007
JRuby Inside
Peter Cooper has opened a “sister” site to RubyInside, called JRubyInside. It seems very promising; the address is http://www.jrubyinside.com.
One of the features of Ruby which I sometimes like and sometimes hate is ZSuper. (So called because it differs from regular super in the AST.) ZSuper is the keyword super without arguments or parentheses, which calls the super method with the same arguments the current invocation got. Of course, that’s not all. For example, if you change the arguments, the changes will propagate to the super implementation. This happens not only if you mutate the object, but also if you reassign the variable, which I found unintuitive the first time I ran into it.
That’s all well and good. The interesting thing happens when you close over the super call and return it as a Proc. I haven’t seen anyone doing this, which I guess is why there seems to be a bug in the implementation. Look at this code and tell me what it prints:
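A minimal sketch of the argument propagation described above. The Parent and Child classes are hypothetical, made up for illustration; the methods return their arguments instead of printing them so the three calls can be compared side by side.

```ruby
# Hypothetical classes, for illustration only.
class Parent
  def greet(name, *rest)
    [:Parent, name, *rest]
  end
end

class Child < Parent
  def greet(name, *rest)
    before = super                 # bare super: passes the current name and rest
    name = "reassigned"            # rebinds the local variable only
    after = super                  # bare super again: sees the new binding
    explicit = super("explicit")   # explicit list: only this argument is passed
    [before, after, explicit]
  end
end

p Child.new.greet("original", 1, 2)
# prints [[:Parent, "original", 1, 2], [:Parent, "reassigned", 1, 2], [:Parent, "explicit"]]
```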
class Base
  def foo(*args)
    p [:Base, :foo, *args]
  end
end

class Sub < Base
  def foo(first, *args)
    super
    first = "changed"
    super
    proc { |*args| super }
  end
end

Sub.new.foo("initial", "try", :four).call("args","to","block")
Notice that Base#foo gets called three times during this code. In Sub#foo we change the first argument to the new string “changed”. As I said before, the second super call will get “changed” as its first argument. But what happens after that? We create a block that uses ZSuper, send the block to proc, which reifies it into an instance of Proc, and return that. Directly after it is returned, we call the Proc with some arguments. Now, the way I expect this to work (and incidentally, that’s the way JRuby works) is that the output should be something like this:
[:Base, :foo, "initial", "try", :four]
[:Base, :foo, "changed", "try", :four]
[:Base, :foo, "changed", "try", :four]
We see that the first argument changed from “initial” to “changed”, but otherwise the result is the same; the closure is a real closure over everything in the frame and scope. I guess you’ve realized that the same isn’t true for Ruby. Without further ado, this is the output from MRI 1.8.6:
[:Base, :foo, "initial", "try", :four]
[:Base, :foo, "changed", "try", :four]
[:Base, :foo, "changed", ["args", "to", "block"], false]
The first time I saw this, the words WTF passed through my mind. In fact, that still happens sometimes. What is happening here? Well, it seems as if passing arguments to the block somehow clobbers the place where MRI saves away the closure over the passed arguments. I have no idea whatsoever where the false value comes from. Hmm. Now that I think about it (this is just a guess), I believe it stands for the fact that the arguments should be splatted into one argument (the one called args in the block). If it had been true, they would refer to different variables. I think there is some trickery like that involved in the splatting logic in MRI.
Anyway. Is this a bug or a feature? I can’t see any obvious way it could be used, and it runs counter to being understandable and unsurprising. Can anyone give me a good example of where this is useful behavior?
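One way to sidestep the ambiguity, sketched under my own assumptions rather than taken from either implementation: give the block parameter a different name and spell out the arguments to super. Base here is a variant that returns the array instead of printing it, and block_args is a made-up name.

```ruby
class Base
  def foo(*args)
    [:Base, :foo, *args]
  end
end

class Sub < Base
  def foo(first, *args)
    first = "changed"
    # The block has its own parameter name and lists the super arguments
    # explicitly, so nothing depends on how bare super resolves them.
    proc { |*block_args| super(first, *args) }
  end
end

p Sub.new.foo("initial", "try", :four).call("args", "to", "block")
# prints [:Base, :foo, "changed", "try", :four]
```

The arguments given to call are simply ignored in this version; the Proc only closes over first and args from the method.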
When I get bored with JRuby, I tend to go looking at either other languages or other language implementations. This happened a few days ago, and the result is what I will document here. Begin by creating a file called fib.rb:
def fib(n)
  if n < 2
    n
  else
    fib(n - 2) + fib(n - 1)
  end
end

p fib(15)
The next part requires that you have a recent version of Rubinius installed:
rbx compile fib.rb
This will generate fib.rbc. Next, take a recent JRuby version and run:
jruby -R fib.rbc
And presto, you should see 610 printed quite soon. This is JRuby executing Rubinius bytecode. I was quite happy about how easy it was to get this far with the functionality. Of course, JRuby doesn’t support most bytecodes yet, only those needed to execute this small example and similar things. We are also using JRuby’s internals for this, which means that Rubinius MethodContext and such are not available.
Another interesting note is that running the iterative Fib algorithm like this with -J-server is actually 30% faster than MRI.
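The post does not show the iterative version it mentions. One possible form, purely as an assumption about what was measured (fib_iter is a made-up name), would be:

```ruby
# Iterative Fibonacci: a hypothetical counterpart to the recursive fib above.
def fib_iter(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }   # after k steps, a holds the k-th Fibonacci number
  a
end

p fib_iter(15)
# prints 610
```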
This approach is fun, and I have some other similar ideas I really want to look at. The best part about it though, is that I got the chance to look at the internals of Rubinius. I hope to have more time for it eventually. Another thing I really want to do some day is implement a Jubinius, which should be a full port of the Rubinius runtime, possibly excluding Subtend. I think it could be very nice to have the Smalltalk core of Rubinius working together with Java. Of course, I don’t have any time for that, so we’ll see what happens in a year or two. =) Maybe someone else does it.
After my last post I got several comments about evil.rb. Of course I had evil.rb in mind when doing some of it, but I also forgot to describe the two most evil methods of the JRuby module: runtime and reference. The runtime method returns the currently executing JRuby runtime, wrapped by Java Integration, meaning you can get access to almost anything you want with it. For example, if you want to take a look at the global CacheMap (used to cache method instances):
require 'jruby'
JRuby::runtime.cache_map
Whoops. And that’s just the beginning. Are you interested in investigating the current call frame or activation frame (DynamicScope in JRuby)? Try this:
require 'jruby'
p JRuby::runtime.current_context.current_frame
a = 1
p JRuby::runtime.current_context.current_scope
Of course, you can call all accessible (and some inaccessible) methods on these objects, just as if you were working with them from Java. Use the APIs and take a look. You can change things without problem.
And that also brings us to one of the easiest examples of evil.rb, changing the frozen flag on a Ruby object. Well, with the reference method, that’s easy:
require 'jruby'
str = "foo"
str.freeze
puts str.frozen?
JRuby::reference(str).setFrozen(false)
puts str.frozen?
JRuby::reference will return the same object sent in, wrapped in a Java Integration layer, meaning that you can inspect and modify it to your heart’s content. In this way, you can get at the internals of JRuby in the same way you can using evil.rb for MRI. And I guess these features should mainly be used for looking at and learning about the internals of JRuby.
So, have fun and don’t be evil (overtly).
I have spent a few hours adding some useful features these last days. Nothing extraordinary, but things that might come in handy at one point or another. The problem with these features is that they are totally JRuby specific. You could probably implement them for MRI, but no one has done it. So if you want to use them, beware. Further, they exploit a few tricks in the JRuby implementation, meaning they can’t be implemented in pure Ruby.
So, that was the disclaimer; now onto the fun stuff!
Breaking encapsulation (even more)
As you know, in Ruby everything is accessible in some form or another, and you can do almost everything with the metaprogramming facilities. Well, except for one small detail which I found out while working on the AR-JDBC database drivers.
We have some code there which needs to be separate for each database, and it just so happens that core ActiveRecord has already implemented it in a very good way. So, what do we do? Mix them in and remove the methods we don’t want? No, because ActiveRecord adapters are classes, not modules, and you can’t mix in classes. There is no way to get hold of a method and add it to an unrelated class or module. Except if you’re on JRuby, of course:
require 'jruby/ext'

class A
  def foo
    puts "A#foo"
  end
  def bar
    puts "A#bar"
  end
end

class B;end
class C;end

b = B.new
b.steal_method A, :foo
b.foo
B.new.foo rescue nil #will raise NoMethodError

C.steal_methods A, :foo, :bar
C.new.foo
C.new.bar
Of course, using this should be avoided at all costs. But it’s interesting that such a powerful thing can be implemented using about 15 lines of Java code.
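For contrast, the restriction in plain Ruby can be seen with a short sketch: an UnboundMethod taken from a class can only be bound to instances of that class or its subclasses. The A and B classes here mirror the ones above, except that foo returns a string instead of printing it.

```ruby
class A
  def foo
    "A#foo"
  end
end

class B; end

unbound = A.instance_method(:foo)   # an UnboundMethod taken from A

begin
  unbound.bind(B.new)               # B is unrelated to A, so this fails
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# Binding to an instance of A (or of a subclass of A) is allowed.
puts unbound.bind(A.new).call
```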
Introspection
JRuby parses Ruby code into an Abstract Syntax Tree. For a while now, the JRuby module has allowed you to parse a string and get the AST representation by executing:
require 'jruby'
JRuby.parse "puts 'hello'", 'filename.rb', false
This returns the Java AST representation directly, using the Java Integration features. That is old. What is new is that I have added pretty inspecting, a nice YAML format and some navigation features which make it very easy to see exactly how the AST looks. Just do an inspect or to_yaml on an AST node and you will get the relevant information.
That is interesting. But what is even nicer is the ability to combine arbitrary pieces of the AST (as long as they make sense together) and run them:
require 'jruby'
ast_one = JRuby::ast_for("n = 1; n*(n+3)*(n+2)")
ast_two = JRuby::ast_for("n = 42; n*(n+1)*(n+2)")
p (ast_one.first.first + ast_two.first[1]).run
p (ast_two.first.first + ast_one.first[1]).run
As you can see, I take two fragments from different code, add them together and run them. You can also see that I’m using an alias for parse here, called ast_for. That makes much more sense when using the second parse feature, which we already know from ParseTree:
require 'jruby'

JRuby::ast_for do
  puts "Hello"
end
Well, I guess that’s all I wanted to show right now. These last small things I’ve added because I believe they will be highly useful for debugging JRuby code.
I also have some more ideas that I want to implement. I’ll keep you posted about it.
Among all the features of Ruby that JRuby supports, I would say that two things take the number one place as being really inconvenient. Threads are one; making the native threading of Java match the green threading semantics of Ruby is not fun, and it’s not even possible for all edge cases. But that argument has been made several times by both me and Charles.
ObjectSpace now, that is another story. The problems with OS are many. But first, let’s take a quick look at the most common usage of OS; iterating over classes:
ObjectSpace::each_object(Class) do |c|
  p c if c < Test::Unit::TestCase
end
This code is totally obvious; we iterate over all instances of Class in the system, and print an inspected version of them if the class is a subclass of Test::Unit::TestCase.
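A self-contained version of the same idiom that runs without Test::Unit. FakeTestCase, UserTest and OrderTest are hypothetical stand-ins for a framework base class and two test cases.

```ruby
# Hypothetical stand-ins for a test framework base class and two test cases.
class FakeTestCase; end
class UserTest < FakeTestCase; end
class OrderTest < FakeTestCase; end

found = []
ObjectSpace.each_object(Class) do |c|
  found << c if c < FakeTestCase   # Class#< is true only for strict subclasses
end

p found.sort_by { |c| c.name }
# prints [OrderTest, UserTest]
```

Note that the base class itself is not collected, since FakeTestCase < FakeTestCase is false.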
Before we take a closer look at this example, let’s talk quickly about how MRI and JRuby implement this functionality. In fact, having this functionality in MRI is dead easy, and there are no performance problems from having it when it’s not used. The trick is that MRI just walks the heap when iterating over ObjectSpace. Since MRI can inspect the heap and stack without problems, this means that nothing special needs to be done to support this behavior. (Note that this can never be safe when using a real threading system.)
So, the other side of the story: how does JRuby implement it? Well, JRuby can’t inspect the heap, of course. So we need to keep a WeakReference to each instance of RubyObject ever created in the system. This is gross. We pay a huge penalty for managing all this stuff. Many of the larger performance benefits we have found over the last year have revolved around having internal objects be smarter and not put themselves into ObjectSpace until necessary. One of my latest optimizations of regexp matching was simply to make MatchData lazy, so it only goes into OS when someone actually uses it. RDoc runs about 40% faster when ObjectSpace is turned off for JRuby.
So, is it worth it? In real life, when do you need the functionality of ObjectSpace? I’ve seen two places that use it in code I use every day. First, Rails uses it to find generators, and secondly, Test::Unit uses it to find subclasses of TestCase. But the fun thing is this: the above code is almost exactly what they do; they iterate over all classes in the system and check whether they inherit from a specific base class. Isn’t that quite a gross implementation? Shouldn’t it be possible to do something better? Euhm, yes:
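To make the cost concrete, here is a rough Ruby analogue of that bookkeeping, using the standard weakref library. It is only a sketch of the idea, not JRuby’s actual Java implementation, and WeakRegistry is a made-up name.

```ruby
require 'weakref'

# Hypothetical registry: every tracked object is remembered only weakly,
# so the registry itself never keeps an object alive.
class WeakRegistry
  def initialize
    @refs = []
  end

  def track(obj)
    @refs << WeakRef.new(obj)   # one extra allocation per tracked object
    obj
  end

  # Yield every tracked object of the given class that is still alive.
  def each_object(klass)
    @refs.each do |ref|
      begin
        obj = ref.__getobj__    # raises WeakRef::RefError once collected
      rescue WeakRef::RefError
        next
      end
      yield obj if obj.is_a?(klass)
    end
  end
end

registry = WeakRegistry.new
name  = registry.track("a string")
count = registry.track([1, 2, 3])

registry.each_object(String) { |s| puts s }
```

Every object that goes through track pays for a wrapper and a list entry whether or not anyone ever iterates, which is the kind of overhead the paragraph above complains about.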
module SubclassTracking
  def self.extended(klazz)
    (class <<klazz; self; end).send :attr_accessor, :subclasses
    (class <<klazz; self; end).send :define_method, :inherited do |clzz|
      klazz.subclasses << clzz
      # Explicit argument: later Ruby versions reject bare super inside define_method
      super(clzz)
    end
    klazz.subclasses = []
  end
end

# Where Test::Unit::TestCase is defined
Test::Unit::TestCase.extend SubclassTracking

# Load all other classes

# To find all subclasses and test them:
Test::Unit::TestCase.subclasses
I would say that this code solves the problem more elegantly and usefully than ObjectSpace. There is no performance degradation due to it, and it will only affect subclasses of the class you are interested in. What’s the best benefit of this? You can use the -O flag when running JRuby, and your tests and the rest of the code will run much faster and use less memory.
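The same idea can be tried without Test::Unit. This is a minimal sketch that puts the inherited hook directly on a hypothetical base class; TrackedBase and tracked_subclasses are illustrative names, not part of any library.

```ruby
# Hypothetical base class that records its subclasses through the
# inherited hook.
class TrackedBase
  @tracked_subclasses = []

  class << self
    attr_reader :tracked_subclasses

    def inherited(subclass)
      super                                        # keep the default hook chain
      TrackedBase.tracked_subclasses << subclass   # always record on the base class
    end
  end
end

class FirstCase < TrackedBase; end
class SecondCase < TrackedBase; end
class NestedCase < FirstCase; end

p TrackedBase.tracked_subclasses
# prints [FirstCase, SecondCase, NestedCase]
```

Indirect subclasses such as NestedCase are recorded too, because the hook is inherited along with everything else on the singleton class.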
As a sidenote: I’m putting together a patch based on this to both Test::Unit and Rails. ObjectSpace is unnecessary for real code and the vision of JRuby is that you will explicitly have to turn it on to use it, instead of the other way around.
Does anyone have any real world examples of things you need to do with ObjectSpace?
It has been two long days; not because I’ve been going to sessions all day long, but because I’ve reworked my presentations quite heavily. But now both the BOF and the TS are finished, and I think they went well. I had to keep the level to introductory Ruby, JRuby and Rails material, though, since most developers here didn’t seem to know what is possible with these technologies.
But it’s been great; I’ve gotten good feedback and had some really interesting conversations with lots of people.
We have been doing the town each night, and I’ve found that I like Barcelona very much. Except for the food: this country doesn’t seem to be good for vegetarians at all. Very annoying. I’m going for beer and wine instead of food the rest of the week. =)
One day left, though, and it’s bound to be nice. Martin and I are both on a developers panel about the state of programming languages in 2020; I have no idea what to say, and I’m thinking about just ad-libbing it. I know my own position on these questions fairly well, and the current Yegge debate has made my opinions even more explicit.
But now it’s time to see the town again.
Yesterday I spent 2 hours chatting with Fabio Akita, of AkitaOnRails (the largest Rails blog in Brazil); the result is a long interview that was published today. It’s got some good stuff, and some Ola-stuff, which you should recognize by now.
And I note that he calls me a workaholic; but he got this interview prepared in less than a day too, and also translated it into Portuguese.
You can find it at http://www.akitaonrails.com/pages/olabini.
I’ve finally started. I’ve finally moved to London. I’ve been working for two weeks at ThoughtWorks now, and it’s been quite crazy. Everything is very nice and I’m having loads of fun. Of course, it’s also lots of hard work, and I feel that I’m stretching my capacity considerably more than I ever did at Karolinska Institutet. That’s great, and I feel that I’m really doing something real now. We have so many interesting things going on, and I wish I could tell you all about it.
What I can tell you is that I’m working quite a lot on Mingle, and I’m also spending time on other JRuby related issues. I’ve been planning on getting SQL Server and Oracle working as well as possible with AR-JDBC, and I’ve spent time on Derby performance. Hopefully I’ll continue the database work this week, since SQL Server and Oracle especially are very important.
The most important work for this week is probably to prepare for TheServerSide in Barcelona. I still haven’t had time to prepare my demos, so it’s about time now. I hope to see many of you there.
In conclusion, my first weeks at ThoughtWorks have been awesome. I really like the people, and everything is just neat. I like being able to walk to work and working in the very nice TW office on High Holborn. I’m very happy about it all.
As you know, I am writing a book about JRuby on Rails. A few minutes ago I finished the first draft of chapter 14. That means that there are just 3 chapters and 3 appendixes left to write (chapters 1, 2 and 15). So the writing is going very well, but it’s taking a heavy toll on me personally. I seriously don’t recommend writing a book like this in your spare time, while at the same time switching employers, moving abroad and trying to be a core developer in an open source project which is getting lots of attention.
So, in summary: it’s going well, it still looks like it will be out in October, and I’m deadly tired.