JRuby 0.9.8 is here


The JRuby team is pleased to announce the release of JRuby 0.9.8.

Download at: http://dist.codehaus.org/jruby/

This release has some great improvements:

  • Ruby on Rails support. We have been working hard on getting Rails’ own unit tests running, and over 98% of them now run successfully. We feel things are running well enough to invite Ruby users to kick the tires and help root out any final issues.
  • Ruby classes can extend concrete/abstract Java classes and override methods
  • New Java primitive array syntax
  • Reimplementation of String, Numeric classes, and Array to be more correct and performant
  • Significant bottlenecks have been identified and addressed. In some cases IO is 6.5x faster than in previous releases. Included Java classes are significantly faster than in the past.
  • 220 Jira issues resolved since last release

Special thanks to Marcin Mielżyński for his tireless work in rewriting a number of core classes to be much more correct and quick. His attention to detail has rooted out many corner cases.

The number of IRC conversations, mailing list threads, bug reports, patches, and blog entries in the community has been a great help, and our community is really making a huge difference in how fast JRuby is improving. The amount of progress is really staggering!

If you have ever thought that JRuby wasn’t mature enough, I would like to contradict that now. With this release we are better than ever.

More information can be found at http://www.jruby.org.



The world is spinning


If you didn’t know that, the title tells it all. Of course, that’s old news.

I haven’t really been able to blog as much as I would have liked lately. There are reasons for this, of course. Two very exciting reasons, in fact. If everything pans out, I will be able to write about them in 2 to 3 weeks’ time.

In other news, we are gearing up for another JRuby release. This one will be a biggie. Many nice things will be in place, and it will set the record for both new features and bug fixes. I think no one will be disappointed by it, actually.

I have a few presentations lined up too. The nearest one is in exactly two weeks. I will speak at the Academic Computer Science Festival in Kraków, Poland. If you’re somewhere close, by all means offer to show me the city. =) I will land March 9th and fly out again March 11th. My presentation will be at 17:00 CET on March 10th. If you would like more information, it can be found at the festival’s homepage. The presentation will be in English (since I don’t speak Polish, obviously), and it will be slightly more technical than the usual JRuby presentations. There will probably be some detail about our runtime, interpreter, parser and lexer, and hopefully I’ll get some info in about our YARV and Java bytecode compiler efforts. This will be very exciting to talk about; I’m kinda salivating just thinking about it. =)

Another, more long-term presentation has just been decided. I will attend TheServerSide Java Symposium Europe in Barcelona, from June 27th to June 29th, and talk about JRuby from the perspective of a Java developer, and what it can do for you. Hopefully I will have time to see the city too.

I will update with more information when possible.



Ragel performance


I did some performance testing on the old and new Resolver implementations. The test set includes some nasty cases that exercise the bad parts of both implementations (like longest match, where the type can’t be decided until we have backtracked about 20 characters). I placed the 24 test strings in an array, and pounded on it with an instance of the ResolverImpl, used in exactly the same way as it is on all scalar values in a YAML document. The objective is to find out if the value is an implicit type or not. So basically, we give it a String, and get back a tag URI. So it’s not like I’m parsing a language or anything. I’m just doing some recognizing here.

The old implementation was based on a Map of lists, where the first letter of the string to resolve was used as a key to find a list of patterns to try sequentially. This worked fine, and made it extensible, but it was not very fast. This is the baseline: for 24 different strings, iterated 100 000 times for a total of 2 400 000 resolves, it takes 7879ms. That’s OK, but not great.
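A minimal sketch of that first-letter lookup, written in Ruby for brevity rather than in the Java of the real implementation. The pattern table, the `add_implicit` and `resolve` helpers, and the handful of regexps are illustrative assumptions, not the actual JvYAML tables:

```ruby
# Hypothetical, heavily reduced pattern table:
# first character => list of [regexp, tag] pairs, tried in order.
PATTERNS = Hash.new { |hash, key| hash[key] = [] }
DEFAULT_TAG = "tag:yaml.org,2002:str"

# Register one pattern under every character it may start with.
def add_implicit(first_chars, regexp, tag)
  first_chars.each_char { |c| PATTERNS[c] << [regexp, tag] }
end

add_implicit("-+0123456789",  /\A[-+]?[0-9]+\z/,         "tag:yaml.org,2002:int")
add_implicit("-+0123456789.", /\A[-+]?[0-9]*\.[0-9]+\z/, "tag:yaml.org,2002:float")
add_implicit("tf",            /\A(?:true|false)\z/,      "tag:yaml.org,2002:bool")
add_implicit("~n",            /\A(?:~|null)\z/,          "tag:yaml.org,2002:null")

# Look up the candidate list by first character, then try each regexp in turn.
def resolve(value)
  candidates = PATTERNS.fetch(value[0], [])
  match = candidates.find { |regexp, _tag| regexp.match?(value) }
  match ? match[1] : DEFAULT_TAG
end

puts resolve("42")     # an int tag
puts resolve("-3.14")  # the int pattern is tried first and fails, then float matches
puts resolve("hello")  # no candidates for "h", so the default string tag
```

The sequential scan is where the time goes: a value like "-3.14" has to fail the int pattern before the float pattern gets a chance.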

Now, the new Ragel implementation is dead simple. It’s just a translation of the regexps in the aforementioned Patterns into a state machine. In the EOF out actions (%/ for people in the Ragel know), I execute an action that sets a local variable to a tag, and at the end the resolve method returns that tag. Dead simple, and not exercising the full strength of Ragel, of course.
So, for the same number of resolves, this ResolverImpl takes 1288ms. That’s roughly 6.1 times as fast (7879 / 1288). Ain’t it nice to have a friend such as Ragel? And the best part is, for harder tasks, these improvements would be even larger.
Finite State Machines are your friends. All your base are belongs to us.
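To make the idea concrete, here is a hand-written state machine in Ruby that recognizes only integers and simple floats over raw bytes. It is a sketch of the technique, not Ragel output and not the real ResolverImpl; the state names, `next_state`, `resolve_bytes`, and the `EOF_TAGS` table (a stand-in for Ragel’s EOF actions) are all assumptions made for illustration:

```ruby
INT_TAG   = "tag:yaml.org,2002:int"
FLOAT_TAG = "tag:yaml.org,2002:float"
STR_TAG   = "tag:yaml.org,2002:str"

# Tag reported when the input ends in a given state.
# Any state not listed here falls back to the plain string tag.
EOF_TAGS = { int: INT_TAG, frac: FLOAT_TAG }.freeze

def digit?(byte)
  byte >= 48 && byte <= 57 # '0'..'9'
end

# One transition: current state + input byte => next state.
def next_state(state, byte)
  case state
  when :start
    if byte == 43 || byte == 45 # '+' or '-'
      :sign
    elsif digit?(byte)
      :int
    elsif byte == 46 # '.'
      :dot
    else
      :fail
    end
  when :sign, :int
    if digit?(byte)
      :int
    elsif byte == 46
      :dot
    else
      :fail
    end
  when :dot, :frac
    digit?(byte) ? :frac : :fail
  else
    :fail
  end
end

# Single pass over the bytes, no backtracking and no String conversion.
def resolve_bytes(bytes)
  state = :start
  bytes.each do |byte|
    state = next_state(state, byte)
    break if state == :fail
  end
  EOF_TAGS.fetch(state, STR_TAG)
end

puts resolve_bytes("123".bytes)
puts resolve_bytes("-4.5".bytes)
puts resolve_bytes("1.2.3".bytes)
```

Each byte is looked at exactly once, which is the property that makes this approach so much cheaper than trying one regexp after another.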



Results of jvYAMLb


Well, the YAML-based loading is in JRuby trunk. On the way, some parts of the codebase got seriously simplified. Very nice. The final result, with regard to performance, is a speedup of about 20-30%. But the important gain is in memory usage. The new implementation takes only about one fourth of the memory the original used. So that’s great.

Regarding the Resolver, as I mentioned in the last post, it required a different approach, since regular JvYAML uses regular expressions to recognize implicit tags. Since that approach isn’t good with byte arrays, I decided to use Ragel to generate a recognizer. That approach was very successful. As soon as I got that working it was the obvious approach. Ragel is good. Ragel is great. Ragel is wonderful. I will use the same approach for regular JvYAML to get away from all those Java regexps.

So, next step will be to do the same conversion of the emitter. Of course, at that point performance isn’t that important. It’s more about memory usage and the need to get away from another external dependency in JRuby.



Faster YAML with byte processing


As noted in my last post, I have started work on converting JvYAML into JvYAMLb. Right now I have finished the work on the Scanner and the Parser, and it’s looking quite good. The numbers I reported in the last post for regular JvYAML performance were wrong, though. We’re looking at about 7.8s to 10.0s for scanning that 3.5MB gemspec file. (And that’s only the scanning, not file IO.) But with the Scanner converted to use bytes and ByteList, the same processing takes 2.8s. That’s a substantial difference. But it doesn’t end with that.

As I said I also converted the Parser. It doesn’t do any String processing at all, so I didn’t expect either a speedup or slowdown except for that from the Scanner. But… Before, parsing the gemspec took 18.515s, but after, it runs in 4s. That’s a dramatic speedup, and I don’t really know where it comes from. Unless the earlier implementation generated so much more garbage, and used more memory, that it was noticeable in speed. Anyway, this looks good for JRuby YAML processing, since I expect big reductions in complexity in the callpath and generation of objects after the YAML processor is byted all the way through.

But tomorrow it’s time to work on the Resolver, and that’s going to be hard. Ideally, we would have a byte-based Regexp engine. And maybe that would be something for JRuby too, you know? Our Regular Expressions must be dead slow now that they have to convert to strings all the time.



Announcing JvYAMLb, a fork


The conversion to using byte-arrays as the basis of our String work in JRuby has led me to realize that JvYAML just doesn’t cut it anymore. The performance wasn’t good to begin with, and it’s even worse having to convert EVERY SINGLE STRING read into bytes. That’s no good. As an example of why something needs to be done, I’m going to describe the transformations that happen to data in JRuby when executing this code:

YAML.load_file "gems.yml"

First, the file is opened, and wrapped inside a RandomAccessFile. Then data is read from it by YAML. Reading will proceed like this:
1. Bytes are read through the RAF, hopefully in chunks.
2. Those bytes are wrapped in a RubyString so they can be returned from the IO#read method.
3. An IOReader wraps that RubyIO object, gets the RubyString and converts it from bytes into a String, and this String gets converted into a char array.
4. That char array is returned to the YAML Scanner.
5. The chars from the char array are collected in a StringBuffer, and saved in various Strings as token values.
6. The parser, resolver and constructor work on these Strings in various ways.
7. The JRubyConstructor takes these Strings and creates RubyString objects from them, in the process converting each String back to a byte array.
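
The steps above can be modelled roughly as follows. This is plain Ruby standing in for the Java internals, so every name here (`round_trip`, the local variables) is an assumption, and the counter only illustrates how often the same payload gets copied or re-encoded:

```ruby
# Each line mirrors one of the numbered steps above; `copies` counts how many
# times the payload is copied or converted on its way through.
def round_trip(raw_bytes)
  copies = 0

  ruby_string = raw_bytes.pack("C*")                  # step 2: bytes wrapped in a string
  copies += 1
  decoded = ruby_string.dup.force_encoding("UTF-8")   # step 3: bytes decoded into a String
  copies += 1
  chars = decoded.chars                               # steps 3-4: String turned into a char array
  copies += 1
  token = chars.join                                  # step 5: chars collected into a token value
  copies += 1
  result = token.bytes                                # step 7: String converted back to bytes
  copies += 1

  [result, copies]
end

input = "name: jruby\n".bytes
output, copies = round_trip(input)
puts output == input   # the data is unchanged ...
puts copies            # ... but it was copied or converted several times to get there
```

A byte-based scanner would hand the original bytes straight through, which is the whole point of the rewrite.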

Is there any doubt that this process is slow? Well, it hasn’t been that big of a problem until now, since we are doing so well on performance in other parts of the system.

So, the radical decision is to rewrite JvYAML, making it more SYCK-compliant, working with InputStreams and byte-arrays, and in the process get away from several of the steps above. So that’s what I’m going to do. I hereby create JvYAMLb. It will only be a part of the JRuby codebase, but it will be reasonably separate, so it can be extracted for other purposes. I will not stop work on regular JvYAML, but will maintain both projects.

Since the objective of this new project is blazing speed, I will post some numbers on this now and again. But first I will show you the speed of the regular system. JvYAML’s Scanner can scan an old gem source index (about 3.5MB) of 435654 tokens in about 1654ms. This is the baseline I’m going to use to test performance, and I’ll post more on this as soon as the byte-based Scanner is ready to try out.



Bytes bites. Or maybe not.


Well, the byte arrays are in, for good and evil. We had to wrap them in a counterpart to StringBuffer, but backed by byte[] instead, since all that explicit allocation and deallocation was way unperformant.
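
For readers who want a picture of what such a wrapper does, here is a minimal sketch in Ruby. It is not JRuby’s actual class; the `ByteBuffer` name, its methods, and the doubling growth policy are assumptions chosen to show why one growable buffer beats allocating a fresh array on every append:

```ruby
# A growable byte buffer: a backing array with spare capacity plus a length.
class ByteBuffer
  attr_reader :length

  def initialize(capacity = 16)
    @bytes = Array.new(capacity, 0)
    @length = 0
  end

  # Append a single byte, growing the backing array only when it is full.
  def append(byte)
    grow(@length + 1)
    @bytes[@length] = byte & 0xFF
    @length += 1
    self
  end

  def append_all(bytes)
    bytes.each { |b| append(b) }
    self
  end

  def capacity
    @bytes.length
  end

  # Only the used part of the backing array.
  def to_a
    @bytes[0, @length]
  end

  private

  # Double the capacity when needed, so repeated appends stay cheap on average.
  def grow(needed)
    return if needed <= @bytes.length
    new_capacity = [@bytes.length * 2, needed].max
    @bytes.concat(Array.new(new_capacity - @bytes.length, 0))
  end
end

buffer = ByteBuffer.new(2)
buffer.append_all("hello".bytes)
puts buffer.length
puts buffer.capacity
```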

Of course, we aren’t seeing any performance benefits from this right now. The problem is that there are still many places that use IRubyObject#toString to get at the contents. That operation is very expensive right now, so gem installs are slower, for example. But we have good hopes of improving the situation, and many parts of the codebase have become much clearer without the need to do String-to-byte[] and byte[]-to-String all over the place.



Serial JRuby


Things are really moving along faster than ever in JRuby land. It’s so fun! As my last entry told you, Hpricot is now available for JRuby (and Java) people. I need to share a few lines from the logs of yesterday evening’s conversation on #jruby:


<headius> seeya ola!
* shellac does some xsl-ing, plays on the wii,
then finds ola got HPRICOT working in that time
<shellac> I'm wasting my life

Some would say that what I do with JRuby is a waste of life… Well, we’ll see about that.

Anyway, what’s happened in JRuby world since last week? First, and most important, Charles has changed our RubyString implementation. It used to be backed by either a Java String or a StringBuffer. The problem with both of these is that Ruby has a tendency to use Strings as byte buckets. And our code was riddled with encoding and decoding into and out of byte arrays. So Charles took the big step, converted RubyString to use a byte-array instead, and fixed all the bugs that he found by doing that. The result is a happier codebase, less encoding and possibly faster Zlib and IO operations. That’s big.
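
A small illustration of the “byte bucket” habit, using only the Ruby standard library. The payload bytes are arbitrary values picked for the example:

```ruby
require "zlib"

# Arbitrary bytes that are not valid text in any sensible encoding,
# yet Ruby happily carries them around in a String.
payload = [0, 255, 128, 10, 13].pack("C*")

# Zlib takes a String in and hands a String back, both really just bytes.
compressed = Zlib::Deflate.deflate(payload)
restored   = Zlib::Inflate.inflate(compressed)

puts payload.bytesize
puts restored.bytes.inspect
```

With a character-based backing store, every one of these hand-offs needs an encode or decode step. With a byte-array backing store the bytes can pass through untouched.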

Tom is working on removing visibility and refactoring scopes. That could have huge impact too.

This Sunday I merged and fixed some code that allows Ruby code to inherit from Java classes and override methods there, and this overriding will be seen if an instance is sent back to Java. I’m planning on using this for some interesting tricks with Java ContentHandlers, and this functionality is really, really, really important. But it’s also complex, since it requires generating bytecode at runtime. Fun, but hard. But now it’s in trunk, and it’s time to find the bugs in it and fix them.

I also need you to go read what Jonas Bonér has done with JRuby and OpenTerracotta. I could describe it here, but Jonas does a good job of it himself. So go there: http://jonasboner.com/2007/02/05/clustering-jruby-with-open-terracotta/. Very cool stuff, indeed!

So, the future is coming faster each day. JRuby will still conquer the world!



Hpricot goodness


This is just so cool, I cannot contain it. For those of you who haven’t heard about Hpricot, it is one of why the lucky stiff’s incredibly cool tools (which he probably will use to take over the world any day now…). It’s HTML parsing goodness, very flexible, with the goal of being able to parse (and fix) everything that Firefox handles.

“So what?” you’re probably asking… Well, Hpricot uses Ragel and some C code to achieve blinding speed. This means JRuby can’t run it. Or I should say couldn’t run it:


orpheus:~/workspace/jruby> jruby bin/gem install hpricot --source http://code.whytheluckystiff.net
Bulk updating Gem source index for: http://code.whytheluckystiff.net
Select which gem to install for your platform (java)
1. hpricot 0.5.110 (jruby)
2. hpricot 0.5.110 (mswin32)
3. hpricot 0.5.110 (ruby)
4. hpricot 0.5 (ruby)
5. hpricot 0.5 (mswin32)
6. hpricot 0.5.0 (ruby)
7. hpricot 0.5.0 (mswin32)
8. hpricot 0.4.99 (ruby)
9. hpricot 0.4.99 (mswin32)
10. hpricot 0.4.92 (ruby)
11. hpricot 0.4.92 (mswin32)
12. Skip this gem
13. Cancel installation
> 1
Successfully installed hpricot-0.5.110-jruby
Installing ri documentation for hpricot-0.5.110-jruby...
Installing RDoc documentation for hpricot-0.5.110-jruby...

That’s right, Hpricot is now more promiscuous than any other gem with native parts.
What can you do with it? Well, I’m just going to point you to _why’s own description of it. All he says at http://code.whytheluckystiff.net/hpricot/ will work fine in JRuby!

How did this come to be? Well, me and _why did some joint hacking, which was helped along by the fact that Adrian Thurston (the genius behind Ragel) recently added Java support to it. So, basically, most of the Ragel definition is exactly the same for both the C and the Java versions. The native code has been factored out, and both versions are buildable with rake from _why’s code repository.

This is important. Don’t think anything else. This strategy will, and can, be used for other gems with native parts. It’s just a question of time.



Current JRuby status – AKA The what’s-cool-and-happening


This is just a small update on what’s going on in JRuby development at the moment. The first and most important item is that Tom has merged his block work to trunk. He has pulled off a monumental achievement with a patch of over 17k lines. The most amazing thing is that my main test case for block problems (Camping) runs perfectly. Tom also reports some performance improvements, but that isn’t the important thing. What this block patch is all about is removing the block stacks from our ThreadContext, and instead just passing the current block along through the Java stack (in other words, as a parameter to the method call in question). This simplifies many things and also makes it possible for our compiler to finally compile closures and blocks. Charles will soon continue the work on making this happen, and being able to do that means the JIT will work for vastly more of the code than it does at the moment.
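
The difference between the two designs can be sketched in a few lines of Ruby. This is a toy model, not JRuby’s ThreadContext; `StackContext`, `each_twice_stack`, and `each_twice_param` are hypothetical names used only to contrast a side-channel block stack with a plain parameter:

```ruby
# Style 1: the current block lives on a stack inside a shared context object.
class StackContext
  def initialize
    @blocks = []
  end

  # Push the block for the duration of the call, and always pop it afterwards.
  def call_with_block(block)
    @blocks.push(block)
    begin
      yield self
    ensure
      @blocks.pop
    end
  end

  def current_block
    @blocks.last
  end
end

# The callee has to dig the block out of the context.
def each_twice_stack(context)
  block = context.current_block
  [block.call(1), block.call(2)]
end

# Style 2: the block is just another argument on the call stack.
def each_twice_param(block)
  [block.call(1), block.call(2)]
end

doubler = ->(x) { x * 2 }
context = StackContext.new

puts context.call_with_block(doubler) { |ctx| each_twice_stack(ctx) }.inspect
puts each_twice_param(doubler).inspect
```

In the second style there is no push and pop bookkeeping to get wrong, and the block is visible in the method signature, which is what a compiler needs in order to treat it like any other value.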

The second thing we’re hard at work on right now is Rails compatibility. Tom is looking at ActiveRecord, and Charles has improved Marshalling (with the side effect that the gem local source cache finally works) and also did some work on Multibyte::Chars. This work is moving forward fast. My part in this is ActionPack, where we are now down to 9 failures and 0 errors, from 23 failures and 8 errors. Those numbers are for 1150 tests and 5202 assertions, so there isn’t much left to get this working.

My work on the YARV compiler continues, and I will soon post an update about this too.