RbYAML in Google Summer of Code

Great news for all Ruby implementations out there. A project to bring RbYAML up to date and improve its performance has been accepted for Google Summer of Code. The student is Long Sun, and Xue Yong Zhi and I will jointly mentor the effort.

In fact, I’m very excited about this news. RbYAML was an incredibly important piece of the puzzle in getting JRuby to finally work with RubyGems, which in turn made it possible to start testing numerous other applications. I soon ported RbYAML to Java and created the JvYAML and JvYAMLb projects to get better efficiency. Sadly, this left RbYAML without any TLC. That changed a while back when Rubinius picked up the project to get their YAML support going, and now that Long Sun will work on it, hopefully we will finally get an extremely compliant and bug-free YAML implementation for Ruby.

This will obviously benefit Rubinius, but it will also be very good for both JRuby and IronRuby. The work will be test-driven, which means a more complete test suite will be built around YAML in Ruby.

If you’re interested in following the project, it’s now hosted at Google Code (due to problems with RubyForge from China) at http://code.google.com/p/rbyaml/. Long Sun will also blog about his progress here: http://rbyaml.blogspot.com/.

Exciting news indeed.

YAML and JRuby – the last bit

An hour ago I sent the patches to make JRuby’s YAML support completely Java-based. More specifically, I have removed RbYAML completely and replaced it with the newly developed 0.2 support in JvYAML. Several pieces had to be done to make this possible, especially since most of the interface to YAML was Ruby-based and used the slow Java proxy support to interact with JvYAML.

So, what’s involved in an operation like this? Well, first I created custom versions of the Representer and the Serializer. (I have had a custom JRubyConstructor since May.) These weren’t that big, mostly just delegating to the objects themselves to decide how they want to be serialized. That leads me to the RubyYAML class, which is what will get loaded when you write “require 'yaml'” in JRuby from now on. It contains two important parts. The first is the module YAML and the singleton methods on this module, which are the main interface to YAML functionality in Ruby. Until now, this was implemented in RbYAML.
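To make "the main interface" a bit more concrete, here is a minimal sketch of what calling the singleton methods on the YAML module looks like from the Ruby side. I'm assuming the two most common entry points from the standard YAML library, YAML.dump and YAML.load; the sample data is made up for illustration, and nothing here is specific to the JRuby internals.

```ruby
require 'yaml'

# Made-up sample data, only for illustration.
data = { 'name' => 'JRuby', 'tags' => ['yaml', 'java'] }

# YAML.dump is a singleton method on the YAML module: object -> YAML text.
text = YAML.dump(data)

# YAML.load goes the other way: YAML text -> Ruby objects.
restored = YAML.load(text)

puts text
puts "round trip ok: #{restored == data}"
```

The point is that user code only ever touches these module-level methods, so the implementation behind them can change without any visible difference in how YAML is called.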

The next part is several implementations of the methods “taguri” and “to_yaml_node” on various classes. These methods handle the dumping, and it’s really there that most of the dumping action happens. For example, the taguri method for Object says that the tag for a typical Ruby object should be “!ruby/object:#{self.class.name}”. The “to_yaml_node” for a Set says that it should be represented as a map where the members of the set are keys, and the values for these keys are null.
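The two rules above can be sketched in plain Ruby. This is not the actual JRuby code: taguri_for and set_as_yaml_map are hypothetical helper names I made up for illustration, and they only mirror the tag format and the Set-to-map shape described in the paragraph above.

```ruby
require 'set'

# Hypothetical helper: builds the tag string using the format quoted above
# for a typical Ruby object.
def taguri_for(obj)
  "!ruby/object:#{obj.class.name}"
end

# Hypothetical helper: turns a Set into a map where every member is a key
# and every value is nil (null in YAML terms).
def set_as_yaml_map(set)
  set.each_with_object({}) { |member, map| map[member] = nil }
end

puts taguri_for(Object.new)
p set_as_yaml_map(Set.new([:a, :b]))
```

In the real implementation these decisions live on the classes themselves, which is what lets the Representer simply ask each object how it wants to be dumped.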

So, when this support gets into JRuby trunk it will mean a few things, but nothing that is really apparent to the regular JRuby user. The most important benefits are performance and correctness. Performance will increase since we now have Java all the way, and correctness will improve since I have had the chance to add lots of unit tests and fix many bugs in the process. Also, this release makes YAML 1.0 support a reality, which means that communication with MRI will work much better from now on.

So, enjoy. If we’re lucky, it will get into the next minor release of JRuby, which will probably be here quite soon.