Announcing RbYAML version 0.2


Another major release, with most changes in the dumper:
http://rbyaml.rubyforge.org
or to download directly:
http://rubyforge.org/frs/?group_id=1658

Changes:

  • Performance has been greatly improved
  • The representer has been rewritten to use a distributed representation model
  • Much improved test cases
  • And many bug fixes


Announcing JvYAML.


I am pleased to announce JvYAML, version 0.1. JvYAML is a Java YAML 1.1 loader that is both easy to extend and easy to use. JvYAML originated in the JRuby project (http://jruby.sourceforge.net), from the base of RbYAML (http://rbyaml.rubyforge.org). For a long time Java has lacked a good YAML loader and dumper with all the features that the SYCK-using scripting communities have gotten used to. JvYAML aims to rectify this.

Of major importance is that JvYAML works the same way as SYCK, so that JRuby can rely on YAML parsing and emitting that mirrors C Ruby.
JvYAML is a clean port of RbYAML, which was a port from Python code written by Kirill Simonov for PyYAML3000.

Simple usage:

import org.jvyaml.YAML;

Map configuration = (Map)YAML.load(new FileReader("c:/projects/ourSpecificConfig.yml"));
List values = (List)YAML.load("--- \n- A\n- b\n- c\n");

There is also support for more advanced loading of JavaBeans with automatic setting of properties with the use of domain tags in the YAML document.

More information:
At java.net: http://jvyaml.dev.java.net

Download: https://jvyaml.dev.java.net/servlets/ProjectDocumentList

License:
JvYAML is distributed with the MIT license.



Transforming RbYAML


RbYAML went through some big changes from release 0.0.2 to 0.1. I intend to detail some of these changes, what implementation choices I made, and why.

First, conversion from Mixins to Classes. The original Python implementation used multiple inheritance: it created several base classes (Reader, Scanner, Parser, etc.) and then created several versions of a Loader class which inherited from the different base classes. My first implementation mirrored this approach, but used Modules instead of base classes and mixed in different versions of these in the different Loader classes. This approach was quite limiting, since mixing code into other Modules doesn’t really work as you expect, and it is no substitute for subclassing. For example, I had a BaseResolver module and a SafeResolver module which mixed in BaseResolver and added code of its own, but this was quite cumbersome.
The solution to this was simply to convert all Modules to classes, and make all calls to the other tiers explicit. For example, instead of having the Parser module just assume that you’ve mixed in a Scanner and call check_token on itself, I have the Parser class take a Scanner instance at initialization and call check_token on this instance instead.
This works very well, and probably makes the code easier to understand. Another positive of this is that the interfaces between the layers are more apparent. For inclusion in JRuby, this will make it easier to replace certain parts with Java implementations.
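
To illustrate the explicit-instance style described above, here is a minimal Java sketch (not RbYAML code). The names TokenScanner, ListTokenScanner and ExplicitParser are hypothetical, and the token handling is deliberately trivial; the point is only that the parser is handed its scanner, so a different scanner implementation can be swapped in.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface for the layer the parser depends on.
interface TokenScanner {
    boolean checkToken(String kind); // is the next token of this kind?
    String nextToken();              // consume and return the next token
}

// A trivial scanner over a fixed token list; another implementation could replace it.
class ListTokenScanner implements TokenScanner {
    private final List<String> tokens;
    private int position = 0;

    ListTokenScanner(List<String> tokens) {
        this.tokens = tokens;
    }

    public boolean checkToken(String kind) {
        return position < tokens.size() && tokens.get(position).equals(kind);
    }

    public String nextToken() {
        return tokens.get(position++);
    }
}

// The parser receives its scanner at construction instead of assuming one was mixed in.
class ExplicitParser {
    private final TokenScanner scanner;

    ExplicitParser(TokenScanner scanner) {
        this.scanner = scanner;
    }

    // Consumes and counts consecutive tokens of the given kind.
    int countLeading(String kind) {
        int count = 0;
        while (scanner.checkToken(kind)) {
            scanner.nextToken();
            count++;
        }
        return count;
    }
}

class ExplicitParserDemo {
    public static void main(String[] args) {
        TokenScanner scanner = new ListTokenScanner(Arrays.asList("entry", "entry", "end"));
        System.out.println(new ExplicitParser(scanner).countLeading("entry"));
    }
}
```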

The next piece on the agenda was a rewrite of the Parser. The original Python implementation used Python generators (which are almost like coroutines, but not quite). My first port of this code just parsed the whole stream, saved all events and then passed these on after parsing. This was good enough for smaller YAML documents, but when trying to parse the RubyGems gemspec, the memory and time requirements became prohibitive. In the course of making the generator algorithm explicit I totally rewrote the Parser from the beginning, making it hybrid table-driven instead of recursive-descent as the original was. I actually believe the new Parser is both easier to understand and faster. Just as an example, this is the code for block_sequence:

def block_sequence
  @parse_stack += [:block_sequence_end, :block_sequence_entry, :block_sequence_start]
  nil
end

where @parse_stack contains the next productions to call after block_sequence has finished. The main generator method just keeps calling the next production until it arrives at a terminal, and then returns its value:

def parse_stream_next
  if !@parse_stack.empty?
    while true
      meth = @parse_stack.pop
      val = send(meth)
      if !val.nil?
        return val
      end
    end
  else
    return nil
  end
end

Another benefit of this is that this code is dead simple to port to other languages, once again probably easier than the Python version.
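
As a rough illustration of how such a port might look, here is a small Java sketch of the same parse-stack idea. It is not RbYAML or JvYAML code: the class ParseStackSketch, the string "events" and the simplified entry production are all assumptions made for the example. Non-terminal productions push the productions that should run next and return null; terminals return a value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a stack-driven parser for one flat sequence of scalars.
class ParseStackSketch {
    private final Deque<Supplier<String>> parseStack = new ArrayDeque<>();
    private final Iterator<String> items;

    ParseStackSketch(List<String> items) {
        this.items = items.iterator();
        parseStack.push(this::blockSequence);
    }

    // Non-terminal: schedules its parts. Pushed in reverse, so "start" is popped first.
    private String blockSequence() {
        parseStack.push(this::blockSequenceEnd);
        parseStack.push(this::blockSequenceEntry);
        parseStack.push(this::blockSequenceStart);
        return null;
    }

    private String blockSequenceStart() {
        return "SequenceStart";
    }

    // Emits one scalar and re-schedules itself while input remains.
    private String blockSequenceEntry() {
        if (!items.hasNext()) {
            return null; // no more entries; the end production is next on the stack
        }
        parseStack.push(this::blockSequenceEntry);
        return "Scalar(" + items.next() + ")";
    }

    private String blockSequenceEnd() {
        return "SequenceEnd";
    }

    // Counterpart of parse_stream_next: run productions until one yields a terminal.
    String nextEvent() {
        while (!parseStack.isEmpty()) {
            String event = parseStack.pop().get();
            if (event != null) {
                return event;
            }
        }
        return null; // stream exhausted
    }

    static List<String> allEvents(List<String> items) {
        ParseStackSketch parser = new ParseStackSketch(items);
        List<String> events = new ArrayList<>();
        for (String e = parser.nextEvent(); e != null; e = parser.nextEvent()) {
            events.add(e);
        }
        return events;
    }
}
```

Unlike the Ruby method, this loop also stops when the stack runs empty in the middle of a call, which seemed the safer choice for a standalone sketch.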

The third improvement was performance. I have no trustworthy numbers for the improvement, but it’s on the order of 5-8 times faster than the first version. I achieved this with some easy fixes, and some harder ones. I removed the Reader class and inlined those methods into the Scanner. I went through each case where I tested whether a character was part of a String and checked whether a Regexp was faster. I also added some hard-coded, unrolled loops in the most intense parts of the code, which were peek(), forward(), prefix() and update(). Every microsecond improvement in these methods counted since they are called so many times. I didn’t do all this work blind, though. The Ruby profiler is really good. Just take a script, run it with ruby -rprofile script.rb and you get output that’s incredibly good. I tested most of my changes this way, and the end result is about as fast as the JRuby RACC-based YAML parser, which was my goal.

Since version 0.1 I’ve spent some time getting JRuby to work flawlessly with RubyGems, and this work has uncovered some small bugs in RbYAML (and in SYCK, for that matter), so a new minor release will probably come soon. Until then the CVS is up to date.



Getting RubyGems to work with JRuby


I’m sorry if the title gives it away, but here is some recent output from my terminal window:

#bin/jruby bin/gem install rails --include-dependencies
Attempting local installation of 'rails'
Local gem file not found: rails*.gem
Attempting remote installation of 'rails'
Updating Gem source index for: http://gems.rubyforge.org
Successfully installed rails-1.1.2
Successfully installed activesupport-1.3.1
Successfully installed activerecord-1.14.2
Successfully installed actionpack-1.12.1
Successfully installed actionmailer-1.2.1
Successfully installed actionwebservice-1.1.2
Installing RDoc documentation for activesupport-1.3.1…
Installing RDoc documentation for activerecord-1.14.2…
Installing RDoc documentation for actionpack-1.12.1…
Installing RDoc documentation for actionmailer-1.2.1…
Installing RDoc documentation for actionwebservice-1.1.2…

So, we have RubyGems mostly working. Right now there are two caveats. First, during the YAML parsing, we get some InterruptedExceptions for some reason. This doesn’t seem to impair functionality, though. The second problem is that it takes serious time. Between 30 minutes and an hour for this. The two parts that are time hogs are the YAML parsing of the Gemspec, and the RDoc stuff, for some reason.

So, what do you need to do, to get this working?

  • Start from a newly checked out JRuby.
  • Add patch for RubyTime and TimeMetaClass. (Adds gmt_offset and utc_offset. This patch can be found in the jruby-devel archives.)
  • Check out the latest version of RbYAML from RubyForge, and put this in $JRUBY_HOME/lib/ruby/site_ruby/1.8.
  • Add the contents from the C Ruby libraries.
  • Change fileutils.rb, so that RUBY_PLATFORM works.
  • Replace the file $JRUBY_HOME/src/builtin/yaml.rb with the yaml.rb for RbYAML, that can be found here.
  • Change the jruby and jirb scripts by adding -Xmx512M. (I’m not sure 512 is really needed, actually. Maybe 256 or 128 suffices.)

And this should be everything that’s needed to get the same results as me when trying to install Rails (provided you’ve got the patience).



RbYAML version 0.1.0 released


Version 0.1.0 of RbYAML has now been released. Most of the interesting work on this was done on the flight from San Francisco and JavaOne to Stockholm. I guess I got tired of all the Java code. Anyhow, this is a major release, which improves almost all areas, with better testing, more functionality, Ruby-fied code, a new parser, and huge performance improvements.

I will take some time later this week to write more about the things I have done, implementation-wise.



RbYAML version 0.0.2 released.


I have released version 0.0.2 of RbYAML. This is mostly fixes and convergence to the current PyYAML codebase, so nothing revolutionary. There are some things working now, that didn’t before. I’ve also added some more automated tests.

The code can be downloaded here.



JavaOne, last day.


So. The last day of JavaOne is always a strange experience. Most people are too tired to stand straight after 3 really intense days of information gathering and people interactions. Personally, I was too tired to go to all sessions, but I managed the general session with Gosling and McNealy, the Mustang scripting session and the one about writing good APIs.

All three were worthwhile. Gosling showcased some really amazing toys, as usual. The Mustang scripting session was interesting, mostly so because it seems they’ve ripped some parts of the Rhino JavaScript engine out, for some reason.

The best session today was the one on writing good APIs, though. It had some really interesting advice and tips about API design. Basically you should apply the same rules as when you’re doing UI design.

After this, I went to the JRuby meetup, where we sat around talking for a few hours, until I felt the need to go home and pack. JRuby is really on the go now, we have momentum and some really cool stuff almost finished. Stay tuned.



JavaOne, day 3.


So, the third day of JavaOne has also featured some interesting presentations. My blog today will not be a blow-for-blow description of these, but more a few interesting tidbits I noticed during the day.

I managed to talk to Gilad Bracha about how I thought his proposal for super packages looked very inspired by Common Lisp packages, and his response was that it was an interesting observation. He hadn’t thought that way consciously until I pointed it out, so it was not designed that way, but he said that it was a good sign for the proposal that it looked like Common Lisp packages.

Actually, the first session was probably the most interesting from my perspective. This was Gilad’s talk about supporting dynamically typed languages on the JVM. The first part talked about invokedynamic, which is fairly straightforward. The only new information I got about this area was that they’re thinking about adding handlers for cases where the JVM can’t discern a correct overloaded method to call for a dynamic invocation. In reality, this would more or less be a method_missing, available directly on the JVM, with all the performance characteristics you can get from the JIT. Nice stuff. Probably the handler architecture could also be used to implement some variations of multiple inheritance and mixins, which also is a problem to do efficiently on the JVM.

The second part of his talk was about hotswapping, which I didn’t even know they’re trying to get into the JVM. Basically hotswapping is what enables eval and replacing, adding and removing methods and types at runtime. This seems to be a very hard problem, but Gilad had some ideas, so it looks promising. It seems that JRuby may actually be able to run completely in JVM bytecode sometime in the future. Very cool.

After this I went to a session about simplifying enterprise development with scripting. This turned out to not match the title; it was basically another presentation on Groovy, and nothing much more.

The session on Compiler Optimizations was really interesting, and full of the kind of vocabulary that makes your head spin (but for different reasons if you’re a compiler head or just a regular geek).

The Harmony session was really cool; they actually have a working (but slow) Swing implementation. The demonstration showed JEdit running inside Harmony, which is nice.

The security traps session was mostly basic material. Nothing new at all if you’ve been reading the books.

The last session for me today was about good ways to botch an enterprise application. This presentation was really great, one of the top 3 this JavaOne, and I’m definitely planning on going home to study the slides. (It was TS-5397, if anyone wasn’t there.) Great stuff, really.

So, the rest of the evening will be After Dark Bash, and then out to make San Francisco unsafe.



JavaOne, day 2, second part.


So, the second part of day two was composed of a few different BOFs. I won’t bother to talk about them all separately, since there really wasn’t that much information in them.

First of all I went to the Collections Connection, which is always fun. Josh still had most of the responsibility, even though he’s officially at Google now. They talked about the new collections in Mustang, of which the Deque interface is the most important addition. Also, navigable collections have been added. This is more or less SortedSet and SortedMap done right, with navigation in all directions.
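
A small sketch of the two additions mentioned above, using java.util.Deque (via ArrayDeque) and java.util.NavigableSet (via TreeSet). The class name MustangCollectionsDemo and the sample values are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NavigableSet;
import java.util.TreeSet;

class MustangCollectionsDemo {
    // A Deque can be used from both ends.
    static String bothEnds() {
        Deque<String> deque = new ArrayDeque<String>();
        deque.addLast("b");
        deque.addFirst("a");
        deque.addLast("c");
        return deque.pollFirst() + deque.pollLast(); // takes "a" from the front, "c" from the back
    }

    static NavigableSet<Integer> sample() {
        NavigableSet<Integer> set = new TreeSet<Integer>();
        set.add(1);
        set.add(4);
        set.add(7);
        set.add(10);
        return set;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = sample();
        System.out.println(bothEnds());
        System.out.println(set.floor(5));               // greatest element <= 5
        System.out.println(set.higher(7));              // least element strictly > 7
        System.out.println(set.descendingSet().first()); // view in reverse order
    }
}
```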

My second BOF talked about identity management and federation. I really didn’t get much out of this presentation. The presenter showcased a few standards that should be used, and some fairly complicated graphics showing how to interconnect these data transport protocols. Most of the stuff focused on SAML 2.0, XACML and ID-FF.

After that there was the BOF on Java Language and Compiler Issues, where they talked a little about the new compiler API in Mustang. The new packages javax.tools, javax.lang.model and com.sun.source seem really interesting and usable for doing neat stuff. Another cool thing they showed was something called the JavacViewer, which more or less gives access to most information that the different compiler types use internally. Parse trees, annotation processing, internal labeling; it’s all there. Very cool.

Last, but not least at all, the late night BOF called “A script for more powerful Java technology-based applications” which talked about how you can leverage different scripting technologies to add a different interface to your application in a few different ways, by providing plugin possibilities, as a way of adding new features quickly, and also to make macros for getting your power users happy. The presenter used different kinds of scripting to demonstrate these techniques. Some parts integrated BeanShell, and a big part of the demonstration talked about how to write your own domain specific language, and a parser and definition for this. As the session was late at night, and there were fairly few people attending, it tended to drift to different subjects depending on questions from the audience, but this didn’t detract at all. It was mostly very interesting and one of the better sessions this JavaOne.

One of the best reasons and rationales for adopting scripting languages as an approach is for your own developer needs. It makes sense to add scripting support so you can explore a huge code base, test out corner cases easily. (I know I constantly do this, start up JRuby or BeanShell inside Emacs, and test something there before using it in a real Java application).

After this session, me, Pop, Bob Evans, Charles Nutter, Thomas Enebo (the JRuby guys) and a few others went to a pub, drank some beer and continued talking scripting, JRuby, Lisp and other cool stuff for some parts of the night. I’ve learned some very neat stuff, and we’ve talked some more about the future for implementing RubyGems in JRuby. It will be very soon.



JavaOne, day 2, first part.


So, this day I’ve been trying to keep my notes closer to the final result seen in this blog, with the result that I’ll actually be able to post information even before the day is over. So, what I’m posting now is information from the beginning of the day, to the JRuby session that ended at 5pm.

Effective Java Reloaded
Effective Java has not been reloaded. Or not yet at least. But there is much material that can be used, and the session went through some great stuff. The presentation was divided into three parts: Object Creation, Generics and Other.

So, the object creation part had some great patterns. The first regarded static factories and how you can use factory methods to improve creation of generic instances. For example, take this horrible example of creating a HashMap:

Map<String, List<String>> m = new HashMap<String,List<String>>();

Instead, HashMap should have a factory method, and then you can do this:

Map<String, List<String>> m = HashMap.newInstance();

The recommendation is to always write your generic code like this.
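
Note that java.util.HashMap itself has no newInstance method, so the line above is a wish rather than working code. A minimal sketch of the same idea with your own helper could look like this; GenericFactories and newHashMap are hypothetical names. The type arguments are inferred from the assignment target.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class holding a generic static factory method.
class GenericFactories {
    static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }
}

class GenericFactoriesDemo {
    static Map<String, List<String>> languagesByPlatform() {
        // No repetition of <String, List<String>> on the right-hand side.
        Map<String, List<String>> m = GenericFactories.newHashMap();
        m.put("jvm", Arrays.asList("Java", "JRuby", "Groovy"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(languagesByPlatform());
    }
}
```

Later Java versions (7 and up) reduce the need for this with the diamond form, new HashMap<>().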

There are a few disadvantages that both static factories and constructors share. A big one is optional parameters. There are many ways of solving this, but none good. The pattern to fix this is to use a variation of the builder pattern.
You create a static nested Builder class; its constructor takes all required parameters, and it provides setters for all optional parameters. It also exposes a build method that returns the created object. An example:

final NutritionFacts twoLdietCoke = new NutritionFacts.Builder("Diet Coke",240,8).sodium(1).build();

or even

final NutritionFacts twoLdietCoke = NutritionFacts.builder("Diet Coke",240,8).sodium(1).build();

This approach is really powerful. If we’re lucky this interface may be added to the JDK in the future:

public interface Builder<T> {
    T build();
}

Then we could stop passing Class objects around, and use the typesafe Builder instead.
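
For reference, here is a minimal sketch of what a class behind the NutritionFacts calls above might look like. The session notes don’t say what the three required arguments mean, so the field names (name, servingSize, servings) and the extra optional calories field are assumptions for the example.

```java
// Sketch of the builder pattern; field names are assumed, not taken from the talk.
class NutritionFacts {
    private final String name;
    private final int servingSize;
    private final int servings;
    private final int sodium;
    private final int calories;

    static class Builder {
        // Required parameters, fixed at construction.
        private final String name;
        private final int servingSize;
        private final int servings;
        // Optional parameters, with defaults.
        private int sodium = 0;
        private int calories = 0;

        Builder(String name, int servingSize, int servings) {
            this.name = name;
            this.servingSize = servingSize;
            this.servings = servings;
        }

        // Each setter returns the builder so that calls can be chained.
        Builder sodium(int value) {
            this.sodium = value;
            return this;
        }

        Builder calories(int value) {
            this.calories = value;
            return this;
        }

        NutritionFacts build() {
            return new NutritionFacts(this);
        }
    }

    // Static factory variant, matching the second call style.
    static Builder builder(String name, int servingSize, int servings) {
        return new Builder(name, servingSize, servings);
    }

    private NutritionFacts(Builder b) {
        this.name = b.name;
        this.servingSize = b.servingSize;
        this.servings = b.servings;
        this.sodium = b.sodium;
        this.calories = b.calories;
    }

    String name() { return name; }
    int servingSize() { return servingSize; }
    int servings() { return servings; }
    int sodium() { return sodium; }
    int calories() { return calories; }
}
```

Because every field is final and set only from the builder, the finished object is immutable, while the caller still gets to name each optional value.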

The generic part of the session had some interesting information that was new to me, at least.
The first recommendation was to never use raw types anymore. Those are only for legacy code. Raw types are really evil.
You should never ignore compiler warnings. They should be understood and eliminated if possible. If not they should be commented, and suppressed with the SuppressWarnings annotation if it can be proved safe.

Wildcards should be preferred to explicit type parameters. In many cases this makes method signatures clearer, and you don’t have to manage a type variable. The exception to this is conjunctive types (which is really neat too).

Bounded wildcards are almost always better to use in your API; they make it work in many more cases where people expect it to work.
The usual case when this is a problem is when you’re using generics of generic types in your code. The reason this is a problem is that, for example, Collection<Integer> is NOT a subtype of Collection<Number>.
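
A short sketch of the point about Collection<Integer> and Collection<Number>; the class and method names are made up for the example. A parameter typed Collection<? extends Number> accepts collections of any Number subtype, while a parameter typed Collection<Number> would not.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class WildcardDemo {
    // Accepts Collection<Integer>, Collection<Double>, Collection<Number>, ...
    static double sum(Collection<? extends Number> values) {
        double total = 0;
        for (Number n : values) {
            total += n.doubleValue();
        }
        return total;
    }

    // With the signature sum(Collection<Number> values) instead, the call
    // below would not compile, because List<Integer> is not a Collection<Number>.
    static double sumOfInts() {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        return sum(ints);
    }

    public static void main(String[] args) {
        System.out.println(sumOfInts());
        System.out.println(sum(Arrays.asList(0.5, 1.5)));
    }
}
```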

Bounded wildcards should never be a return type. This forces clients to deal with wildcards explicitly. Only library designers should use wildcards.
Sometimes you actually need to do it, but it’s very unlikely.

Generics and arrays don’t mix very well, so use generics whenever you can.
Some people say avoid arrays altogether, but there are cases where arrays are both prettier and faster.

Finally, the presentation ended with a few various recommendations.

Use the @Override annotation. This avoids common problems where you think you’re overriding something but really aren’t, for example equals or hashCode.
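
A sketch of the equals case; LoosePoint and StrictPoint are hypothetical classes. The first one declares equals with its own type as the parameter, which overloads rather than overrides Object.equals, so collections never call it. Putting @Override on that method would turn the mistake into a compile error.

```java
import java.util.Arrays;
import java.util.List;

class LoosePoint {
    final int x, y;

    LoosePoint(int x, int y) { this.x = x; this.y = y; }

    // Overload, NOT an override: the parameter is LoosePoint, not Object.
    // Annotating this method with @Override would make the compiler reject it.
    public boolean equals(LoosePoint other) {
        return other != null && x == other.x && y == other.y;
    }
}

class StrictPoint {
    final int x, y;

    StrictPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StrictPoint)) {
            return false;
        }
        StrictPoint other = (StrictPoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class OverrideDemo {
    // List.contains calls equals(Object), so the overload is never used.
    static boolean looseFound() {
        List<LoosePoint> points = Arrays.asList(new LoosePoint(1, 2));
        return points.contains(new LoosePoint(1, 2));
    }

    static boolean strictFound() {
        List<StrictPoint> points = Arrays.asList(new StrictPoint(1, 2));
        return points.contains(new StrictPoint(1, 2));
    }
}
```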

Final should be used everywhere, except where there really is a reason to not do that. This minimizes mutability and is clearly thread-safe, which means you have one less thing to worry about. The only problem is readObject and clone, so take care with these.

A HashMap makes a fine sparse array, with generics and autoboxing.
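
For instance, a minimal sketch (the class name and the values are invented): int indexes are autoboxed to Integer keys, and only the slots that are actually set take up space.

```java
import java.util.HashMap;
import java.util.Map;

class SparseArrayDemo {
    static Map<Integer, String> build() {
        Map<Integer, String> sparse = new HashMap<Integer, String>();
        sparse.put(3, "three");         // the int 3 is autoboxed to an Integer key
        sparse.put(1000000, "million"); // no storage is used for the indexes in between
        return sparse;
    }

    public static void main(String[] args) {
        Map<Integer, String> sparse = build();
        System.out.println(sparse.get(1000000));
        System.out.println(sparse.get(4)); // unset slots read as null
    }
}
```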

The Serialization Proxy pattern is really neat.
Since serialization depends on implementation details you should take care with serialization.
The pattern solves this problem by having you create a new class representing the logical state of your object; you then use writeReplace and readResolve to serialize your object through this proxy in an implementation-independent way.
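
A compact sketch of the pattern as I understood it, using a hypothetical Period class with two long fields. The proxy carries only the logical state, writeReplace substitutes it when the object is written, and readResolve rebuilds the real object through the normal constructor, so the constructor’s checks also apply to deserialized data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical value class used only to illustrate the serialization proxy pattern.
final class Period implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long start;
    private final long end;

    Period(long start, long end) {
        if (start > end) {
            throw new IllegalArgumentException("start after end");
        }
        this.start = start;
        this.end = end;
    }

    long start() { return start; }
    long end() { return end; }

    // The proxy holds the logical state and nothing else.
    private static final class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long start;
        private final long end;

        SerializationProxy(Period p) {
            this.start = p.start;
            this.end = p.end;
        }

        // On reading, turn the proxy back into a Period via the public constructor.
        private Object readResolve() {
            return new Period(start, end);
        }
    }

    // On writing, serialize the proxy instead of this object.
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // Refuse streams that try to deserialize a Period directly.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    // Helper for the example: serialize to memory and read back.
    static Period roundTrip(Period original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return (Period) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```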

Java Puzzlers
There were some really intriguing things showcased here, and everything was Tiger-oriented. I didn’t take any notes, since I had way too much fun. But I definitely recommend everyone to have a look at the presentation slides.

Super packages
Gilad Bracha had a small session about the new super packages proposed for Dolphin. As he constantly told us, none of this is really ready or finished. The JCP process will hash everything out later.

There are really two processes going on for modularity. One is for super packages, and regards the language changes necessary for this functionality. The other part is a module approach for packaging and distribution. The packaging has nothing to do with the language. The packaging only concerns tools and environment, more or less.

The problem with current packages concerns information hiding and encapsulation. It is really hard to do this in a good way in current Java. A few solutions have been proposed for this that are easier than a real language change.
* Don’t document unexposed API
* Using static classes to provide access control to different classes
* Make a small language change that makes packages nested
The conclusion is that these don’t suffice. They are not good enough, and are very hackish solutions.

A real solution will solve the packaging problem, provide encapsulation and also allow separate compilation. All this will use separate module files for changing the semantics of a program, but still having the default way for modularity to look like current Java, for providing backwards compatibility. This is also the reason annotations won’t be used for this, since it would change runtime semantics of a program, which annotations should not do.

To my eyes, the syntax and semantics Gilad showed us look very much like Common Lisp packages.

Spring WebFlow
Classical web packages use free navigation, stateless systems. This is not always perfect. Some business scenarios are better represented with a controlled flow of actions. Traditionally this hasn’t been the focus of Web tools. Instead, most of the current frameworks focus on providing easy to use solutions for the base case of free simple navigation. There are a few reasons for this, but the simplest reason is that controlled flow is really hard to get right.

In my opinion, WebFlow is a perfect example of how you should not solve this problem. It has the right ideas, but doesn’t go far enough.

The idea in Spring WebFlow is basically to describe states and state progressions either declaratively with XML, or programmatically in code. When you’ve done this, WebFlow takes care of most boring stuff, like state and back buttons. It’s really about inverting control to the controller, instead of having the client provide parameters that the web server uses to find out where in the flow they are.

This approach is a really good solution to the problem, but it doesn’t go far enough, if you ask me. When I see executable XML I always get scared, and this case is no exception. Spring WebFlow seems to be more or less a (very) poor man’s continuation server. Since you can actually have real continuation servers in Java, using an embedded script language like JavaScript or Ruby, this approach isn’t good enough for me.

Groovy
Groovy is like Java with some Python, Ruby and Smalltalk. It’s object oriented and completely Java compatible. It has iterators, code blocks (closures), and many, many DWIM hacks.

Since the JVM is standardized and more general than Java, it can be used to innovate at the source code level. There are many scripting languages for Java.
Scripting seems to be a good way to glue business code together, since you really don’t have that much business code in reality. There also is a drive to test code with dynamic languages. So scripting is just great glue; it works like programmer duct tape.

The reason for Groovy is to have something Java developers will instantly recognize. Complete binary compatibility with Java, and possibilities to use Java without wrappers and cumbersome APIs.

Groovy is basically dynamic, but also supports static typing. There is native support for lists, maps, arrays and beans.
Regexps are also part of the language. There exists some operator overloading, but nothing really lethal. Groovy also adds lots of convenience methods to the JDK, for example lots of new String-methods.

It has BSF support.

There seemed to me to exist some really hairy magic, which means that it’s very hard to know exactly what’s going on under the covers. A typical example was the actionPerformed parameter to some of the swing builders, which found an ActionListener interface and found the method inside that, and implemented this interface with a closure added, via 3 or 4 levels of indirection.

In conclusion, Groovy looks good on the surface, but beneath, it feels very much like Perl (in the negative sense).

JRuby
JRuby showcased many fun things; the best one was JRuby on Rails actually running. Is that cool or what? Apart from that, the talk was mostly aimed at people new to Ruby and JRuby. It was interesting to see that most of the people at the session hadn’t heard about either Ruby or Rails one year ago! Major impact or what?

That was my day, so far. I’m off to the Java Certified Professional party! Part two comes later. Maybe much later depending on how many free drinks there are at the party.