Injecting loggers using Spring


On my current project we are using Spring MVC, and we try to use autowiring as much as possible. I personally strongly prefer constructor injection, since it gives me the luxury of working with final fields. I also like being able to inject everything a class needs – including loggers. Most of the time I don’t really want to use custom loggers from tests, but sometimes I do want to make sure something gets logged correctly, and being able to inject a logger seems like a natural way of doing that. So, with that preamble out of the way, my problem was that this seemed quite hard to achieve in Spring. Specifically, I use SLF4J, and I want to inject the equivalent of LoggerFactory.getLogger(MyBusinessObject.class). Sadly, none of the hooks Spring exposes give access to the place where a dependency is about to be injected. Most solutions I found to this problem rely on using a BeanPostProcessor to set a field on the object after it’s been created. This defeats three of my purposes/principles – I can’t use the logger in the constructor, the field has to be mutable, and Spring won’t tell me if I’ve made a mistake in my wiring.
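
To make the goal concrete, here is a hypothetical business object of the kind I want to be able to wire up – a final logger field, populated through the constructor, and usable from inside the constructor itself:

import org.slf4j.Logger;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class MyBusinessObject {
    private final Logger logger;

    @Autowired
    public MyBusinessObject(Logger logger) {
        this.logger = logger;
        logger.info("MyBusinessObject wired up");
    }
}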

There was, however, one solution I found in a StackOverflow post – sadly it wasn’t complete. Specifically, I needed to use it in a Spring MVC setting and also from inside tests. So this blog post is mainly here to provide the complete solution. It’s a simple problem, but it was surprisingly tricky to get working correctly. Now that I have it, though, it will be very convenient. This code is for Spring 3.1, and I haven’t tested it on anything else.

The first part of the solution is to create our own custom BeanFactory – the BeanFactory is what Spring uses internally to manage beans and dependencies. The default one is called DefaultListableBeanFactory, and we just subclass it like this:

import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.BeansException;
import org.springframework.beans.TypeConverter;
import org.springframework.beans.factory.BeanFactory;
import org.springframework.beans.factory.annotation.QualifierAnnotationAutowireCandidateResolver;
import org.springframework.beans.factory.config.DependencyDescriptor;
import org.springframework.beans.factory.support.DefaultListableBeanFactory;
import org.springframework.core.LocalVariableTableParameterNameDiscoverer;

public class LoggerInjectingListableBeanFactory
                extends DefaultListableBeanFactory {
    public LoggerInjectingListableBeanFactory() {
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
            new QualifierAnnotationAutowireCandidateResolver());
    }

    public LoggerInjectingListableBeanFactory(
              BeanFactory parentBeanFactory) {
        super(parentBeanFactory);
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
            new QualifierAnnotationAutowireCandidateResolver());
    }

    @Override
    public Object resolveDependency(
               DependencyDescriptor descriptor, String beanName,
               Set<String> autowiredBeanNames, TypeConverter typeConverter)
                     throws BeansException {
        if (Logger.class.isAssignableFrom(descriptor.getDependencyType())) {
            // Figure out where the logger is about to be injected, via
            // either the constructor/method parameter or the field.
            Class<?> declaringClass = null;
            if (descriptor.getMethodParameter() != null) {
                declaringClass = descriptor.getMethodParameter()
                        .getDeclaringClass();
            } else if (descriptor.getField() != null) {
                declaringClass = descriptor.getField()
                        .getDeclaringClass();
            }
            if (declaringClass != null) {
                return LoggerFactory.getLogger(declaringClass);
            }
        }
        return super.resolveDependency(descriptor, beanName,
                autowiredBeanNames, typeConverter);
    }
}

The magic happens inside resolveDependency, where we can figure out the declaring class by checking either the method parameter or the field – and then, if the thing asked for is a Logger, return a logger for that class. Otherwise we just delegate to the super implementation.

In order to use this we need an actual ApplicationContext that uses it. I didn’t find any hook to set the BeanFactory after the application context has been created, so I ended up creating two new ApplicationContext implementations – one for tests and one for Spring MVC. They are slightly different, but both try to do as little as possible while retaining the behavior of the original. The application context for the tests looks like this:

public class LoggerInjectingGenericApplicationContext 
                    extends GenericApplicationContext {
    public LoggerInjectingGenericApplicationContext() {
        super(new LoggerInjectingListableBeanFactory());
    }
}

This one just calls the super constructor with an instance of our custom bean factory. The application context for Spring MVC looks like this:

public class LoggerInjectingXmlWebApplicationContext 
                    extends XmlWebApplicationContext {
    @Override
    protected DefaultListableBeanFactory createBeanFactory() {
        return new LoggerInjectingListableBeanFactory(
                    getInternalParentBeanFactory());
    }
}

The XmlWebApplicationContext doesn’t have a constructor that takes a bean factory, so instead we override the createBeanFactory method to return our custom instance. In order to actually use these implementations, some more plumbing is needed. To get our tests to use it, an org.springframework.test.context.ContextLoader implementation is necessary. This code is mostly copied from the default implementation – sadly it doesn’t provide any extension points, and the places I want to override are in the middle of two final methods. It feels quite ugly to just copy the implementations, but there are no hooks for this…

import org.springframework.beans.factory.support.BeanDefinitionReader;
import org.springframework.beans.factory.xml.XmlBeanDefinitionReader;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.AnnotationConfigUtils;
import org.springframework.context.support.GenericApplicationContext;
import org.springframework.test.context.MergedContextConfiguration;
import org.springframework.test.context.support.AbstractContextLoader;

public class LoggerInjectingApplicationContextLoader
                        extends AbstractContextLoader {
    public final ApplicationContext loadContext(
     MergedContextConfiguration mergedContextConfiguration)
                                  throws Exception {
        String[] locations = mergedContextConfiguration.getLocations();
        // The only real change from the default loader: use our
        // logger-injecting application context.
        GenericApplicationContext context =
                  new LoggerInjectingGenericApplicationContext();
        context.getEnvironment().setActiveProfiles(
               mergedContextConfiguration.getActiveProfiles());
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    public final ConfigurableApplicationContext
            loadContext(String... locations) throws Exception {
        GenericApplicationContext context =
              new LoggerInjectingGenericApplicationContext();
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    protected void loadBeanDefinitions(
            GenericApplicationContext context, String... locations) {
        createBeanDefinitionReader(context).
               loadBeanDefinitions(locations);
    }

    protected BeanDefinitionReader createBeanDefinitionReader(
                      final GenericApplicationContext context) {
        return new XmlBeanDefinitionReader(context);
    }

    @Override
    public String getResourceSuffix() {
        return "-context.xml";
    }
}

The final thing necessary to get your tests to use the custom bean factory is to specify the loader in the @ContextConfiguration annotation on your test class, like this:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(value = "file:our-app-config.xml",
          loader = LoggerInjectingApplicationContextLoader.class)
public class SomeTest {
}

In order to get Spring MVC to pick this up, you can edit your web.xml and add a new init-param for the DispatcherServlet, like this:

    <servlet>
        <servlet-name>Spring MVC Dispatcher Servlet</servlet-name>
        <servlet-class>
           org.springframework.web.servlet.DispatcherServlet
        </servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>WEB-INF/our-app-config.xml</param-value>
        </init-param>
        <init-param>
            <param-name>contextClass</param-name>
            <param-value>
               com.example.LoggerInjectingXmlWebApplicationContext
            </param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

This approach seems to work well enough. Some of the code is slightly ugly, and I would definitely love a better hook for injection points – a way for the resolution logic to know where the thing being resolved will end up. Having factory methods that can take the receiver object would be very convenient, for example. Customizing the bean factory also seems like it should be much easier than this.



The JVM Language Summit 2010


I’ve just come back from three days in Santa Clara, spending time with some of the brightest people in the Java world – the JVM language summit is truly a fantastic collection of great people. And I was there too…

The goal of the JVM Language Summit is to collect the people who work with languages on the JVM, have them share their projects, experiences and networks – and let them talk with the people in charge of implementing the JVMs for different companies. This year, a lot of the discussion revolved around JSR 292 and Project Lambda. The presence of hardware and VM people was also more pronounced. I counted principals for at least six different virtual machines in the audience or presenting (HotSpot, JRockit, J9, Azul, Maxine, and Monty).

Among the experienced platform and language people there, some of the notables included Kresten Krab Thorup, Joshua Bloch, Bob Lee, Neal Gafter, John Rose, Brian Goetz, Alex Buckley, Rich Hickey, Charles Nutter, Cliff Click, Doug Lea, Per Bothner and many more. A great collection of people.

As an example of the funny happenstance that can occur in this collection of people: I was sitting rebinding my Java implementations for Mac OS X – I had to remove lots of links in /usr/bin. A few minutes later the person next to me started asking questions about my experience with Java on the Mac – it turns out he’s the manager of the Apple JVM team. Or, at one point, Rich Hickey reported on a quite puzzling problem that causes bad semantics when iterating over data that doesn’t fit in memory – and Cliff Click immediately opened his laptop and said “give me half an hour and I’ll see what I can do”.

Another funny anecdote was when Doug Lea pointed out that if you use fibonacci to benchmark against yourself or others, it’s important that the implementations actually agree on the first values of fib. Funnily enough, I saw three different base cases for fib during the summit – all of them different (if n < 2 return 1; if n <= 2 return n; if n < 2 return n).
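
For concreteness, here is a quick sketch of the three variants side by side – they disagree both on values and on the number of recursive calls made, which is what a benchmark actually measures:

static int fibA(int n) { return n < 2 ? 1 : fibA(n - 1) + fibA(n - 2); }
static int fibB(int n) { return n <= 2 ? n : fibB(n - 1) + fibB(n - 2); }
static int fibC(int n) { return n < 2 ? n : fibC(n - 1) + fibC(n - 2); }
// from n = 0: fibA yields 1, 1, 2, 3, 5, 8...
//             fibB yields 0, 1, 2, 3, 5, 8...
//             fibC yields 0, 1, 1, 2, 3, 5...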

There were way too many interesting presentations and discussions for me to be able to talk about all of them – instead I just wanted to give some highlights.

Charles Nutter

Charles gave a quick introduction to JRuby and Mirah, and what kinds of optimizations JRuby is currently doing. He also talked about how far he’s gotten in inlining invoke dynamic calls inside JRuby (and he’s gotten very far – it’s really cool).

Fredrik Öhrström

Fredrik is the JRockit representative on JSR 292, and way too smart. He presented a solution for how method handles integrated with function types could solve many of the current problems in Project Lambda. A very powerful and interesting presentation.

Doug Lea

Doug spent his keynote trying (quite successfully) to convince the room that fork-join is a good solution to a large class of concurrency problems. A very good and thought-provoking keynote.

Josh Bloch

Last year at the JVM Language Summit, Josh talked about what he called “the Semantic Gap”. This year, after being beaten up by some linguists, he has renamed the concept “Performance Anxiety”. The basic idea is that in our current infrastructure we have traded predictability for performance. Two examples from his talk of where this happens in Java were pretty interesting. He had one benchmark that consistently showed about the same numbers within a JVM run, but differed between JVM runs. There was no nondeterminism in the benchmark itself, but the benchmark times kept oscillating between 0.7 and 0.85 depending on the JVM run. Cliff Click’s explanation was that it is probably the compilation planner, which is a separate thread. Depending on when that thread runs, the compilation strategy will be different, and that makes a difference in times. And it’s really hard for the programmer to take this difference into account.

The other example is simpler (and don’t change your code because of this). In some circumstances it turns out that & is faster than && in Java, because && short-circuits, which means it branches. The single ampersand will always execute both sides, which means the CPU can pipeline both of them to execute at the same time.
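
As a hypothetical illustration – the two forms are only interchangeable when both operands are safe and cheap to evaluate, which is exactly why the compiler must emit a branch for &&:

static boolean withBranch(int[] a, int i) {
    return i < a.length && a[i] > 0;  // a[i] must not run if the guard fails
}

static boolean withoutBranch(boolean cheap, boolean alsoCheap) {
    return cheap & alsoCheap;  // both sides always run; no branch to mispredict
}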

All the examples he showed come down to the same thing – we can’t really reason intuitively about the performance of our language constructs anymore. Our systems have become too complex in order to support better performance, and we give up predictability to get that performance. At the end of the day it doesn’t even matter if you go down to C or assembler – you still can’t control exactly what the CPU is doing anymore.

Kresten Krab Thorup

Kresten is the CTO of Trifork, and one of the main organizers of many of my favorite conferences (like JAOO and QCon). For the last nine months he has worked on Erjang, an Erlang implementation for the JVM, which he talked about. It seems to be a very good implementation, and he’s getting surprisingly good performance and context-switching numbers. In fact, several of the ideas in Seph will be stolen from Erjang.

Rémi Forax

Rémi showed off his PHP.reboot project, implemented using JSR 292 and getting quite good performance. His JSR 292 backport seems really useful, and I think I’ll use it to make sure Seph can run on pre-Java 7 machines. Good stuff.

Rich Hickey

Rich spent some time collecting comments from the people in the room about what is problematic with the JVM in its current incarnation. To start us off, he showed one hilarious/horrible piece of code from Clojure’s Java implementation. Anyone want to guess what it does?

static public Object ret1(Object ret, Object nil) {
    return ret;
}

public static int count(Object o){
    if(o instanceof Counted)
        return ((Counted) o).count();
    return countFrom(Util.ret1(o, o = null));
}

We then went on to a few other things (which you can find on the JVM Language Summit wiki). As far as I can tell, by the way, the code above is a workaround for exactly the kind of problem Rich mentioned earlier: ret1(o, o = null) clears the local variable o before countFrom runs, so the caller’s reference doesn’t keep the head of a huge sequence alive during the iteration. The consensus seemed to be that tail calls are really very important. Last year they didn’t seem as crucial, but now that we see how powerful method handles and lambda will be, tail calls turn out to be very nice to have. Hopefully we can make that happen.

JSR 292

The JSR 292 expert group got lots of chances to work on ideas and designs for the future, and lots of interesting results came out of these discussions. Some of the more notable ones are sketches of how method handles and function types can work together, and how invoke dynamic and bootstrap methods can be used to implement defender methods, among several other interesting ideas.

All in all it has been a fun few days, going far out in language and implementation geekiness. I hope to come back to this next year.



Life in the time of Java 7


I’m currently in the process of implementing Seph, and I’ve reached an inflection point. This point is the last responsible moment to choose what I will target with my language. Seph will definitely be a JVM language, but after that there is a range of options – some quite unlikely, some more likely. The valid choices are:

  • Target Java 1.4
  • Target Java 5/6
  • Target Java 7
  • Target Java 7 with extensions

Of these, the first option isn’t really interesting for Seph, so I’ll strike it out right now. The other three choices are all still definitely possible – and good choices. I thought I would talk a little about why I would choose each one of them. I haven’t made a final decision yet, so that will have to be the caveat for this post.

Before talking about the different choices, I wanted to mention a few things about Seph that matter to this decision. The first is that I want Seph to be useful in the real world. That means it should be reasonably fast, and runnable for people without too much friction. I want the implementation to be small and clean, and hopefully as DRY as possible – if I end up with both an interpreter and a just-in-time compiler, I want them to share as much of the implementation as possible.

Java 5/6

The easiest way forward would be to use only Java 5 or 6. This would mean none of the nice new features, but it would also mean the barrier to entry would be very low. Development on Seph would be much easier, and everything would in general be simpler for everyone. The problem would mainly be implementation complexity and speed, both of which would suffer compared to either of the Java 7 variants.

Java 7

There are many good reasons to go with Java 7, but there are also some horrible consequences of doing so. For Seph, the killer features in Java 7 are method handles, invoke dynamic and defender methods. Other things would be nice, but those three are what matters. Method handles make it possible to write much more succinct code, avoid generating lots of extra classes for each built-in method, and many other things. It also becomes possible to refer to compiled code using method handles, so the connection between the JIT and the interpreter would be much nicer to represent.

Invoke dynamic is quite obvious – it would allow me to do much nicer compilation to bytecode, much faster. I could still build the same thing myself, but at much greater cost, and inlining wouldn’t be as easy to get.

Finally, defender methods are a feature of the new lambda proposal that allows you to add new methods to interfaces without breaking backwards compatibility. The way this works is that when you add a new method to an interface, you can specify a static method that should be called when that interface method is invoked and there is no other implementation on the concrete class of a specific object. But the interesting side effect of this feature is that you can also use it to specify default implementations for the core language methods without depending on a shared base class. This will make the implementation much smaller and more flexible, and might also be useful for specifying required and optional methods in an API.
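
As a sketch of the idea – the exact surface syntax is still being debated, so treat the form below as hypothetical – core behavior can live on the interface itself, and a concrete class overrides only what it needs, with no shared base class:

import java.util.HashMap;
import java.util.Map;

interface SephObject {
    // default behavior for every Seph object, no base class required
    default Object cell(String name) {
        return null;
    }
}

class SimpleSephObject implements SephObject {
    private final Map<String, Object> cells = new HashMap<String, Object>();

    @Override
    public Object cell(String name) {
        return cells.get(name);
    }
}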

The main problem with Java 7 is that it doesn’t exist yet, and the time schedule is uncertain. It is not entirely certain exactly what the design of the things will look like either – so it’s definitely a moving target. Finally, it will make it very hard for people to help out on the project, and also it won’t make Seph a possible language for people to use until they upgrade to Java 7.

Java 7 with extensions

It turns out that the interesting features coming in Java 7 are just the tip of the iceberg. There are many other proposed features, with partial implementations in the Da Vinci Machine project (MLVM). These features aren’t actually complete, but one way of forcing them to become more complete is to actually use them for something real and give lots of feedback. Some of the more interesting features:

Interface injection

This feature will allow you to say after the fact that a specific class implements an interface, and also specify implementations for the methods on that interface. This is very powerful and would be extremely helpful in certain parts of the language implementation – especially when doing integration with Java. The patch is currently not very complete, though.

Tail calls

Allowing the JVM to perform proper tail calls would make it much easier to implement many recursive functional algorithms. Since Seph will have proper tail calls in the language, I will have to implement them myself if the JVM doesn’t, which means Seph will be slower because of it (a sketch of that kind of workaround follows below). The patch seems to be quite good and possible to merge and harden into the JDK at some point. Of all the things on this list, this seems to be the one we can actually envision being added in the Java 7 or Java 8 time frame.
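
Here is a hedged sketch of what “implement this myself” means in practice – the classic trampoline: every tail call is turned into a returned thunk, and a driver loop bounces until an actual value comes back:

public class Trampoline {
    interface Thunk {
        Object call();
    }

    // Keep bouncing until something other than a Thunk comes back.
    static Object bounce(Object result) {
        while (result instanceof Thunk) {
            result = ((Thunk) result).call();
        }
        return result;
    }

    // A tail-recursive countdown: returns a Thunk instead of recursing.
    static Object countDown(final long n) {
        if (n == 0) {
            return Long.valueOf(0);
        }
        return new Thunk() {
            public Object call() {
                return countDown(n - 1);
            }
        };
    }

    public static void main(String[] args) {
        // A million "tail calls" without growing the Java stack.
        System.out.println(bounce(countDown(1000000L)));
    }
}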

Coroutines/continuations

Both coroutines and continuations seem possible to do in a good way, at least partially. Coroutines might be interesting for Seph as an alternative to Kilim, but right now the patch seems a bit unstable. Continuations would allow me to expose continuations as first-class citizens, which is never bad – but it wouldn’t give me much more than that.

Hotswapping

Hotswapping of code would make it possible to JIT aggressively and then back out when guards fail, and so on. This is less interesting once we have invoke dynamic, but it would give some more flexibility in code generation.

Fixnums, tuples, value types

We all want ways of making numbers faster – but these features might also make it possible to efficiently represent simple composite data structures, and things like multiple return values. These are fairly simple features, but have no real patches right now (I think).

Light weight code loading (anonymous classes)

It is horrible to load bytecode at runtime in Java at this point. The reason is that to make sure your loaded code gets garbage collected, you have to load each chunk of code as a new class in a new class loader. This becomes very expensive very fast, and also endangers permgen. Anonymous classes make this go away, since they don’t have names: you don’t have to keep a reference to older classes, because there is no way to get to them again if you lose the reference. This makes it possible to avoid generating a class loader every time you load new code. The state of this patch seems quite stable, but at this point it is JVM dependent.
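
As a hedged sketch of the pattern this fixes, today a language runtime needs roughly one throwaway class loader per generated class, just so the class can be unloaded when the loader becomes unreachable:

// One single-use loader per chunk of generated bytecode; when both the
// Class and this loader become unreachable, the class can be collected.
final class OneShotClassLoader extends ClassLoader {
    OneShotClassLoader(ClassLoader parent) {
        super(parent);
    }

    Class<?> define(String name, byte[] bytecode) {
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}

With anonymous classes, the runtime could hand the bytes straight to the VM instead of doing this dance.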

The price

Of course, all of these lovely features come with a price. Two prices, in fact. The first is that all the above features are incomplete, ranging from working patches to proofs of concept or sketches of ideas. That means the ground will shift under any language using them – which introduces hard version dependencies and complicates building. The other price is that none of these features are part of anything that has been released, and there are no guarantees they will ever be merged into Java. So the only viable way of distributing Seph would be to distribute standard build files with a patched OpenJDK, so that anyone can download and build that specific JDK. But that limits interoperability and causes lots of other problems.

Somewhere in between

My current thinking is that all of the above choices are bad. For Seph I want something in between, and my current best approach looks like this: you will need a new build of MLVM with invoke dynamic and method handles to develop and compile Seph. I will use invoke dynamic and method handles in the implementation, and allow people to use Rémi Forax’s JSR 292 backport to run it on Java 5 and 6. When Java 7 finally arrives, Seph will be more or less ready for it – and Seph gets some of the performance and maintainability benefits of JSR 292 immediately. I can’t use defender methods this way, but if anyone is clever enough to figure out a backport that makes defender methods work on Java 5 or 6, I would definitely use them all over the place.

This doesn’t actually preclude the possibility of creating alternative research versions of Seph that uses some of the other MLVM patches. Charles Nutter have shown how much you can do by using flags to add features that are turned off by default. So Seph could definitely grow the above features, but currently I won’t make the core of the language depend on them.



Questioning the reality of generics


I’ve been meaning to write about this for a while, since I keep saying this and people keep getting surprised. Now maybe I’m totally wrong here, and if that’s the case it would be nice to hear some good arguments for that. Here’s my current point of view on the subject anyway.

A specter is haunting the Java community – the specter of generics.

Java introduced a feature called generics in Java 5 (the feature is generally known under the name parametric polymorphism in the literature). Before Java 5 it wasn’t possible to create a reusable collection that would ensure, at compile time, the type safety of what you put into it. You could create a collection of, for example, Strings and have that work correctly, but if you wanted a collection of anything – as long as that anything was all the same type – you were restricted to doing runtime checks, or just having good tests.

Java 5 made it possible to add type parameters to any other type, which means you can create more specific collections. There are still problems with these – they interact badly with native arrays, for example, and wildcards (Java’s way of implementing co- and contravariance) have turned out to be very hard for Java developers to use correctly.

Java and C# both added generic types at roughly the same time. The C# version of generics differed in a few crucial ways, though. The most important difference in implementation is that C# generics are reified, while Java generics use type erasure. And that is really the gist of this blog post, because over and over I hear people lament the lack of reified generics in Java, citing how good C# and the CLR are for having the feature. But is that really the case? Are reified generics a good thing? That always depends on who is asking the question. Reified generics might well be good for one person but not another. Here you will hear my view.

Reified? Huh?

So what do reified generics mean, anyway? It is probably easiest to explain in contrast to the Java implementation, which uses type erasure. Slightly simplified: in Java, generics don’t exist at runtime. They are purely a fiction the compiler uses to handle type checking and make sure you don’t do anything bad with your collection. After the generics have been type checked, they are used to generate casts and type checks in the code that uses them, some metadata is inserted into the class file format, and then the generic information is thrown away.
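
A tiny illustration of what erasure means in practice – both lists have the same runtime class, because the type arguments are gone after compilation:

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // prints true: at runtime both are just ArrayList
        System.out.println(strings.getClass() == numbers.getClass());
    }
}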

In contrast, on the CLR, generic classes exist as specific versions of their class. The same class with different generic type arguments is really a set of different classes. There are no casts happening at the implementation level, and the CLR will as a result generate more specific code for generic code. Reflection and dynamic type checks on type parameters are also possible on the CLR. Having reified generics basically means that they exist at runtime, and that the virtual machine knows about them and handles them correctly.

Multi-language virtual machines

Over the last twenty years something interesting has happened. Both our hardware and our software have gotten mature enough that a new generation of virtual machines has entered the market. Traditionally, virtual machines were made for specific languages, such as Pascal, Lisp and Smalltalk, and with the possible exceptions of SECD and the Warren machine, there haven’t really been any virtual machines optimized for running more than one language well. The JVM didn’t start that way either, but it turned out to be better suited for it than expected, and there are lots of efforts to make it an even better platform. The CLR, Parrot, LLVM and Rubinius are other examples of things that seem to be becoming environments rather than just implementation strategies for single languages.

This is very exciting, and I think it’s a really good thing. We are solving very complex problems where the component problems are best solved in different ways. It seems a weird assumption that one programming language is the best way of solving all problems. But there is also a cost associated with using more than one language. So having virtual machines act as platforms, where a shared chunk of libraries is available and the cost of implementation is low, makes a lot of sense.

In summary, I feel that the JVM was the first step towards a real viable multi-language virtual machine, and we are currently in the middle of the evolution towards that point.

Solving the problems

So why not add reified generics to the JVM at this point? It could definitely be done, and using an approach similar to the CLR’s – where libraries are divided into pre- and post-reification versions – makes the path quite simple from an implementation standpoint. On the user side there would be a new proliferation of libraries to learn – but maybe that’s a good thing. There is a lot of cruft in the Java standard libraries that could be cleaned up. There are some sticky details, like how to handle the APIs that were designed for erased generics, but those problems could definitely be solved. It would also solve some other problems, such as making it possible for Scala to pattern match on type parameters, and it would solve part of the problem of abstracting over primitive types. And it’s absolutely possible to do. It would probably make Java a better language.

But is it the only solution? At this point, making this kind of change would complicate the APIs to a large degree. The reflection libraries would have to be completely redesigned (but still kept around for backwards compatibility). The most probable result would be a parallel hierarchy of classes and interfaces, just like in the CLR.

Reified generics are generally proposed in discussions about three different things: first, performance; second, making some features easier for Scala and other statically typed languages on the JVM; and third, handling primitives and primitive arrays a bit better. Of these, the first is the least common and the least interesting by far – JVM performance is already nothing short of amazing. The second point I’ll come back to in the last section. The third point is the most interesting, since there are other solutions here, including unifying primitives with objects inside the JVM by creating value types. That would solve many other problems for language implementors on the JVM, and enable lots of interesting features.

The short stick

I believe in a multi-language future, and I believe the JVM will be a core part of that future. Interoperability is just too expensive over OS boundaries – you want to be on the same platform if possible. But for the JVM to be a good environment for more than one language, it’s really important that decisions are made with that in mind. The last few years of fantastic progress from languages like Rhino, Jython, JRuby, Groovy, Scala, Fantom and Clojure have shown that it’s not only possible but beneficial for everyone involved to focus on JVM languages. JSR 223, 292 and several others also mean the JVM is more and more being viewed as a platform. This is good.

Generics is a complicated language feature. It becomes even more complicated when added to an existing language that already has subtyping. These two features don’t play very well together in the general case, and great care has to be taken when adding them to a language. Adding them to a virtual machine is simple if that machine only has to serve one language – and that language uses the same kind of generics. But generics isn’t done. It isn’t completely understood how to handle generics correctly, and new breakthroughs are still happening (Scala is a good example of this). At this point, generics can’t be considered “done right”. There isn’t only one type of generics – they vary in implementation strategy, features and corner cases.

What this all means is that if you want to add reified generics to the JVM, you should be very certain that the implementation can encompass both all the static languages that want to innovate in their own versions of generics, and all the dynamic languages that want a good implementation and a nice facility for interfacing with Java libraries. Because if you add reified generics that don’t fulfill these criteria, you will stifle innovation and make it that much harder to use the JVM as a multi-language VM.

I’m increasingly coming to the conclusion that multi language VM’s benefit from being as dynamic as possible. Runtime properties can be extracted to get performance, while static properties can be used to prove interesting things about the static pieces of the language.

Just let generics be a compile-time feature. If you don’t, there are two alternatives: either you are an egoist who only cares about the needs of your own language, or you think you have a generic type system that can express all other generic type systems. I know which one I think is more likely.



Re2j – a small lexer generator for Java


There is a tool called re2c. It’s pretty neat. Basically it allows you to intersperse a regular-expression-based grammar in comments inside C code, and those comments will be transformed into a basic lexer. A few things make re2c different from other similar tools. The first is that the supported feature set is pretty limited (which is good), and the generated code is fast. The other good part is that you can have several sections in the same source file – the productions for any specific piece of code are constrained to their specific comment.

As it happens, why the lucky stiff used re2c when he made Syck (the C-based YAML processor used in Ruby and many other languages). So when I set out to port Syck to Java, the first problem was to figure out the best way to port the lexers that use re2c. I ended up using Ragel for the implicit scanner, and thought about doing the same for the token scanner, but Ragel is pretty painful to use for more than one main production in the same source file. The syntax is not exactly the same either, so switching would have added to the burden of porting the scanner.

At the end of the day the most pragmatic choice was to port re2c’s output generator to generate Java instead. This turned out to be pretty easy, and the result is now used in Yecht, which was merged as the YAML processor for JRuby a few days ago.

You can find re2j in my GitHub repository at http://github.com/olabini/re2j. It is still a C++ program, and it probably won’t compile very well on Windows. But it’s good enough for many small use cases. Everything works exactly like re2c, except for one small difference: you can define a parameter called YYDATA that points to a byte or char buffer that should be the place to read from. For an example usage, take a look at the token scanner: http://github.com/olabini/yecht/blob/master/src/main/org/yecht/TokenScanner.re.

I haven’t put any compiled binaries out anywhere, and at some point it might be nice to merge this with the proper re2c project so you can give a flag to generate Java instead of C, but for now this is all there is to the project.



Second day of JavaOne


The second day of JavaOne ended up being not as draining as the first one, although I had lots of interesting times this day too. I’ve divided it into two blog posts – this is about what happened at JavaOne, and the next one will be about the Clojure meetup.

The first session of the day was Nick Sieger’s talk about using JRuby in production at Kenai. An interesting talk about some of the things that worked, and some that didn’t. A surprising number of decisions were given by fiat, since they needed to use Sun products for many things.

After that, Neal Ford gave a comparison between JRuby and Groovy. I don’t have much to say about this talk, except that some things seemed a bit more complicated to achieve in Groovy than in Ruby.

As it turns out, the next talk was my final talk of the day. This was Bob Lee (crazy bob) talking about references and garbage collection on the JVM. A very good talk, and I learned how the Google Collections MapMaker actually solves some of my Ioke problems. I ended up integrating it during the evening, and it works great.

The second day had fewer talks for me – but I still had a very good time and even learned some stuff. Nice.



Java in the Google Cloud event in London


Chris Read and I will talk at an event at Skills Matter in London on May 11th. We will be talking about different aspects of the release of Google App Engine support for Java.

You can find the registration page here: http://skillsmatter.com/podcast/ajax-ria/java-in-the-google-cloud.



Dynamic languages on Google App Engine – an overview


As mentioned in a post a few minutes ago, Google has released App Engine support for Java. This is obviously very cool – and I’ve spent a few weeks testing several things on it. It should come as no surprise that my main goal with this investigation has been to see how dynamic languages fit into the Java story.

The good news is this: JRuby works very well on the infrastructure. I will spend some more time in another post detailing what you have to do to get a JRuby on Rails application working on Google App Engine. In this post I’ll talk a bit about the different kinds of restrictions a language implementation will run into, and what needs fixing.

Several other people have been testing languages such as Groovy, Scala, Clojure and Jython. My own experiments have focused on JRuby and Ioke. At the moment Ioke still doesn’t run on GAE/J, but the issue is something I hope will be fixed soon.

When looking at GAE/J, it’s important to keep in mind the security restrictions that Google has been forced to implement to make the Java implementation totally safe for them. These restrictions come in many kinds, and some of them might come as a bit of a surprise. One of the larger things you will notice is that some classes aren’t available – you will get a ClassNotFoundException if you try to use them from your application. Personally, I believe a SecurityException when trying to load these might have been better, but the fact remains: many classes you expect will not be there.

Among the classes that are there (and the important parts of the JDK are there), many will give you different kinds of security-related problems too. JRuby trunk has been fixed for all these issues, so it should work without modification.

File system

GAE/J restricts quite a lot of what you can do with the file system. One of the things that surprised me was that calling methods like java.io.File#canRead on a restricted file might throw a SecurityException. Basically this means that all file access in an implementation needs to wrap these calls in try-catch blocks.

In JRuby, I solved this with an approach Ryan Brown gave us – creating a subclass of java.io.File that wraps all these methods and returns something reasonable. canRead, for example, should just return false if it gets a SecurityException.
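
A minimal sketch of that approach (the class name is mine, not JRuby’s):

public class GraciousFile extends java.io.File {
    public GraciousFile(String pathname) {
        super(pathname);
    }

    @Override
    public boolean canRead() {
        try {
            return super.canRead();
        } catch (SecurityException e) {
            // on GAE/J a restricted file is simply not readable
            return false;
        }
    }

    // ...and the same wrapping for canWrite, exists, isFile and friends
}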

Threads

It’s very hard to secure a thread scheduler – there are ways of screwing up things that are basically impossible to guard against. That means GAE/J does not support threads at all. You can’t create new ones, you can’t create new ThreadGroups or change most settings on these threads.

This is something that is less problematic for some languages, and more problematic for others. I know that Lift (in Scala) for example had some trouble, since it relied very heavily on actors, implemented using thread pools.

Reflection

Java’s reflection capabilities are very powerful, and most of the reflection methods throw several kinds of exceptions. On GAE/J you will have to guard most reflective accesses against SecurityException too. One of the things many dynamic languages do is call setAccessible on all methods. This will fail on some methods that Google thinks you shouldn’t have access to – several of the methods on Object are among these.
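
The guard itself is trivial – a hedged sketch of what every such reflective access ends up looking like:

import java.lang.reflect.Method;

final class ReflectionGuard {
    // Try to open up a method, but survive the GAE/J sandbox saying no.
    static boolean trySetAccessible(Method method) {
        try {
            method.setAccessible(true);
            return true;
        } catch (SecurityException e) {
            // forbidden on some methods (several on Object);
            // fall back to public access only
            return false;
        }
    }
}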

Verification

In some cases, the bytecode verifier is a bit stricter than in other JDKs. It’s important to try out many corners of the application and see that it works correctly. Of course, if your language generates code at runtime, this is even more important. The good news is that I haven’t seen any problems at all with JRuby. The bad news is that the parser for Ioke doesn’t even load (and that is static Java code). This seems to be a small problem in the verifier where a stack height of 0 causes it to fail, so hopefully it will be fixed shortly.

Class loading

One of the early problems for Clojure was some intricacies in the way GAE/J handles class loaders. One of these is that doing ClassLoader.getSystemClassLoader() caused a SecurityException.

Testing

It is not immediately obvious to me how you can test applications written for GAE/J in another language. The intuition is that you would use the local development server to run tests, but many of the things above don’t work exactly the same in the local dev server, and some things are problematic locally but won’t cause any trouble on the server. One thing I’ve noticed is that JRuby doesn’t load correctly, because the dev server doesn’t load things from jar files in such a way that JRuby can load several property files from the same jar file. This issue doesn’t exist on the real servers.

You can use unit tests to test parts of your application, but you need to make sure to stub out all the calls to the Google APIs. This is actually kind of hard in Java, since one of the negative aspects of the GAE/J APIs is that they are built around singleton factories – it is very hard to inject new functionality there.
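
The usual workaround – hiding the static factory behind a small interface you control so tests can substitute a fake – looks roughly like this (the provider names here are hypothetical; UserServiceFactory is from the GAE SDK):

import com.google.appengine.api.users.UserService;
import com.google.appengine.api.users.UserServiceFactory;

interface UserServiceProvider {
    UserService userService();
}

class RealUserServiceProvider implements UserServiceProvider {
    public UserService userService() {
        // the singleton factory is now confined to this one class
        return UserServiceFactory.getUserService();
    }
}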

With JRuby, you can of course override these methods and unit test without running them. The main problem with this kind of unit testing is that it won’t give you any real security on the server – since you might still run into several kinds of security exceptions.

I ended up implementing a very small unit testing framework for sanity checking. This allows you to trigger a test run by going to a specific URL. Of course, this approach sucks.

At the end of the day, it seems the best kind of testing you can do is functional testing using something like Selenium or WebDriver. Or Twist. GAE/J allows you to have different versions of an application deployed at once, so one way to utilize this is to have your automated test run deploy to a version called “test”, and then use a version-specific URL to reach it. Say your app is deployed on “testgae”: you can have your CI test against “test.latest.testgae.appspot.com”, while the production environment is still running on “testgae.appspot.com”. It’s still not perfect, but it gives you some flexibility and a possibility to run continuous integration against the correct infrastructure.



JRuby on Rails on Google App Engine


This is the third post in a series detailing information about the newly announced Google App Engine support for Java. In this post I thought I’d go through the steps you need to take to get a JRuby on Rails application working on GAE/J, and also what kind of characteristics you should expect from your application.

You need a fairly new copy of JRuby. Most of the changes needed in JRuby were added to trunk right after the JRuby 1.2 release, so check out and build something after that. The newest Rails version works fine too.

Once you have the basic Rails app set up, there are a few things you need to do. The first is to install Warbler, pluginize it, and generate the Warbler configuration file. You do that with “jruby -S gem install warbler”, “jruby -S warble pluginize” and “jruby -S warble config”. The last two should be run in the root of the Rails application.

You should freeze the Rails gems too. Once you have done that, you need to go through all the files there and remove anything that isn’t necessary. As it turns out, GAE/J has a hard limit of 1000 files, and a typical Rails application will end up with many more files than that. You can remove all of ActiveRecord, all the test directories, and so on.

Since you’re on GAE/J, you won’t need ActiveRecord, so you should not load it in config/environment.rb. The next step is to modify your warble.rb file. These are the things you need to do:

First, make sure that the needed GAE/J files are included, by doing:

config.includes = FileList["appengine-web.xml", "datastore-indexes.xml"]

You should also set the parameters for how many runtimes will be started:

config.webxml.jruby.min.runtimes = 1
config.webxml.jruby.max.runtimes = 1
config.webxml.jruby.init.serial = true

The last option is available in the trunk version of JRuby-Rack. If you don’t have min=1 and max=1, you need this option set, because otherwise JRuby-Rack will actually start several threads to initialize the runtimes.

Finally, to be able to use newer versions of the libraries, you need to set what Java libraries are used to the empty array:

config.java_libs = []

You will add all of the jar-files later, in the lib directory.

The last configuration option that I added is something to allow Rails to use DataStore as a session store. You can see how this is done in YARBL.

I have set several options in my appengine-web.xml file. The most important ones are to turn off JMX and to set os.arch to empty:

      <property name="jruby.management.enabled" value="false" />
      <property name="os.arch" value="" />

This is all pretty self-explanatory.

One thing that I still haven’t gotten to work correctly is “protect_from_forgery”, so you need to comment this out in app/controllers/application.rb.

You need to put several jar files in the lib directory, and you actually need to split the jruby-complete jar, since it is too large for GAE/J on its own. The first jar file is the appengine-api.jar file. You also need a recent build of JRuby-Rack, and finally you need the different slices of the jruby-complete jar. I use a script like this to create the different jar files:

#!/bin/sh

# Split jruby-complete.jar into two smaller jars: one with the JRuby
# runtime classes and one with the Ruby standard library.
rm -rf jruby-core.jar
rm -rf ruby-stdlib.jar
rm -rf tmp_unpack
mkdir tmp_unpack
cd tmp_unpack
jar xf ../jruby-complete.jar
cd ..

# Everything that is compiled code goes into jruby-core.jar...
mkdir jruby-core
mv tmp_unpack/org jruby-core/
mv tmp_unpack/com jruby-core/
mv tmp_unpack/jline jruby-core/
mv tmp_unpack/jay jruby-core/
mv tmp_unpack/jruby jruby-core/
cd jruby-core
jar cf ../jruby-core.jar .

# ...and what remains is the Ruby standard library.
cd ../tmp_unpack
jar cf ../ruby-stdlib.jar .
cd ..
rm -rf jruby-core
rm -rf tmp_unpack
rm -rf jruby-complete.jar

This creates two jar-files, jruby-core.jar and ruby-stdlib.jar.

These things should more or less put everything in order for you to be able to deploy your application to App Engine.

YARBL

As part of my evaluation of the infrastructure, I created a small application called YARBL. It allows you to have blogs and post posts in them. No support for comments or anything fancy at all, really, but it can be expanded into something real. I use both BeeU and Bumble in YARBL. BeeU allows me to make sure that only logged-in users who are administrators can actually post things or change the blog. This support was extremely easy to add through the Google UserService.

You can see a (hopefully) running version at http://yarubyblog.appspot.com. You can find the source code in my GitHub repository: http://github.com/olabini/yarbl.

Bumble

Bumble is a very small wrapper around the DataStore that allows you to create data models backed by Google’s DataStore. It was developed to back YARBL, so it really only supports the things needed for that application.

This is what the data model for YARBL looks like, and it should give you a feeling for how you define models with Bumble. One thing to remember is that the DataStore actually allows any properties/attributes on entities, so it fits a language like Ruby very well.

class Person
  include Bumble

  ds :given_name, :sur_name, :email
  has_many :blogs, Blog, :owner_id
end

class Blog
  include Bumble

  ds :name, :owner_id, :created_at
  belongs_to :owner, Person
  has_many :posts, :Post, :blog_id, :iorder => :created_at
end

class Post
  include Bumble

  ds :title, :content, :created_at, :blog_id
  belongs_to :blog, Blog
end

To actually use the model for something, you can do things like these:

Blog.all

Post.all({}, :limit => 15, :iorder => :created_at)

blog = Blog.get(params[:id])
posts = blog.posts

Blog.create :name => name, :owner => @person, :created_at => Time.now

Post.all.each do |p|
  p.delete!
end

Here are most of the supported methods. The implementation is incredibly small and you really can’t go wrong with it. Of course, it is not tuned at all, so it does lots of fetches it could avoid. I’m happily accepting patches! The code can be found at http://github.com/olabini/bumble.

BeeU

When working with Google’s user service, you can use BeeU – a very small framework that helps with a few things. You basically get a few different helper methods. There are three filter methods: assign_user, assign_admin_status and verify_admin_user. The first two create instance variables called @user and @admin respectively. The @user variable will contain the UserService User object, and @admin will be true or false depending on whether the user is logged in and is an administrator. The last one checks that the current user is an administrator: if not logged in, it redirects to a login page, and if logged in but not an administrator, it responds with Not Authorized. These three methods should all be used as before filters.

There is a high-level method called require_admin that you can use to point out which methods should be protected with admin access. This is really all you need.

Finally, there are two methods that generate a login URL and a logout URL; both of these will redirect back to where you were when the URLs were generated.

BeeU can be found in my GitHub repository: http://github.com/olabini/beeu.

Summary

Overall, JRuby on Rails works very well on App Engine, except for some smaller details. The major ones are startup cost and testing. As it happens, you can’t actually get GAE/J to precreate things; instead you have to let the first request take the hit. GAE/J also does a lot of preverifying of bytecodes and so on, so startup is a bit heavier than on other JDKs. One runtime takes about 20 seconds of wall time to start, so the first hit takes some time. The good news is that this used to be worse – the last few weeks the infrastructure has gotten a lot faster, and I’m confident it will continue to improve. It is still a problem, though, since you can’t precreate runtimes, which means some requests will end up taking quite a bit longer than expected.

It’s interesting to note that performance is actually pretty good once it gets running. I’ve seen between 120ms and 500ms for a request, depending on how many calls to the DataStore are involved on the page – these times are not bad, considering what the infrastructure needs to do. The time also seems mostly bound by data access. If I’d had time to integrate memcache, I could probably improve these times substantially.

The one remaining stickler for me is still testing. It’s not at all obvious how to do it, and as I noted in my earlier post there are some ways around it – but they don’t really fit the way most Rails applications are built. In fact, I have done mostly manual testing on this application, since automating it seemed too costly.

In all, Google App Engine with JRuby on Rails is a really compelling combination of technology. I’m looking forward to the first ThoughtWorks project with these pieces.



Java on Google App Engine


About a year ago, Google released their first beta version of App Engine – it allowed deployment and hosting of web applications. These applications were restricted to the Python language. About 5 minutes ago, Google announced that they have released a Java version of App Engine.

I have been involved in this for a few weeks – since ThoughtWorks is a Google Enterprise Partner – and it’s been a very interesting time. This post and a few others will take a closer look at what I’ve been experimenting with.

First of all, GAE/J is not based on Dalvik, as far as I can tell. It is a full Java implementation, so you compile your applications locally, using any standard JDK and then upload them. Google recommends Java 6 for this, but Java 5 works too.

The actual interface to GAE/J uses the standard Java Servlet API, so if you have something that works with it, chances are you won’t have to make many changes to your application.

Google also gives access to several different APIs, including the User service, Memcache service, Mail service, URL Fetch service, Image service and DataStore service. These all give access to different pieces of the Google machinery. For me, the most interesting parts were the User service, which makes it possible to use the regular Google authentication infrastructure, and the DataStore service, which makes it a snap to use Google’s data storage infrastructure. For regular Java applications you can use either JDO or JPA to interact with the DataStore, but Google gives access to the low-level APIs too.
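
As a small hedged sketch, storing an entity through the low-level API looks roughly like this:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

public class LowLevelDataStoreExample {
    public void storePost(String title, String content) {
        DatastoreService datastore =
            DatastoreServiceFactory.getDatastoreService();
        Entity post = new Entity("Post"); // "Post" is the entity kind
        post.setProperty("title", title);
        post.setProperty("content", content);
        datastore.put(post); // assigns and returns the entity's key
    }
}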

As part of the GAE/J release, you get access to a local development server. It tries to mimic the full environment as closely as possible. For the specific type of application Google expects most people to write it works very well – but if you go outside this beaten path, many things get a bit shaky. I ended up not using it very much.

So, GAE/J is a very cool platform to target cloud applications to. Obviously Python is still a valid choice too, but the combination of apps built in Python and applications running on GAE/J seems like a very powerful choice.

ThoughtWorks has recently been spending much time in this area, and we have gotten some good experience with it. We look forward to working with applications for Google App Engine, written in Java or any of the other supported languages. (If you follow today’s blog posts, you will see that I’m not the only ThoughtWorker who has explored alternative languages on this platform.)

My esteemed colleagues have also written up their experiences with the Java pieces of Google App Engine. You can read them here: http://paulhammant.com/blog/google-app-engine-for-java-with-rich-ruby-clients.html, http://elhumidor.blogspot.com/ and http://blog.sriramnarayan.com/.