Java in the Google Cloud event in London


Me and Chris Read will talk at an event at Skills Matter in London May 11th. We will be talking about different aspects surrounding the release of Google App Engine support for Java.

You can find the registration page here: http://skillsmatter.com/podcast/ajax-ria/java-in-the-google-cloud.



Dynamic languages on Google App Engine – an overview


As mentioned in a post a few minute ago here, Google has released App Engine support for Java. This is obviously very cool – and I’ve spent a few weeks testing several things using it. It should come as no surprise that my main goal with this investigation has been to see how dynamic languages fit in with the Java story.

The good news are these: JRuby works very well on the infrastructure. I will spend some more time in another post detailing what you have to do to get a JRuby on Rails application working on Google App Engine. In this post I’ll talk a bit about the different kind of restrictions a language implementation will run into, and what needs fixing.

Several other people has been testing languages such as Groovy, Scala, Clojure and Jython. My own experiments have been focused on JRuby and Ioke. At the moment, Ioke still doesn’t run on GAE/J, but the issue is something I hope will be fixed soon.

When looking at GAE/J, it’s important to keep in mind the security restrictions that Google has been forced to implement, to make the Java implementation totally safe for them. This includes restrictions of many kinds, and some of them might come as a bit of a surprise in some cases. One of the larger things you will notice is that some classes aren’t available – and you will get a ClassNotFoundException if you try to use them from your application. Personally, I believe that using a SecurityException when trying to load these might have been better, but this fact remains: many classes you expect will not be there.

Among the classes that are there (and the important parts of the JDK are there) there are many that will give you different kinds of security related problems too. JRuby trunk has been fixed for all these issues, so it should work without modification.

File system

GAE/J restricts quite a lot of what you can do with the file system. One of the things that surprised me was that calling methods like java.io.File#canRead on a restricted file might throw a SecurityException. Basically this means that all file access in an implementation need to wrap these calls in try-catch blocks.

In JRuby, I solved this by an approach that Ryan Brown gave us – creating a subclass of java.io.File that wraps all these method and return something reasonable. canRead should for example just return false if it gets a SecurityException.

Threads

It’s very hard to secure a thread scheduler – there are ways of screwing up things that are basically impossible to guard against. That means GAE/J does not support threads at all. You can’t create new ones, you can’t create new ThreadGroups or change most settings on these threads.

This is something that is less problematic for some languages, and more problematic for others. I know that Lift (in Scala) for example had some trouble, since it relied very heavily on actors, implemented using thread pools.

Reflection

Java’s reflection capabilities are very powerful, and most of the reflection methods throw several kinds of exceptions. On GAE/J you will have to guard most reflective accesses against SecurityException too. One of the things that many dynamic languages do is to use “setAccessible” on all methods. This will fail on some methods that Google thinks you shouldn’t have access to. Several of the methods on Object are among these.

Verification

In some cases, the bytecode verifier is a bit stricter than for other JDKs. It’s important to try out many corners of the application and see that it works correctly. Of course, if your language generates code at runtime, this is even more important. The good news is that I haven’t seen any problems at all with JRuby. The bad news is that the parser for Ioke doesn’t even load (and that is static Java code). This seems like a small problem in the verifier where a stack height of 0 causes it to fail, so hopefully it will be shortly fixed.

Class loading

One of the early problems for Clojure was some intricacies in the way GAE/J handles class loaders. One of these is that doing ClassLoader.getSystemClassLoader() caused a SecurityException.

Testing

It is not immediately obvious to me how you can test applications written for GAE/J in another language. The intuition is that you would be able to use the local development server to run tests, but many of the things above don’t work exactly the same in the local dev server, and some things are problematic locally but won’t cause any trouble on the server. One thing I’ve noticed is that JRuby doesn’t load correctly, because the dev server doesn’t actually load things from jar-files in such a way that JRuby can load several property files from the same jar-file. This issue doesn’t actually exist on the real servers.

You can use unit tests to test parts of your application, but you need to make sure to stub out all the calls to the Google APIs. This is actually kinda hard in Java, since one of the negative aspects about the GAE/J APIs is that they are built around singleton factories. It is very hard to inject new functionality there.

With JRuby, you can of course override these methods and unit test without running them. The main problem with this kind of unit testing is that it won’t give you any real security on the server – since you might still run into several kinds of security exceptions.

I ended up implementing a very small unit testing framework for sanity checking. This allows you to trigger a test run by going to a specific URL. Of course, this approach sucks.

At the end of the day, it seems the best kind of testing you can do is functional testing using something like Selenium or WebDriver. Or Twist. GAE/J allow you to have different versions of an application, so one way you could utilize this automatically is to allow your automatic test run deploy to a version called “test”, and then you can use a specific url to get the latest version. Say your app is deployed on “testgae”, you can allow your CI to test against “test.latest.testgae.appspot.com”, while the production environment is still running on “testgae.appspot.com”. It’s still not perfect, but it gives you some flexibility and a possibility to run continuous integration on the correct infrastructure.



JRuby on Rails on Google App Engine


This is the third post in a series detailing information about the newly announced Google App Engine support for Java. In this post I thought I’d go through the steps you need to take to get a JRuby on Rails application working on GAE/J, and also what kind of characteristics you should expect from your application.

You need a fairly new copy of JRuby. Most of the changes needed to JRuby was added to JRuby trunk right after the JRuby 1.2 release, so check out and build something after that. The newest Rails version works fine too.

Once you have the basic Rails app set up, there are few things you need to do. First of them is to install Warble and pluginize it, and finally generate the Warble configuration file. You do that by doing “jruby -S gem install warble”, “jruby -S warble pluginize” and then “jruby -S warble config”. The last two should be done in the root of the Rails application.

You should freeze the Rails gems too. Once you have done that, you need to go through all the files there and remove anything that isn’t necessary. As it turns out, GAE/J has a hard limit on a 1000 files, and a typical Rails application will end up with much more files then that. You can remove all of ActiveRecord, all the test directories and so on.

Since you’re on GAE/J, you won’t need ActiveRecord, so you should not load it in config/environment.rb. The next step is to modify your warble.rb file. These are the things you need to do:

First, make sure that the needed GAE/J files are included, by doing:

config.includes = FileList[“appengine-web.xml”, “datastore-indexes.xml”]

You should also set the parameters for how many runtimes will be started:

config.webxml.jruby.min.runtimes = 1
config.webxml.jruby.max.runtimes = 1
config.webxml.jruby.init.serial = true

The last option is available in trunk version of JRuby-rack. If you don’t have min=1 and max=1 then you need this option set, because otherwise JRuby-rack will actually start several threads to initialize the runtimes.

Finally, to be able to use newer versions of the libraries, you need to set what Java libraries are used to the empty array:

config.java_libs = []

You will add all of the jar-files later, in the lib directory.

The last configuration option that I added is something to allow Rails to use DataStore as a session store. You can see how this is done in YARBL.

I have set several options in my appengine-web.xml file. The most important ones are to turn off JMX and to set os.arch to empty:

      <property name="jruby.management.enabled" value="false" />
      <property name="os.arch" value="" />

This is all pretty self explanatory.

One thing that I still haven’t gotten to work correctly is “protect_from_forgery”, so you need to comment this out in app/controllers/application.rb.

You need to put several jar-files in the lib-directory, and you actually need to split the jruby-complete jar, since it is too large for GAE/J in itself. The first jar-file is the appengine-api.jar file. You also need a late build of jruby-rack, and finally you need the different slices of the jruby-complete jar. I use a script like this to create several different jar-files:

#!/bin/sh

rm -rf jruby-core.jar
rm -rf ruby-stdlib.jar
rm -rf tmp_unpack
mkdir tmp_unpack
cd tmp_unpack
jar xf ../jruby-complete.jar
cd ..
mkdir jruby-core
mv tmp_unpack/org jruby-core/
mv tmp_unpack/com jruby-core/
mv tmp_unpack/jline jruby-core/
mv tmp_unpack/jay jruby-core/
mv tmp_unpack/jruby jruby-core/
cd jruby-core
jar cf ../jruby-core.jar .
cd ../tmp_unpack
jar cf ../ruby-stdlib.jar .
cd ..
rm -rf jruby-core
rm -rf tmp_unpack
rm -rf jruby-complete.jar

This creates two jar-files, jruby-core.jar and ruby-stdlib.jar.

These things should more or less put everything in order for you to be able to deploy your application to App Engine.

YARBL

As part of my evaluation of the infrastructure, I created a small application called YARBL. It allows you to have blogs, and post posts in them. No support for comments or anything fancy at all really. But it can be expanded into something real. I use both BeeU and Bumble in YARBL. BeeU allow me to make sure that only logged in users that are administrators can actually post things or change the blog. This support was extremely easy to add through the Google UserService.

You can see a (hopefully) running version at http://yarubyblog.appspot.com. You can find the source code in my GitHub repository: http://github.com/olabini/yarbl.

Bumble

Bumble is a very small wrapper around DataStore, that allow you to create data models backed by Google’s DataStore. It was developed to back YARBL, so it really only supports the things needed for that application.

This is what the data model for YARBL looks like. This should give you a feeling for how you define models with Bumble. One thing to remember is that the DataStore actually allows any properties/attributes on entitites, so it fits a language like Ruby very well.

class Person
  include Bumble

  ds :given_name, :sur_name, :email
  has_many :blogs, Blog, :owner_id
end

class Blog
  include Bumble

  ds :name, :owner_id, :created_at
  belongs_to :owner, Person
  has_many :posts, :Post, :blog_id, :iorder => :created_at
end

class Post
  include Bumble

  ds :title, :content, :created_at, :blog_id
  belongs_to :blog, Blog
end

To actually use the model for something, you can do things like these:

Blog.all

Post.all({}, :limit => 15, :iorder => :created_at)

blog = Blog.get(params[:id])
posts = blog.posts

Blog.create :name => name, :owner => @person, :created_at => Time.now

Post.all.each do |p|
  p.delete!
end

Here are most of the supported methods. The implementation is incredibly small and you really can’t go wrong with it. Of course, it is not tuned at all, so it does lots of fetches it could avoid. I’m happily accepting patches! The code can be found at http://github.com/olabini/bumble.

BeeU

When working with Google’s user service, you can use BeeU – a very small framework for helping with some things. You basically get a few different helper methods. There are three different filter methods that can be used. These are assign_user, assign_admin_status and verify_admin_user. The first two will create instance variables called @user and @admin respectively. The @user variable will contain the UserService User object, and @admin will be either true or false if the user is logged in and is an administrator or not. The last one will check that the current user is an administrator. If not logged in, it will redirect to a login page, and if logged in but not administrator, it will respond with a Not Authorized. These three methods should all be used as before filters.

There is a high level method called require_admin that you can use to point out what methods should be protected with admin access. This is really all you need.

Finally, there are two methods that generate a login-URL and a logout-URL, both of these will redirect back to where you were when the URL’s were generated.

BeeU can be found in my GitHub repository: http://github.com/olabini/beeu.

Summary

Overall, JRuby on Rails works very well on the App Engine, except for some smaller details. The major ones are the startup cost and testing. As it happens, you can’t actually get GAE/J to precreate things. Instead you’ll have to let the first release take the hit of this. Now, GAE/J does a let of preverifying of bytecodes and so on, so startup is a bit more heavy than on other JDKs. One runtime takes about 20 seconds wall time to startup, so the first hit takes some time. The good news is that this used to be worse. The last few weeks, the infrastructure has gotten a lot faster, and I’m confident this will continue to improve. It is still a problematic thing though, since you can’t precreate runtimes, which means that some request will end up taking quite a bit longer than expected.

It’s interesting to note that performance is actually pretty good once it gets running. I’ve seen between 120ms to 500ms for a request, depending on how much calls to DataStore is involved on the page – these times are not bad, considering what the infrastructure needs to do. It also seems mostly limited to the data access. If I’d had time to integrate memcaching, I could probably improve these times substantially.

The one remaining stickler for me is still testing. It’s not at all obvious how to do it, and as I noted in my earlier post there are some ways around it – but they don’t really fit in the way most Rails applications are built. In fact, I have done mostly manual testing on this application, since the cost of automating it seemed to be costly.

In all, Google App Engine with JRuby on Rails, is a really compelling combination of technology. I’m looking forward to the first ThoughtWorks project with these pieces.



Java on Google App Engine


About a year ago, Google released their first beta version of App Engine – it allowed deployment and hosting of web applications. These applications were restricted to the Python language. About 5 minutes ago, Google announced that they have released a Java version of App Engine.

I have been involved in this for a few weeks – since ThoughtWorks is a Google Enterprise Partner – and it’s been a very interesting time. This post and a few others will take a closer look at what I’ve been experimenting with.

First of all, GAE/J is not based on Dalvik, as far as I can tell. It is a full Java implementation, so you compile your applications locally, using any standard JDK and then upload them. Google recommends Java 6 for this, but Java 5 works too.

The actual interface to GAE/J uses the standard Java Servlet API, so if you have something that works with it, chances are you won’t have to do many changes to your application.

Google also gives access to several different APIs, including the User service, Memcache service, Mail service, URL fetching service, Image service and DataStore service. These all give access to different pieces of the Google machinery. For me, the most interesting parts were the User service, that makes it possible to use the regular Google authentication infrastructure, and the DataStore service that makes it a snap to use Googles data storage infrastructure. For regular Java applications, you can use either JDO or JPA to interact with the DataStore, but Google also gives access to the low level APIs too.

As part of the GAE/J release, you get access to a local development server. It tries to mimic the full environment as closely as possibly. For the specific type of application Google expects most people to write, it works very well – but if you go outside of this beaten path, many things get a bit shaky. I ended up not using it very much.

So, GAE/J is a very cool platform to target cloud applications to. Obviously Python is still a valid choice too, but the combination of apps built in Python and applications running on GAE/J seems like a very powerful choice.

ThoughtWorks has recently been spending much time in this area and we have gotten some good experience with it. We look forward to be able to work with applications for Google App Engine, written in Java, or any of the other languages supported. (If you follow todays blog posts, you will see that I’m not the only ThoughtWorker who has explored alternative languages on this platform).

My esteemed colleagues have also written up their experiences with the Java pieces of Google App Engine. You can read it here: http://paulhammant.com/blog/google-app-engine-for-java-with-rich-ruby-clients.html, http://elhumidor.blogspot.com/ and http://blog.sriramnarayan.com/.