Ola Bini: Programming Language Synchronicity

July 28th, 2010

Life in the time of Java 7

I’m currently in the process of implementing Seph, and I’ve reached an inflection point. This point is the last responsible moment to choose what I will target with my language. Seph will definitely be a JVM language, but after that there is a range of options – some quite unlikely, some more likely. The valid choices are:

Target Java 1.4
Target Java 5/6
Target Java 7
Target Java 7 with extensions

Of these, the first option isn’t really interesting for Seph, so I’ll strike it out right now. The other three choices are however still definitely possible – and good choices. I thought I might talk a little bit about why I would choose each one of them. I haven’t made a final decision yet, so that will have to be the caveat for this post.

Before talking about the different choices, I wanted to mention a few things about Seph that matter to this decision. The first one is that I want Seph to be useful in the real world. That means it should be reasonably fast, and runnable for people without too much friction. I want the implementation to be small and clean, and hopefully as DRY as possible – if I end up with both an interpreter and a just-in-time compiler, I want to be able to share as much of these implementations as possible.

Java 5/6

The easiest way to go forward would be to only use Java 5 or 6. This would mean no extra nice features, but it would also mean the barrier to entry would be very low. It would mean development on Seph would be much easier and would in general make everything simpler for everyone. The problem with it would mainly be implementation complexity and speed, which would both suffer compared to any of the Java 7 variants.

Java 7

There are many good reasons to go with Java 7, but there are also some horrible consequences of doing this. For Seph, the things that matter most from Java 7 are method handles, invoke dynamic and defender methods. Other things would be nice, but these three are the killer features for Seph. Method handles make it possible to write much more succinct code, avoid generating lots of extra classes for each built-in method, and many other things. It also becomes possible to refer to compiled code using method handles, so the connection between the JIT and the interpreter would be much nicer to represent.
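A minimal sketch of the "no extra class per built-in" idea, using the `java.lang.invoke` API as it eventually shipped in Java 7 (which postdates this post, so details may differ from the MLVM builds discussed here). The class `MethodHandleSketch` and the built-in `concat` are hypothetical names for illustration, not part of Seph.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleSketch {
    // Hypothetical built-in method of the language, written as a plain static method.
    public static String concat(String a, String b) {
        return a + b;
    }

    // Looks the built-in up once; the handle can be stored in a method table
    // instead of generating a wrapper class for every built-in.
    static MethodHandle lookupConcat() {
        try {
            MethodType type = MethodType.methodType(String.class, String.class, String.class);
            return MethodHandles.lookup().findStatic(MethodHandleSketch.class, "concat", type);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes the built-in through its handle.
    static String callConcat(String a, String b) {
        try {
            return (String) lookupConcat().invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callConcat("se", "ph"));
    }
}
```

The same kind of handle could point at interpreted or compiled code, which is the interpreter/JIT connection described above.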

Invoke dynamic is quite obvious – it would allow me to do much nicer compilation to bytecode, and much faster. However, I could still build the same thing myself, at much greater cost, and it would also mean inlining wouldn’t be as easy to get.
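Java source cannot emit an invokedynamic instruction directly, but the linkage model behind it can be sketched with a `MutableCallSite` from the final Java 7 API: a call site holds a target handle that the language runtime may replace later. The names `CallSiteSketch`, `slowPath` and `fastPath` are assumptions made for this illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class CallSiteSketch {
    // Two hypothetical implementations of the same dynamic call.
    public static String slowPath(Object receiver) { return "generic:" + receiver; }
    public static String fastPath(Object receiver) { return "string:" + receiver; }

    static MethodHandle find(String name) {
        try {
            return MethodHandles.lookup().findStatic(CallSiteSketch.class, name,
                    MethodType.methodType(String.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls through one call site before and after relinking it.
    static String[] demo() {
        MutableCallSite site = new MutableCallSite(find("slowPath"));
        MethodHandle invoker = site.dynamicInvoker();
        try {
            String before = (String) invoker.invoke((Object) "x");
            site.setTarget(find("fastPath")); // relink, as a runtime would after learning more
            String after = (String) invoker.invoke((Object) "x");
            return new String[] { before, after };
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        for (String result : demo()) {
            System.out.println(result);
        }
    }
}
```

In compiled bytecode the invoker call would be an invokedynamic instruction whose bootstrap method returns the call site; that part is omitted here.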

Finally, defender methods are a feature of the new lambda proposal that allows you to add new methods to interfaces without breaking backwards compatibility. The way this works is that when you add a new method to an interface, you can specify a static method that should be called when that interface method is invoked and there are no other implementations on the concrete classes for a specific object. But the interesting side effect of this feature is that you can also use it to specify default implementations for the core language methods without depending on a shared base class. This will make the implementation much smaller and more flexible, and might also be useful to specify required and optional methods in an API.
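Defender methods eventually shipped in Java 8 as default methods, with the body written inline in the interface rather than as a reference to a static method. A small sketch in that later syntax shows the "default implementation without a shared base class" idea; `LangObject`, `Plain` and `Custom` are hypothetical names, not Seph's actual types.

```java
class DefaultMethodSketch {
    // Hypothetical core interface for language objects.
    interface LangObject {
        String name();                 // required method

        // Optional method: used whenever a concrete class does not override it.
        default String inspect() {
            return "#<" + name() + ">";
        }
    }

    // Relies on the default implementation.
    static final class Plain implements LangObject {
        public String name() { return "plain"; }
    }

    // Overrides the default implementation.
    static final class Custom implements LangObject {
        public String name() { return "custom"; }
        @Override public String inspect() { return "custom!"; }
    }

    static String inspectPlain() { return new Plain().inspect(); }
    static String inspectCustom() { return new Custom().inspect(); }

    public static void main(String[] args) {
        System.out.println(inspectPlain());
        System.out.println(inspectCustom());
    }
}
```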

The main problem with Java 7 is that it doesn’t exist yet, and the time schedule is uncertain. It is not entirely certain exactly what the design of the things will look like either – so it’s definitely a moving target. Finally, it will make it very hard for people to help out on the project, and also it won’t make Seph a possible language for people to use until they upgrade to Java 7.

Java 7 with extensions

It turns out that the interesting features coming in Java 7 are just the tip of the iceberg. There are many other proposed features, with partial implementations in the DaVinci project (MLVM). These features aren’t actually complete, but one way of forcing them to become more complete is to actually use them for something real and give lots of feedback on the feature. Some of the more interesting features:

Interface injection

This feature will allow you to say after the fact that a specific class implements an interface, and also specify implementations for the methods on that interface. This is very powerful and would be extremely helpful in certain parts of the language implementation – especially when doing integration with Java. The patch is currently not very complete, though.

Tail calls

Allowing the JVM to perform proper tail calls would make it much easier to implement many recursive functional algorithms. Since Seph will have proper tail calls in the language, I will have to implement them myself if the JVM doesn’t do it, which means Seph will be slower as a result. The patch seems to be quite good and possible to merge and harden into the JDK at some point. Of all the things on this list, this seems to be one of the things that we can actually envision being added in the Java 7 or Java 8 time frame.
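One common way to get proper tail calls without JVM support is a trampoline: each tail call returns a thunk, and a loop runs the thunks so the Java stack stays flat. The sketch below uses Java 8 lambdas for brevity; `TrampolineSketch` and `Step` are hypothetical names, and nothing here is claimed to be how Seph does it. The allocation of one thunk per call is one plausible source of the slowdown mentioned above.

```java
import java.util.function.Supplier;

class TrampolineSketch {
    // A step is either a finished value (next == null) or a thunk for the next step.
    static final class Step<T> {
        final T value;
        final Supplier<Step<T>> next;

        private Step(T value, Supplier<Step<T>> next) {
            this.value = value;
            this.next = next;
        }

        static <T> Step<T> done(T value) { return new Step<>(value, null); }
        static <T> Step<T> call(Supplier<Step<T>> next) { return new Step<>(null, next); }
    }

    // Runs steps in a loop, so stack depth does not grow with the number of tail calls.
    static <T> T run(Step<T> step) {
        while (step.next != null) {
            step = step.next.get();
        }
        return step.value;
    }

    // Tail-recursive sum of 1..n, written in trampolined style.
    static Step<Long> sum(long n, long acc) {
        if (n == 0) {
            return Step.done(acc);
        }
        return Step.call(() -> sum(n - 1, acc + n));
    }

    static long sumTo(long n) {
        return run(sum(n, 0L));
    }

    public static void main(String[] args) {
        // A directly recursive version would risk StackOverflowError at this depth.
        System.out.println(sumTo(1_000_000L));
    }
}
```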

Coroutines/continuations

Both coroutines and continuations seem to be possible to do in a good way, at least partially. Coroutines might be interesting for Seph as an alternative to Kilim, but right now it seems to be a bit unstable. Continuations would allow me to expose continuations as a first class citizen which is never bad – but it wouldn’t give me much more than that.

Hotswapping

Hotswapping of code would make it possible to do aggressive JITting and then back out from that when guards fail and so on. This is less interesting when we have invoke dynamic, but will give some more flexibility in terms of code generation.
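Hotswapping itself needs JVM support, but the guard-and-back-out pattern can be sketched with method handles alone, which is roughly why invoke dynamic makes hotswapping less pressing. The example uses `MethodHandles.guardWithTest` from the final Java 7 API; `GuardSketch` and its three static methods are hypothetical stand-ins for a specialized path, a generic path, and the guard between them.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class GuardSketch {
    // Guard: does the specialized path apply to this argument?
    public static boolean isInteger(Object o) { return o instanceof Integer; }

    // Specialized path, valid only while the guard holds.
    public static Object addOneFast(Object o) { return Integer.valueOf((Integer) o + 1); }

    // Generic fallback, used when the guard fails.
    public static Object addOneGeneric(Object o) { return o + "+1"; }

    static MethodHandle find(String name, Class<?> returnType) {
        try {
            return MethodHandles.lookup().findStatic(GuardSketch.class, name,
                    MethodType.methodType(returnType, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // test, target and fallback combined into one handle.
    static final MethodHandle GUARDED = MethodHandles.guardWithTest(
            find("isInteger", boolean.class),
            find("addOneFast", Object.class),
            find("addOneGeneric", Object.class));

    static Object addOne(Object o) {
        try {
            return GUARDED.invoke(o);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(addOne(41));
        System.out.println(addOne("a"));
    }
}
```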

Fixnums, tuples, value types

We all want ways of making numbers faster – but these features might also make it possible to efficiently represent simple composite data structures, and also things like multiple return values. These are fairly simple features, but have no real patch right now (I think).

Light weight code loading (anonymous classes)

It is horrible to load byte code at runtime in Java at this point. The reason is that to be able to make sure your loaded code gets garbage collected, you will have to load each chunk of code in a new class in a new classloader. This becomes very expensive very fast, and also endangers permgen. Anonymous classes make this go away, since they don’t have names. This means you don’t actually have to keep a reference to older classes, since there is no way to get to them again if you lost the reference to them. This is a good thing, and makes it possible to not generate class loaders every time you load new code. The state of this seems to be quite stable, but at this point JVM dependent.

The price

Of course, all of these lovely features come with a price. Two prices in fact. The first price is that all the above features are incomplete, ranging from working patches to proofs of concept or sketches of ideas. That means that the ground will change under any language using them – which introduces hard version dependencies and complicates building. The other price is that none of these features are part of anything that has been released, and there are no guarantees that they will ever be merged into Java at any point. So the only viable way of distributing Seph would be to distribute standard build files with a patched OpenJDK so that anyone can download and use that specific JDK. But that limits interoperability and causes lots of other problems.

Somewhere in between

My current thinking is that all of the above choices are bad. For Seph I want something in between, and my current best approach looks like this. You will need a new build of MLVM with invoke dynamic and method handles to develop and compile Seph. I will utilize invoke dynamic and method handles in the implementation, and allow people to use Rémi Forax’ JSR 292 backport to run it on Java 5 and 6. When Java 7 finally arrives, Seph will be more or less ready for it – and Seph can get some of the performance and maintainability benefits of using JSR 292 immediately. At this point I can’t actually use defender methods, but if anyone is clever enough to figure out a backport that will allow defender methods to work on Java 5 or 6, I would definitely use them all over the place.

This doesn’t actually preclude the possibility of creating alternative research versions of Seph that use some of the other MLVM patches. Charles Nutter has shown how much you can do by using flags to add features that are turned off by default. So Seph could definitely grow the above features, but currently I won’t make the core of the language depend on them.

By Ola Bini | In: blogging, seph | tags: davinci, invoke dynamic, java, java 7, jsr292, method handles, mlvm, programming language design, seph.

July 27th, 2010

Questioning the reality of generics

I’ve been meaning to write about this for a while, since I keep saying this and people keep getting surprised. Now maybe I’m totally wrong here, and if that’s the case it would be nice to hear some good arguments for that. Here’s my current point of view on the subject anyway.

A specter is haunting the Java community – the specter of generics.

Java introduced a feature called generics in Java 5 (this feature is generally known under the name of parametric polymorphism in the literature). Before Java 5 it wasn’t possible to create a reusable collection that would ensure the type safety at compile time of what you put into that collection. You could create a collection of, for example, Strings and have that working correctly, but if you wanted to have a collection of anything, as long as that anything was the same type, you were restricted to doing runtime checks, or just having good tests.

Java 5 made it possible to add type parameters to any other type, which means you could create more specific collections. There are still problems with these – they interact badly with native arrays for example, and wildcards (Java’s way of implementing co- and contravariance) have ended up being very hard for Java developers to use correctly.
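A short sketch of the two points just mentioned: wildcards used covariantly (`? extends`) and contravariantly (`? super`), and the way arrays differ by being covariant and checked only at runtime. The class and method names are hypothetical and chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardSketch {
    // Covariant use: reads Numbers out of a list of any Number subtype.
    static double total(List<? extends Number> source) {
        double sum = 0;
        for (Number n : source) {
            sum += n.doubleValue();
        }
        return sum;
    }

    // Contravariant use: writes Integers into a list of any Integer supertype.
    static void fill(List<? super Integer> sink, int count) {
        for (int i = 1; i <= count; i++) {
            sink.add(i);
        }
    }

    // Arrays are covariant, so this compiles, and the bad store is caught at runtime.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    static double demo() {
        List<Number> numbers = new ArrayList<>();
        fill(numbers, 3);       // List<Number> accepted as a sink for Integers
        return total(numbers);  // and as a source of Numbers
    }

    public static void main(String[] args) {
        System.out.println(demo());
        System.out.println(arrayStoreFails());
    }
}
```

Passing a `List<Integer>` where a `List<Number>` is expected is rejected at compile time, which is the asymmetry with arrays that tends to trip people up.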

Java and C# both added generic types at roughly the same time. The C# version of generics differed in a few crucial ways, though. The most important difference in implementation is that C# generics are reified, while Java generics use type erasure. And this is really the gist of this blog post. Because over and over I hear people lament the lack of reified generics in Java, citing how good C# and the CLR are for having this feature. But is that really the case? Are reified generics a good thing? Of course, that always depends on who is asking the question. Reified generics might well be good for one person but not another. Here you will hear my view.

Reified? Huh?

So what does reified generics mean, anyway? It is probably easiest to explain compared to the Java implementation that uses type erasure. Slightly simplified: in Java, generics don’t exist at runtime. They are purely a fiction that the compiler uses to handle type checking and make sure you don’t do anything bad with your collection. After the generics have been type checked, they are used to generate casts and type checks in the code using generics, some metadata is inserted into the class file format, and then the generic information is thrown away.
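The three effects of erasure described here can be observed directly. In this sketch (all names hypothetical), two differently parameterized lists share one runtime class, a compiler-inserted cast fails only when the element is read, and the declared type argument of a field is still recoverable from class-file metadata through reflection.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.util.ArrayList;
import java.util.List;

class ErasureSketch {
    // Field used only to show that declaration-site metadata survives compilation.
    static List<String> names = new ArrayList<>();

    // Both lists have the same runtime class: the type argument is gone.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // The generated cast fails at the point of use, not at insertion.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFailsOnRead() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(Integer.valueOf(1));              // nothing stops this at runtime
        try {
            int ignored = strings.get(0).length(); // implicit cast to String happens here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Reflection reads the declared type argument from the class-file metadata.
    static String declaredTypeArgument() {
        try {
            Field field = ErasureSketch.class.getDeclaredField("names");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            return type.getActualTypeArguments()[0].getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(castFailsOnRead());
        System.out.println(declaredTypeArgument());
    }
}
```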

In contrast, on the CLR, generic classes exist as specific versions of their class. Instantiations of the same class with different generic type arguments are really different classes. There are no casts happening at the implementation level, and the CLR will as a result generate more specific code for the generic code. Reflection and dynamic type checks are also possible on the CLR. Having reified generics basically means that they exist at runtime, that the virtual machine knows about them and handles them correctly.

Multi-language virtual machines

Over the last twenty years something interesting has happened. Both our hardware and software have gotten mature enough that a new generation of virtual machines has entered the market. Traditionally, virtual machines for languages were made for specific languages, such as Pascal, Lisp and Smalltalk, and with the possible exceptions of SECD and the Warren machine, there haven’t really been any virtual machines optimized for running more than one language well. The JVM didn’t start that way either, but it turned out to be better suited for it than expected, and there are lots of efforts to make it an even better platform. The CLR, Parrot, LLVM and Rubinius are other examples of things that seem to become environments rather than just implementation strategies for languages.

This is very exciting, and I think it’s a really good thing. We are solving very complex problems where the component problems are best solved in different ways. It seems like a weird assumption that one programming language is the best way of solving all problems. But there is also a cost associated with using more than one language. So having virtual machines act as platforms, where a shared chunk of libraries is available, and the cost of implementation is low, makes a lot of sense.

In summary, I feel that the JVM was the first step towards a real viable multi-language virtual machine, and we are currently in the middle of the evolution towards that point.

Solving the problems

So why not add reified generics to the JVM at this point? It could definitely be done, and using an approach similar to the CLR, where libraries are divided into pre- and post-reified, makes the path quite simple from an implementation standpoint. On the user side, there would be a new proliferation of libraries to learn – but maybe that’s a good thing. There is a lot of cruft in the Java standard libraries that could be cleaned up. There are some sticky details, like how to handle the APIs that were designed for erased generics, but those problems could definitely be solved. It would also solve some other problems, such as making it possible for Scala to pattern match on type parameters and solving part of the problem with abstracting over primitive types. And it’s absolutely possible to do. It would probably make the Java language into a better language.

But is it the only solution? At this point, making this kind of change would complicate the APIs to a large degree. The reflection libraries would have to be completely redesigned (but still kept around for backwards compatibility). The most probable result would be a parallel hierarchy of classes and interfaces, just like in the CLR.

Reified generics are generally proposed in discussions about three different things: first, performance; second, making some features easier to implement in Scala and other statically typed languages on the JVM; and third, handling primitives and primitive arrays a bit better. Of these, the first one is the least common, and the least interesting by far. JVM performance is already nothing short of amazing. The second point I’ll come back to in the last section. The third point is the most interesting, since there are other solutions here, including unifying primitives with objects inside the JVM by creating value types. This would solve many other problems for language implementors on the JVM, and enable lots of interesting features.

The short stick

I believe in a multi language future, and I believe that the JVM will be a core part of that future. Interoperability is just too expensive over OS boundaries – you want to be on the same platform if possible. But for the JVM to be a good environment for more than one language, it’s really important that decisions are made with that in mind. The last few years of fantastic progress from languages like Rhino, Jython, JRuby, Groovy, Scala, Fantom and Clojure have shown that it’s not only possible, but beneficial for everyone involved to focus on JVM languages. JSR 223, 292 and several others also mean the JVM is more and more being viewed as a platform. This is good.

Generics are a complicated language feature. They become even more complicated when added to an existing language that already has subtyping. These two features don’t play very well together in the general case, and great care has to be taken when adding them to a language. Adding them to a virtual machine is simple if that machine only has to serve one language – and that language uses the same generics. But generics aren’t done. It isn’t completely understood how to handle them correctly, and new breakthroughs are happening (Scala is a good example of this). At this point, generics can’t be considered “done right”. There isn’t only one type of generics – they vary in implementation strategies, features and corner cases.

What this all means is that if you want to add reified generics to the JVM, you should be very certain that that implementation can encompass both all static languages that want to do innovation in their own version of generics, and all dynamic languages that want to create a good implementation and a nice interfacing facility with Java libraries. Because if you add reified generics that don’t fulfill these criteria, you will stifle innovation and make it that much harder to use the JVM as a multi language VM.

I’m increasingly coming to the conclusion that multi language VMs benefit from being as dynamic as possible. Runtime properties can be extracted to get performance, while static properties can be used to prove interesting things about the static pieces of the language.

Just let generics be a compile time feature. If you don’t, there are two alternatives – either you are an egoist who only cares about the needs of your own language, or you think you have a generic type system that can express all other generic type systems. I know which one I think is more likely.

By Ola Bini | In: blogging | tags: generics, java, jvm, mlvm, multi language vm, parametric polymorphism, programming language design, programming languages, reified generics.