Can bytecodes perform well?


I really need to write an answer to a comment that Kragen Sitaker left on my post about Rubinius. It makes two points, and I want to address one of them at some length. So here goes. The comment read like this:

“It is byte code based. This means it’s easier to handle performance.”

If by “handle” you mean “not have”.

“garbage collection/object memory can be switched out to use another algorithm”

In your dreams. I suspect you’ll find out that the architecture isn’t as clean as you think it is when you try to plug in some other existing garbage collector or object memory.

It sounds like a good project, but I don’t think its advantages are going to be so enormous as you think they are.

Whoa. That's hard to argue with. Isn't it? Or maybe not. Let's begin with the second point. Is Rubinius' architecture clean and decoupled enough to allow swapping of GC and object memory? My answer is a resounding yes. But of course, you can go and look for yourself. It is certainly possible to build a system where the GC can be decoupled; Sun's standard JVM does this, for example. But the interesting question isn't whether Rubinius' hooks for this are clean enough, but whether they are cleaner than MRI's. If you have ever tried to switch out the GC in MRI, you know that Rubinius beats MRI hands down in this regard. If not, you can ask Laurent Sansonetti at Apple, who has actually done most of the work needed to switch out the GC in MRI, whether that was a fun experience.
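
To make that concrete, here is a minimal sketch of what that kind of decoupling looks like. It is hypothetical (names and all), not Rubinius' actual code: the point is only that the rest of the VM talks to the collector through an abstract interface, so a different collector can be plugged in without touching the interpreter.

    // Hypothetical sketch, not Rubinius' actual API: the runtime sees
    // garbage collection only through this interface, so a different
    // collector can be plugged in without touching the interpreter.
    #include <cstddef>
    #include <cstdlib>
    #include <memory>

    struct Object;  // opaque heap object header

    class GarbageCollector {
    public:
        virtual ~GarbageCollector() = default;
        virtual Object* allocate(std::size_t bytes) = 0;
        virtual void collect() = 0;
    };

    // One concrete strategy; a copying or generational collector would
    // implement the same interface.
    class MarkSweepCollector : public GarbageCollector {
    public:
        Object* allocate(std::size_t bytes) override {
            // Placeholder: a real collector manages its own heap.
            return static_cast<Object*>(std::malloc(bytes));
        }
        void collect() override {
            // Mark everything reachable from the roots, sweep the rest.
        }
    };

    // The object memory holds whichever collector it was configured with.
    class ObjectMemory {
        std::unique_ptr<GarbageCollector> gc_;
    public:
        explicit ObjectMemory(std::unique_ptr<GarbageCollector> gc)
            : gc_(std::move(gc)) {}
        Object* new_object(std::size_t bytes) { return gc_->allocate(bytes); }
    };

    int main() {
        ObjectMemory om(std::make_unique<MarkSweepCollector>());
        Object* obj = om.new_object(64);  // allocated through the plugged-in collector
        (void)obj;
    }

Swapping the algorithm then means constructing the object memory with a different collector, and nothing else in the runtime has to change.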

Let’s see, what’s next? Oh yeah. Bytecodes are always slow. No argument here. C++ will always beat a bytecode-based engine. And C will almost always beat C++. And assembler (in the right hands) will usually beat C. But wait… Isn’t Java bytecode based? And doesn’t Java 6 perform on the same level as C in many cases, and in some cases even better? Well, yes, it does… And wasn’t Smalltalk always based on bytecodes? Most Smalltalk engines performed very well. Why is MRI switching to bytecodes for 1.9? Why has Python always been bytecode based? Why is the CLR bytecode based? Why did even Pascal use bytecodes back in the day? (OK, that is cheating… Pascal used bytecodes for portability, not performance, but it worked well in that respect too.) And Erlang is bytecode based.

Basically, there are some cases where a static language will benefit from straight compilation down to hardware machine code. OCaml is a typical example of such a language: thanks to its extremely stringent type system, the compiler can emit code that is often as fast as C, and sometimes faster. But that is the exception, and it only works for languages with that kind of bondage-tight type discipline. When it comes to dynamic languages, bytecodes are the real path to good performance. Granted, a naive implementation of a bytecode engine will not perform well. But that is true for a naive compiler too. The difference is that the part interpreting bytecodes is usually quite a small part of the whole runtime system, and it can be switched out for better performance, piecemeal or all at once.
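
To show how small that interpreting part really is, here is a toy dispatch loop for a made-up stack machine (the opcodes are invented, not any real VM’s instruction set). The whole “bytecode engine” is one switch statement, and that is exactly the piece you would later replace with a threaded interpreter or a JIT.

    // Toy stack-machine dispatch loop with made-up opcodes. The point:
    // the interpreter proper is this one small loop, and swapping it
    // for something faster doesn't disturb the rest of the runtime.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    enum Op : std::uint8_t { PUSH, ADD, PRINT, HALT };

    void run(const std::vector<std::uint8_t>& code) {
        std::vector<std::int64_t> stack;
        std::size_t pc = 0;
        for (;;) {
            switch (code[pc++]) {
            case PUSH:                     // next byte is an immediate operand
                stack.push_back(code[pc++]);
                break;
            case ADD: {                    // pop two values, push their sum
                std::int64_t b = stack.back(); stack.pop_back();
                stack.back() += b;
                break;
            }
            case PRINT:
                std::printf("%lld\n", (long long)stack.back());
                break;
            case HALT:
                return;
            }
        }
    }

    int main() {
        run({PUSH, 2, PUSH, 3, ADD, PRINT, HALT});  // prints 5
    }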

There are other reasons too. Statically typed languages like Java and the CLR family use bytecodes because bytecodes give the runtime system the opportunity to dynamically replace the running machine code based on statistics and criteria discovered at runtime. This means that your application will actually have better performance in the parts where it counts, while the parts that are not heavily used are left unoptimized. (That’s what HotSpot does, for example.) This is not possible with a straight ahead-of-time compilation to machine code. Java would never have had the performance it has now if it weren’t for the bytecodes.
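
Heavily simplified, and with every name here made up, the idea looks something like this: interpret everything at first, count invocations, and hand only the methods that cross a hotness threshold to the compiler, guided by the statistics gathered along the way.

    // Heavily simplified sketch of adaptive ("HotSpot-style")
    // optimization; every name is invented. The runtime interprets
    // everything at first, counts invocations, and compiles only the
    // methods that turn out to be hot.
    #include <cstdint>
    #include <string>

    struct Method {
        std::string name;
        std::uint64_t invocations = 0;
        bool compiled = false;
    };

    constexpr std::uint64_t kHotThreshold = 10000;  // hypothetical tuning knob

    void invoke(Method& m) {
        if (!m.compiled && ++m.invocations >= kHotThreshold) {
            // Hot: compile the bytecodes to machine code, guided by the
            // type and branch statistics gathered while interpreting.
            m.compiled = true;  // stand-in for a hypothetical jit_compile(m)
        }
        if (m.compiled) {
            // ... jump into the optimized machine code ...
        } else {
            // ... fall back to the bytecode interpreter ...
        }
    }

    int main() {
        Method m{"fib"};
        for (int i = 0; i < 20000; ++i) invoke(m);
        // After crossing the threshold, calls run through the "compiled" path.
    }

A static compiler has to make all these decisions up front; this loop gets to make them after seeing what the program actually does.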

So please, stop spreading this myth. It is NOT true and it has NEVER been true.