Bottom Types in Dynamic Languages

On the Friday of QCon, Sir Tony Hoare held a presentation about the null reference. This was interesting stuff, and it tied into something I’ve been thinking about for a while. Namely, the similarities and differences in the role of null/nil in static and dynamic languages.

At the surface, a null reference in a statically typed language seems to be a value of a bottom type. Since null is still a value, that cannot be true – since according to the formal definition, a bottom type can have no real value.

To take a typical example of where a bottom type is needed, take a look at what happens in Scala when you have a method that always throws an exception. Of course, that method can never return – so what is the type of the return value? That type is Nothing, the bottom type in Scala. There are no values of type Nothing in Scala, and can never be. The Nothing type is really useful in other circumstances too. Lists in Scala correspond to functional linked lists, where each cons cell is immutable. Every cons cell has a type – this type is the most generic type of the value the cons cell points to, and the type of cons cell it points to. Recursively, this means that the full list will have a generic type that is the least generic type that can encompass every element of the list. But how does this really work for the end element? Every list needs to have a final element that stops the list. This end element generally doesn’t contain a value. So what is the type of that end element? List[Nothing]. The reason being that Nothing can be viewed as the subtype of every other type in the system, which means it is always more specific than any other type. But when you create a new cons cell that points to this end value, the full list will always get the type of the new cons cell.

That explains bottom types, but we are no closer to understanding null references. In a statically typed language, a null reference acts like a value of a bottom type. The reason is that you can assign null to any other type – and this is generally only true for types that are subtypes of the place where it is being assigned to. What is interesting about null in C-like languages, is that there is no type that actually corresponds to null. In languages with null references like this, it seems that the domain of every other type in the system is extended to include null as a value. I haven’t found any reference to how this works from a type checking perspective, but it does look like the bottom type – except that there is no type associated to it. So from now on I will just call null a bottom value. One of the interesting aspects of including a bottom value in your language is that it opens up for runtime errors. If you don’t check every place that uses a reference for the bottom value before using it, you have the potential to get a runtime error.

The reason I started thinking about this was actually to contrast the way null works in a statically typed language, and how it works in a dynamically typed language. But the points above make it more and more obvious that null is actually a runtime feature even in static languages. The type system bypass nulls to allow more dynamism at runtime. But take a look at a dynamically typed language. Of course you won’t have any bottom types, since you don’t have a type system that need to check types. Another feature of most dynamically typed languages is that everything is an expression. In these languages, nil is usually used as a sentinel value to mark that the operation didn’t actually result in any real value. The semantics of how nil is interpreted can depend on both the language and the specific API of the operation in question.

The crucial difference in how nil is handled in a dynamically typed language, compared with how a null reference in a statically typed language works, is one of high level approach. In particular, in a dynamically typed language you will not know what kind of value you get back from an operation. In Ruby you might get back something that is of an expected class, but you can also get something that just looks like that class. Or something totally different, like a recorder object for example. All of these things make nils in dynamically typed languages much less of a problem, since the way you develop in these languages tend to be quite different. In contrast, a null reference in Java is a loop hole, that basically means that anything can return either null or something of the type specified. It’s not exactly a lie, but it is an implicit effect that generally comes as a surprise in a statically typed language.

There is a proposal to add a @NotNull annotation to Java, so you can mark references that shouldn’t allow null. This is not good enough, and will probably not help much at this stage. The reason is that you will be penalized for avoiding null – not the other way around. It also makes code that tries to be correct _more_ verbose, instead of the other way around. Because of backwards compatibility, there is really nothing to do about this though. But I recommend language designers to think carefully before adding a Java-style reference.

Some people asks what the alternative is. What stronger type systems generally allow you to do is have an Option (or Maybe) type, that is genericized on the expected output value.

Say you define a method in Java called get() that will either return a String or a null. If you instead had an Option class in Java, you could define the method as “Option<String> get()”. What is neat about Option is that it’s abstract and it has got two concrete subtypes. The first one is Some, the second is None. This makes it explicit when you can expect a return value to be not be there, and at the same time forces the user of the method to actively handle this case. The default case is references that will always be valid. This a good way to handle this problem in a type system.