Would Type Inference help Java


My former colleague Lars Westergren recently posted a blog (here) about type inferencing, posing the question whether type inference would actually be good for Java, and if it would provide any benefits outside of just “less typing”.

In short: no. Type inferencing would probably not do much more than save you some typing. But how much typing it would save you could definitely vary depending on the type of type inference you added. The one version I would probably prefer is just a very simple hack to avoid writing out the generic type arguments. One simple way of doing that would be to allow an equals sign inside of the angle brackets. In that case you could do this:

List<=>      l  = new ArrayList<String>();
List<String> l2 = new ArrayList<=>();

Of course, you can do it on more complicated expressions:

List<Set<Map<Class<?>, List<String>>>> l = new ArrayList<=>();

This would save us some real pain in the definition of genericized types, and it wouldn’t strip away much stuff you need for readability. In the above examples it would just strip away one duplication, and you don’t need that duplication to read it correctly. The one case where it might be a little bit harder to read would be if you defined a variable and assigned it somewhere else. In that case the definition would need to carry the type information, so the instantiation would use the <=> syntax. I think that would be an acceptable price to reduce the verbosity of Java generics.
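For comparison, Java 7 and later accept an empty pair of angle brackets (the "diamond") on the right-hand side, which is close to the second form above. There is no standard equivalent of the left-hand `List<=>` form. A generic factory method gets a similar effect without the diamond. The sketch below shows both; the class name `DiamondDemo` and the helper `newList` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DiamondDemo {
    // Hypothetical helper: T is inferred from the type of the assignment target.
    static <T> List<T> newList() {
        return new ArrayList<T>();
    }

    static int demo() {
        // Diamond (Java 7+): the type arguments on the right are inferred from the left.
        List<Set<Map<Class<?>, List<String>>>> nested = new ArrayList<>();
        // Generic factory method: the same idea without the diamond.
        List<String> names = newList();
        names.add("inferred");
        return nested.size() + names.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In both cases the full type is still written once, on the declaration, so nothing a reader needs is lost.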

Another kind of type inference that would be somewhat useful is the kind added to C#, which is only local to a scope. That means there will be no type inferencing of member variables, method parameters or return values. Of course, that’s the crux of Lars’s question, because this kind of type inference can potentially remove ALL type information from the text in front of you, since you can do:

var x = someValue.DoSomething();

At this point there is no easy way for you to know what the type of x actually is. Reading it like this, it looks a bit frightening if you’re used to Java type tags, but in fact this is not what you would see. In most cases you have a small method – maybe 5-15 lines of code, where x is being used in some way or another. In many cases you will see methods called on x, or x used as an argument to method calls. Both of these usages give you clues about what it might be, but in fact you don’t always need to know what type it is. You just need to know what you can do with it. And that’s exactly what Java interfaces represent. So for example, do you know what class you get back from Collections.synchronizedMap()? No, and you shouldn’t need to know. What you do know is that it’s something that implements Map, and the documentation says that it is synchronized, but that is it. The only thing you know about it is that you can use it as a map.
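The Collections.synchronizedMap() point can be checked with a few lines of plain Java. This is only a sketch; the class name `InterfaceDemo` and the method `counts` are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class InterfaceDemo {
    static Map<String, Integer> counts() {
        // The concrete class returned here is an implementation detail of
        // java.util.Collections; callers only ever see the Map interface.
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<String, Integer>());
        m.put("x", 1);
        m.put("y", 2);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = counts();
        // Everything the caller needs is available through Map.
        System.out.println(m.get("x") + m.get("y"));
        // The wrapper is not a HashMap, even though it was built from one.
        System.out.println(m instanceof HashMap);
    }
}
```

The calling code never names the wrapper class, and it does not need to: knowing that the value is a Map is enough to use it.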

So in practice, the kind of type inference C# adds is actually quite useful, clean, and doesn’t cause too much trouble – especially if you have one of those fancy IDEs that do method completion… =)

From another angle, there are some things that type inference could possibly do, but that you will never see in Java. For example, say that you assign a variable to something, and later you assign that variable to some other value. If these two values are of distinct types that don’t overlap in the inheritance chain, you will usually get an error. But if you have an advanced type system, it will do unification for you. The basic versions will just find the closest common supertype (the disjunction), but you can also imagine the compiler injecting a new type into your program that is the union of the two types in use. This will provide something similar to duck typing while still retaining some static type safety. If your type system allows multiple inheritance, the synthetic union type might even be a subclass of both the types in question.
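Java does not do this for variables, but its generic method inference already performs the basic "find a common supertype" step. The sketch below illustrates only that basic version, not the synthetic union type; `UnifyDemo` and `pick` are names invented for the example.

```java
class UnifyDemo {
    // T is inferred from both arguments, so the compiler has to find one type
    // that covers whatever is passed as a and b.
    static <T> T pick(boolean first, T a, T b) {
        return first ? a : b;
    }

    static String describe(boolean first) {
        // Integer and Double are unrelated to each other, but both extend Number
        // and both implement Comparable, so either target type is accepted.
        Number n = pick(first, Integer.valueOf(1), Double.valueOf(2.5));
        Comparable<?> c = pick(first, Integer.valueOf(1), Double.valueOf(2.5));
        return n.getClass().getSimpleName() + "/" + c.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(true));
        System.out.println(describe(false));
    }
}
```

The inferred type cannot be written down as a single class name in source code, which is roughly where the idea of a compiler-injected type begins.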

So yeah. The long answer is that you can actually do some funky stuff with type inference that doesn’t immediately translate to less typing. Although less typing and better abstractions are what programming languages are all about, right? Otherwise assembler provides everything we want.



An introduction to categories of type systems


Since the current world is moving away from languages in the classical imperative paradigm, it’s more and more important to understand the fundamental type differences between programming languages. I’ve seen over and over that this is still something people are confused by. This post won’t give you all you need – for that I recommend Programming Language Pragmatics by Michael L. Scott, a very good book.

Right now, I just want to minimize the confusion that abounds surrounding two ways of categorizing programming languages: strong versus weak typing, and dynamic versus static typing.

The first thing you need to know is that these two axes are independent of each other, meaning that there are four different combinations a language can fall into.

First, strong vs weak: A strongly typed language is a language where a value always has the same type, and you need to apply explicit conversions to turn a value into another type. Java is a strongly typed language. Conversely, C is a weakly typed language.
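As a small illustration of "explicit conversions" on the Java side, here is a sketch (the class and method names are made up):

```java
class ConversionDemo {
    static int parse(String text) {
        // A String never silently becomes an int; the conversion is a named call.
        return Integer.parseInt(text);
    }

    static int truncate(double value) {
        // Narrowing a double to an int needs an explicit cast; the fraction is dropped.
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(parse("42") + 1);
        System.out.println(truncate(3.9));
        // int broken = "42";   // would not compile: incompatible types
    }
}
```

Each conversion is visible in the source, either as a method call or as a cast.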

Secondly, dynamic vs static: A static language can usually be recognized by the presence of a compiler. This is not the full story, though – there are compilers for Lisp and Smalltalk, which are dynamic. Static typing basically means that the type of every variable is known at compile time. This is usually handled by either static type declarations or type inference. This is why Scala is actually statically typed, but looks like a dynamic language in many cases. C, C++, Java and most mainstream languages are statically typed. Visual Basic, JavaScript, Lisp, Ruby, Smalltalk and most “scripting” languages are dynamically typed.
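To see what "known at compile time" means in practice, here is a rough Java sketch (the names `StaticTypeDemo` and `lengthOf` are invented). The variable's declared type decides which calls are allowed, whatever object it holds at run time.

```java
class StaticTypeDemo {
    static int lengthOf(Object value) {
        // The static type is Object, so value.length() would be rejected by the
        // compiler even when the object is a String at run time.
        if (value instanceof String) {
            String s = (String) value; // the cast changes the static type to String
            return s.length();
        }
        return -1; // not a String, so there is no length to report
    }

    public static void main(String[] args) {
        System.out.println(lengthOf("hello"));
        System.out.println(lengthOf(Integer.valueOf(7)));
    }
}
```

In a dynamically typed language the call to length would simply be attempted at run time, and would succeed or fail depending on the object.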

See, that’s not too hard, is it? So, when I say that Ruby is a strongly, dynamically typed language, you know what that means?

C is actually an interesting beast to classify. It’s the only weakly, statically typed language I can think of right now. Does anyone have any more examples?

To find out more, read the book above, or look up “Type systems” on Wikipedia.



Introducing TIJuAVA – Java with Type Inference


Every time I’ve written Java code lately, I’ve been painfully aware of how much unnecessary code I write. And most of this is Java’s fault. This blog post is a very small thought experiment. TIJuAVA does not exist as software. Yet. If I someday have the time I would love to implement it, but there are more pressing needs right now.

So, what are the rules? Basically, all valid Java programs are valid TIJuAVA programs. Some valid TIJuAVA programs are not valid Java programs. Simply put, the main difference is that you don’t need to declare a type for any local variables or member variables. Type declarations are only necessary in method declarations. You can declare types for local variables and member variables if you want to, and in certain very unlikely circumstances you will need to.

Let’s take a very simple example. This code is taken from the JRuby source code, but I have added one or two things to make it easier to showcase:

package org.jruby.util.collections;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

public class IdentitySet {
    private items = new ArrayList();

    public void add(Object item) {
        items.add(item);
    }

    public void remove(Object item) {
        iter = items.iterator();
        while (iter.hasNext()) {
            storedItem = iter.next();
            if (item == storedItem) {
                iter.remove();
            }
        }
    }

    public boolean contains(Object item) {
        iter = items.iterator();
        while (iter.hasNext()) {
            storedItem = iter.next();
            if (item == storedItem) {
                return true;
            }
        }
        return false;
    }

    private Collection getItems() {
        return items;
    }

    private void something(java.util.AbstractSet inp) {
        val1 = inp;
        for (iter = val1.iterator(); iter.hasNext();) {
            System.err.println(iter.next());
        }
    }
}

This code doesn’t really show all that can be done with this approach, and if I were to show a real example, this blog would be unbearably filled with code. So, this is just a tidbit.

The TIJuAVA system would need to be implemented as a two-pass Java compiler. Basically, the first pass finds all variable names that need to have a type inferred, and then walks through the information it’s got, based on method signatures and methods called on the variable. In almost all cases it will be possible to come to one conclusion on which type to use. The compiler would then generate regular Java bytecode, basically the same bytecode that would have been generated had you written the types by hand.
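To make "had you written the types by hand" concrete, here is a sketch of what the IdentitySet example could look like as ordinary Java. The inferred types shown here (List<Object>, Iterator<Object>, Object) are my assumption about what such a compiler would settle on. The package line and the two private helper methods are left out to keep it short, and the main method is only there to exercise the class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class IdentitySet {
    // Assumed inference: the initializer suggests a list of arbitrary objects.
    private List<Object> items = new ArrayList<Object>();

    public void add(Object item) {
        items.add(item);
    }

    public void remove(Object item) {
        // items.iterator() returns an Iterator, so that is the type of iter.
        Iterator<Object> iter = items.iterator();
        while (iter.hasNext()) {
            Object storedItem = iter.next();
            if (item == storedItem) { // identity comparison, not equals()
                iter.remove();
            }
        }
    }

    public boolean contains(Object item) {
        Iterator<Object> iter = items.iterator();
        while (iter.hasNext()) {
            Object storedItem = iter.next();
            if (item == storedItem) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IdentitySet set = new IdentitySet();
        String a = new String("a");
        set.add(a);
        System.out.println(set.contains(a));               // same object
        System.out.println(set.contains(new String("a"))); // equal, but a different object
        set.remove(a);
        System.out.println(set.contains(a));
    }
}
```

Every type added here can be read off the right-hand side of the assignment it belongs to, which is why the first pass should rarely have to guess.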

Of course, most people use IDEs to write code nowadays. Wizards and code generators and what not. So why something like this? Well, even though your IDE writes your code for you, it is still there, and you still have to understand it at some level. If not when writing, you would still need to read it. And boy, do type declarations clutter things. Especially generics. And here is one interesting tidbit: generic types would also be possible to infer in most cases.

Another thing that could be easily added is some kind of in-place literal syntax for lists and maps. This would be more like a macro feature, but the list syntax would mostly just be a call to Arrays.asList, which isn’t too bad.
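Roughly, the expansion could look like the sketch below. The literal forms in the comments are invented syntax, not something TIJuAVA defines, and `LiteralDemo` is just a name for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LiteralDemo {
    // Possible expansion of a hypothetical list literal such as ["a", "b", "c"].
    static List<String> letters() {
        return Arrays.asList("a", "b", "c");
    }

    // Possible expansion of a hypothetical map literal such as {"one": 1, "two": 2}.
    static Map<String, Integer> numbers() {
        Map<String, Integer> m = new HashMap<String, Integer>();
        m.put("one", 1);
        m.put("two", 2);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(letters());
        System.out.println(numbers().get("two"));
    }
}
```

One caveat: the list returned by Arrays.asList is fixed-size, so a literal meant to be growable would have to be wrapped in a new ArrayList.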

An objection that I anticipate is from people who think that the code will be less readable without the type pointers. This should be more of a problem when you have large methods, but everyone these days uses refactorings, so they won’t have methods longer than 20 lines. And if that’s the case, the local variables should be easily understood from the operations that are used on them.

So. Someday, when I have time, this may be reality. If anyone is interested, that is.