NRegex separated into its own project

If you’re interested in my regular expression engine for .NET, you can now download and build it as its own project. The files will shortly be removed from the Ioke source tree.

Opinions on F#

I decided quite early on that I wanted to use F# for a piece of the Ioke implementation. Josh Graham (a colleague at ThoughtWorks) suggested using it for the pieces of code doing operator shuffling. I first did a C# implementation, since it was quicker to port from Java to C#, then got that working, and finally reimplemented it in F#. The resulting code ended up being not that much smaller, but definitely more readable. I like using functional idioms if possible, but the fact of it is that the operator shuffling code is extremely imperative.

I definitely like the experience. If I can, I will aim to replace more pieces of the implementation with F#. The main problem standing in the way of that is that there doesn’t seem to be any good way of handling joint compilation. For me to use F# for real, I would need a compiler that can handle mutual interdependencies between the F# and C# code. At the moment I’m using interfaces and factories and separate modules to handle it, but that only works since the op shuffling code is very separate from the rest of the implementation. I don’t have many such pieces, so a joint compiler would be useful.

The code ended up being very clean thanks to the lack of type annotations and the use of indentation for structure. This is nice. Pattern matching works very well, although the documentation on how you can combine patterns seems to be on the weak side. I had to guess my way forward.

Type annotations ended up being necessary in most code that took .NET types as arguments. This made it a bit hard to see the structure, since I ended up having to add type tags in many places. Most code will probably not be using so many .NET types, so this was a problem with what I was trying to do, I think.

The Emacs support for F# indentation ended up having trouble with the “else” in if-else statements, so I avoided if statements and went wild with patterns instead. I like the way it looks, actually.

One of the main properties of a functional language is obviously the focus on using functions for things, and this was of course one of the nicer aspects. In F# I could remove a full implementation of the Strategy pattern, and just go with curried functions instead. Instead of loops, I ended up using locally defined recursive functions. This reads really well, and feels natural for me.
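The same idea can be sketched outside F#. The following is a minimal illustration in Java (the language Ioke was ported from), not the actual operator shuffling code: the names `CurriedStrategy`, `SCALE` and `sumWith` are hypothetical. A curried function stands in for a Strategy object, and a recursive helper stands in for a loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class CurriedStrategy {
    // Curried function: takes a factor and returns a function that still
    // waits for the value. This replaces a one-method Strategy interface.
    static final Function<Integer, Function<Integer, Integer>> SCALE =
        factor -> value -> factor * value;

    // Recursion instead of a loop: apply f to the head, recurse on the tail.
    static int sumWith(Function<Integer, Integer> f, List<Integer> xs) {
        return xs.isEmpty()
            ? 0
            : f.apply(xs.get(0)) + sumWith(f, xs.subList(1, xs.size()));
    }

    public static void main(String[] args) {
        // Partial application picks the "strategy".
        Function<Integer, Integer> doubler = SCALE.apply(2);
        System.out.println(sumWith(doubler, Arrays.asList(1, 2, 3))); // 12
    }
}
```

In F# the currying is built into every function definition, so none of the `Function<...>` noise is needed there.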

The fsc compiler is extremely slow – at least on Mono. This made the turnaround cycle a bit longer than ideal. Another thing to take note of is that – just like for all other ML implementations – the error messages for type problems are way more than cryptic. Totally inscrutable, not very helpful, and generally pointing at the wrong line. This is one area where I think the F# team need to improve matters a bit. I know it’s hard with the algorithms used for type inference, but if it’s going to be part of Visual Studio 2010, good error messages and error handling are obviously important.

Oh, and the fsc --help text is totally unhelpful, and very out of date.

All in all, I really liked the experience, with the caveats above noted. I hope I can figure out a good way to replace more of my internals with F#, and I am going to take a second pass at the op shuffle algorithm and see if I can make it more functional.

If you’re on the .NET platform, I recommend that you take a look at F#. It’s very nice, and allows different ways of expressing algorithms that can make your code much clearer.

Opinions on C# and .NET

After my recent exposure to C#, I thought I’d write up my thoughts about it and .NET. These will all mostly be in comparison with Java, rather than Ruby – since the implementation is a port of a Java project.

C# and Java started out very similar to each other. They still are, really. But they have grown in different directions. Some things are very nice, some things seem nice, but I didn’t use them, and some things are really problematic. When reading this, I might come off as harsh on C# and .NET. That’s not really my intent – Java and the JVM have their problems too, and I wouldn’t dare to suggest whether C# or Java is better.

The largest difference between C# and Java that really made a large change for me was that C# doesn’t have local anonymous classes. Java has, and these are highly useful. Of course, C# has delegates with lambda expressions instead, and they solve much the same problem. But there are two problems with delegates that make it impossible to use them for all cases. First, an anonymous type in Java can implement several interdependent methods. You can factor behavior local to that piece of code. That doesn’t work with delegates. Instead you’ll have to resort to ugly hacks (in Ioke I make each NativeMethod have references to two different delegates that interact with each other). The second problem is once again the question of intent. I spoke about this in the last post, and I will mention it again. Interfaces are about intent, and they get less useful if you can’t express intent well with them. That’s why the generic Func delegates might not be a good solution in all cases.
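To make the first point concrete, here is a minimal Java sketch of a local anonymous class whose two methods depend on each other and share state. The `Parser` interface and all other names are hypothetical and only serve as illustration. They are not taken from the Ioke source.

```java
class AnonymousClassDemo {
    // Hypothetical interface with two methods that belong together.
    interface Parser {
        boolean accepts(String input);
        String parse(String input);
    }

    static Parser makeParser(final String prefix) {
        // One anonymous class implements both methods, right where it is needed.
        return new Parser() {
            private int parsed = 0; // state shared by both methods

            @Override
            public boolean accepts(String input) {
                return input.startsWith(prefix);
            }

            @Override
            public String parse(String input) {
                if (!accepts(input)) { // one method relies on the other
                    throw new IllegalArgumentException("unexpected input: " + input);
                }
                parsed++;
                return input.substring(prefix.length()) + "#" + parsed;
            }
        };
    }

    public static void main(String[] args) {
        Parser p = makeParser("ioke:");
        System.out.println(p.parse("ioke:cell")); // cell#1
    }
}
```

With delegates, `accepts` and `parse` would have to be two separate lambdas wired together by hand, and the named `Parser` type that states the intent would be lost.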

The second thing I noticed was the proliferation of “primitive” types. I knew at some level that C# had unsigned and signed versions of things, but I’d forgotten it. It’s actually pretty nice to have those available.

Enums in C# are quite bad compared to Java. The main distinction is that they are based on integers. This gives some fairly strange results in some cases. The one that really bit me was when I forgot to give a default value to an enum field – and expected the default to be null. That isn’t true. The default value for an enum will be the value in it that maps to 0 – which is usually the first element of the enum list. I recommend people using enums to always explicitly init them.
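For comparison, this is the Java behavior that expectation came from, as a minimal sketch with hypothetical names: a Java enum is a reference type, so an unassigned field is null rather than the first constant.

```java
class EnumDefaultDemo {
    enum Kind { FIRST, SECOND } // hypothetical enum

    static class Holder {
        Kind kind; // never assigned, so Java leaves it as null
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println(h.kind == null);       // true
        System.out.println(h.kind == Kind.FIRST); // false
    }
}
```

In C# the equivalent field would silently hold the member that maps to 0, which is why initializing it explicitly is the safer habit.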

Extension methods seem very useful, and they have been used to add some really nice things in the .NET core library. That said, I didn’t use them for my implementation, so I don’t have any real experience with them.

One thing that really surprised me about .NET was that there is still no support for arbitrary precision math – neither big nums nor big decimals. I ended up implementing that myself, so now there is at least one open source library with a liberal license that people can use.

Same thing with regular expressions. The implementation in .NET obviously works, but there are too many incompatibilities in the implementation. Especially the handling of named groups is so different I couldn’t get it to work for Ioke. I ended up implementing NRegex, which is a Perl 5.6 compatible regular expression engine. It supports named groups, is thread safe, supports look ahead and look behind, and is compliant with level 1 of the Unicode Regular Expression Guidelines.

At the end of the day, it was an interesting experience, and nothing surprised me that much. Not really. Most of the things are nitpicks. If it weren’t for one small detail…

Namely equality and hash codes for collections. Why in the name of anything holy doesn’t .NET provide value-based implementations of Equals and GetHashCode for its collections? In this day and age? Even if it was a mistake from the beginning, why couldn’t they have fixed that when adding the generic collections? I don’t expect to have to implement these things myself. I especially don’t expect to have to provide my own subclasses of any collection I need to work with. This seriously annoyed me, and made the whole thing take some time, since the bugs produced by it were very hard to pinpoint. And oh yeah, when we’re talking about collections, it’s good to keep in mind that ArrayList.Sort is _not_ stable. It’s using quick sort. If you want a stable sort you’ll have to implement a merge sort or something like that for yourself. This also came as a surprise to me, but it was easily found at least. Since I had a pretty good test suite… =)
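For reference, this is the Java behavior the port was relying on, as a minimal sketch with hypothetical names: lists compare and hash by content, so they can be used as set elements, and `List.sort` is documented to be stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollectionEqualityDemo {
    // Two distinct list objects with the same content are equal and hash alike.
    static boolean listsEqualByValue() {
        List<String> a = new ArrayList<>(Arrays.asList("x", "y"));
        List<String> b = new ArrayList<>(Arrays.asList("x", "y"));
        return a != b && a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Because of that, a list can be found in a HashSet by content.
    static boolean listUsableAsSetElement() {
        Set<List<String>> seen = new HashSet<>();
        seen.add(new ArrayList<>(Arrays.asList("x", "y")));
        return seen.contains(Arrays.asList("x", "y"));
    }

    // Stable sort: elements with equal keys keep their original relative order.
    static List<String> stableSortByLength() {
        List<String> words = new ArrayList<>(Arrays.asList("bb", "a", "cc", "d"));
        words.sort(Comparator.comparingInt(String::length));
        return words;
    }

    public static void main(String[] args) {
        System.out.println(listsEqualByValue());      // true
        System.out.println(listUsableAsSetElement()); // true
        System.out.println(stableSortByLength());     // [a, d, bb, cc]
    }
}
```

Code that depends on any of these three properties needs explicit replacements when it moves to the .NET collections.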

Anyway. That’s it.

.NET and interface madness

The original Ioke implementation uses interfaces in place of classes when referring to collections. This is standard in the Java world, and wasn’t something I explicitly designed in, although it did make things very convenient in some cases, when I switched implementation strategies, or wanted to add ordering to a dictionary. No references changed, just the pieces creating new instances. Very nice.
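A minimal sketch of what that looks like in Java, assuming a hypothetical `newCells` factory (the real Ioke code is organized differently): every reference is typed as `Map`, so adding insertion ordering means changing the one line that creates the instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InterfaceReferenceDemo {
    // The only place that names a concrete class. Switching from
    // new HashMap<>() to new LinkedHashMap<>() adds insertion ordering.
    static Map<String, Object> newCells() {
        return new LinkedHashMap<>();
    }

    // Callers only ever see the Map interface.
    static List<String> cellNames(Map<String, Object> cells) {
        return new ArrayList<>(cells.keySet());
    }

    public static void main(String[] args) {
        Map<String, Object> cells = newCells();
        cells.put("zeta", 1);
        cells.put("alpha", 2);
        cells.put("mid", 3);
        System.out.println(cellNames(cells)); // [zeta, alpha, mid]
    }
}
```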

When I started the Ioke port to the CLR, I assumed that the collections work the same way in .NET, especially since there at first glance seem to be several interfaces for collections.

Boy was I wrong.

My first piece of unease came when I realized that there was no specific Set interface, so I ended up using ICollection instead. That works, but I use interfaces to capture intent, and in the specific case the intent to have a set was extremely important.

The next problem cropped up when I realized there was no unification of generic interfaces and non-generic interfaces. This is obviously not possible since the CLR uses reified generics, but I still hadn’t actually thought about it. This turned out to be extremely annoying in some cases – especially with regards to ordered collections. OrderedDictionary is more or less a counterpart of LinkedHashMap in Java – it keeps track of insertion order and will yield the entries in that order when iterating. This can be highly useful functionality. I wanted it for representing cells, like I do in the Java implementation. Problem is, cells is a collection that should be generic – but OrderedDictionary is not generic. Sigh. I could of course implement my own versions of these, but the point of a collections library is that I shouldn’t HAVE to do that.

The next item in the case against .NET happened when I started getting strange exceptions when I called ICollection.Add with a HashSet in that reference. The objects sent to Add shouldn’t have been any problem, but I still got (but only sometimes) exceptions saying my object was out of range, or something like that. I tried different things, but nothing solved it. At that point I noticed that HashSet<T> had two different implementations of Add. One regular, and one explicit ICollection.Add implementation. I couldn’t figure out if they were the same or not, but since it was listed separately, I thought they might be different. So I changed the original reference from ICollection<T> to HashSet<T> … And the error went away. I’m not sure if this is a bug in Mono, or .NET, or something else really weird, but that’s the behavior I saw.

And that brings up my main point. You can’t really use many of the interfaces in the .NET core library. Interfaces are about hiding implementation details and expressing intent – but when they don’t work, or are mismatched like they are in the .NET collections, this is impossible to do well. Explicit interface implementations that behave differently depending on whether they are called through the interface reference only make the situation harder.

The story has a sad ending. I reverted to using explicit implementation classes for all references to collections. This sucks in many ways.

Some .NET reflections

I used to do .NET, when it was new. I’ve been away from it for several years though, so coming back to it feels like jumping into a new environment. I’ve been using C# and F# for the new Ioke implementation, and during this time I’ve collected some thoughts and opinions. These are all framed in my regular use of Java, of course. And if this is obvious stuff to anyone else, please feel free to ignore me.

I won’t write everything in this post, though. I’ll split it up a bit. This is just the folio.

Incidentally, if you have seen tweets from me sounding pissed off, chances are good you will get explanations for it from these posts. At the moment I plan to write three posts: Opinions on F#, Opinions on C# and .NET, and .NET and interface madness. That is what I have planned right now, at least.

Ioke for the CLR released

The last two weeks I’ve been furiously coding away to be able to do this. And I’m finally at the goal. I am very happy to announce that the first release of Ioke for the CLR is finished. It runs on both Mono and .NET.

Ioke is a language that is designed to be as expressive as possible. It is a dynamic language targeted at several virtual machines. It’s been designed from scratch to be a highly flexible general purpose language. It is a prototype-based programming language that is inspired by Io, Smalltalk, Lisp and Ruby.

Programming guide:

Ioke E ikc is the first release of the ikc Ioke machine. The ikc machine is implemented in C# and F# and runs on Mono and .NET. It includes all the features of Ioke E ikj, except for Java integration. Integration with .NET types will come in another release.


  • Expressiveness first
  • Strong, dynamic typing
  • Prototype based object orientation
  • Homoiconic language
  • Simple syntax
  • Powerful macro facilities
  • Condition system
  • Aspects
  • Runs on both the JVM and the CLR
  • Developed using TDD
  • Documentation system that combines documentation with specs

There are several interesting pieces in ikc. Among them I can mention a new regular expression engine (called NRegex), a port of many parts of gnu.math, providing arbitrary precision math, and also an implementation of BigDecimal in C#.

Ruby.NET closing down

A letter on the Ruby.NET mailing list just announced that Dr. Wayne Kelly has decided to shut down the Ruby.NET project and is instead asking everyone to contribute to IronRuby.

More information can be found in the mailing list thread.

I don’t like this news at all. In many ways having a strong competitor is something that will improve the ecosystem for everyone. Now IronRuby will become the only player on the field – unless other people (like Ted Neward and David Peterson) decide to pick up Ruby.NET. I hope someone does. The .NET world will be better off for it.

The crucial question is not whether we trust John Lam about IronRuby. The question is if we trust Microsoft to do the right thing. Do we?