6 months with Clojure


I have spent the last 6 months on a project where Clojure was the main technology in use. I can’t really say much about the project itself, except that it’s a fairly complicated thing with lots of analytics and different kinds of data involved. We ended up with an environment that had a lot of Ruby and JavaScript/CoffeeScript as well as Clojure. We are using Neo4J for most of our data storage.
In this blog post I basically want to talk about a few different things that have worked well, or not so well, with Clojure.

Being on 1.4

When the project started, Clojure 1.4 was in alpha. We still decided to run with it, so we were on a Clojure 1.4 alpha for about a month, and on two different betas for another month or so. I have to say I was pleasantly surprised – we only hit one issue during this time (which had to do with toArray on records when interacting with JRuby), and that bug had already been fixed in trunk. The alphas and betas were exceptionally stable, and upgrading to the final 1.4 release didn’t really make any difference from a stack standpoint.

Compojure and Ring

We ended up using Compojure to build a fairly thin front end, with mostly JSON endpoints and a few HTML pages that were the starting points for the JavaScript side of the app. In general, both Compojure and Ring work quite well – the Ring server and the uberjar both worked with no major problems. I also like how clean and simple it is to create middleware for Ring. However, it was sometimes hard to find current documentation for Compojure – it seems it used to support many more things than it does right now, and much of what people say about it just isn’t true anymore.
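
To illustrate what I mean about middleware being clean and simple, here is a minimal sketch – the route and the logging wrapper are made up for this example, not code from our project:

(ns example.web
  (:require [compojure.core :refer [defroutes GET]]))

;; A Ring middleware is just a function from handler to handler.
(defn wrap-logging [handler]
  (fn [request]
    (println "Request to" (:uri request))
    (handler request)))

(defroutes app-routes
  (GET "/ping" [] {:status 200
                   :headers {"Content-Type" "application/json"}
                   :body "{\"pong\": true}"}))

;; The full app is just the routes wrapped in whatever middleware we need.
(def app (wrap-logging app-routes))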

Enlive

In order to get some dynamic content into our pages, we used Enlive. I really liked the model, and it was well suited to the limited amount of dynamic behavior we were after.
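
For anyone who hasn’t seen Enlive, a minimal sketch of the model looks something like this – the template file and the selector are assumptions for the sake of the example, not our actual templates:

(ns example.views
  (:require [net.cgrand.enlive-html :as html]))

;; Selectors pick nodes out of a plain HTML file; transformations fill them in.
(html/deftemplate index-page "templates/index.html"
  [title]
  [:h1#title] (html/content title))

;; Rendering returns a seq of strings making up the page.
(apply str (index-page "Welcome"))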

DSL with lots of data

One of my less bright ideas was to create an internal DSL for some of our data. The core of the DSL was a bunch of macros that knew how to create domain objects from themselves. This ended up being a very clean and nice model to work with. However, since the data ran into millions of entries, the slowness of actually evaluating that code (and compiling it, and dealing with the resulting permgen issues) became unbearable. We recently moved to a model that is quite similar, except we don’t evaluate the code – instead we use read-string on the individual entries to parse them.
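
As a rough sketch of the difference – the entry format and names here are hypothetical, not our actual DSL:

;; Before: each entry was a macro call, so loading the data meant compiling
;; and evaluating millions of forms.
;; After: each entry is read as plain data and interpreted directly.
(defn parse-entry [line]
  (let [[kind & args] (read-string line)]
    {:kind (keyword kind)
     :args (vec args)}))

(parse-entry "(person \"Ola\" 37)")
;=> {:kind :person, :args ["Ola" 37]}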

Dense functions

Clojure makes it really easy to create quite dense functions. I sometimes find myself combining five or six data structure manipulation functions in one go, then taking a step back to look at the whole thing. It usually makes sense the first time, but coming back to it later, or trying to explain what it does to a pair, is usually quite complicated. Clojure has extraordinarily powerful functions for manipulating data structures, and that makes it very easy to just chain them together into one big mess.
So in order to be nice to my team mates (and myself) I force myself to break up those functions into smaller pieces.
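
A made-up example of what I mean – the dense version first, then the same thing split into smaller, nameable pieces:

;; Dense: perfectly clear while you write it, less so two weeks later.
(defn top-tags [posts]
  (->> posts (mapcat :tags) frequencies (sort-by val >) (take 5) (map key)))

;; Broken up: each step gets a name that can be read (and tested) on its own.
(defn tag-frequencies [posts]
  (frequencies (mapcat :tags posts)))

(defn most-frequent [n freqs]
  (map key (take n (sort-by val > freqs))))

(defn top-tags [posts]
  (most-frequent 5 (tag-frequencies posts)))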

Naming

One consequence of breaking up functions as described above is that the resulting operations are usually highly abstract and sometimes not very coupled to the domain language. I find naming those kinds of functions very hard, and I often spend a long time on it and still don’t come up with something I’m completely comfortable with. I don’t really have a solution to this problem right now.

Concurrency

For some reason, we haven’t used most of the concurrency aspects of Clojure at all. Maybe this is because our problems don’t lend themselves to concurrent processing, but I’m not sure that’s the whole story. Suffice to say, most of our app is currently quite sequential. We will see if that changes going forward.

Summary

I’ve been having a blast with Clojure. It’s clearly exactly the right technology for what I’m currently doing, and it’s got a lot of features that make it very convenient to use. I’m really looking forward to being able to use it more going forward.


Emerging Languages camp – day 2


The second day of Emerging Languages camp was at least as good as the first. We also managed to squeeze in four more talks, since everybody agreed that the afternoon break on day one was too long and ineffective. At the end of the day my brain was so thoroughly melted that I didn’t even contemplate finishing these notes. But after some sleep I think I have a fresh perspective.

The sessions were a bit more varied than on the first day – both in quality and in how far out the ideas were. Because my interest in the various subjects varies, the write-ups on the different languages may be somewhat inconsistent in length.

Anyway, here goes:

Kodu

Kodu is a language from Microsoft for creating games. It’s specifically aimed at kids, to see if they can learn programming in a better way using something like this. The language uses icons on top of a text-based backend syntax, making it possible to program with structure instead of raw syntax. You get a basic 3D environment where you can modify and edit things in various ways. Another important part of the design is getting the game to do something quickly, so you get immediate feedback. Everything added to the language is user tested before it goes in – including gender testing. They thought long and hard about whether to add conjunctions, but ended up deciding to do it. You program and run the game on an Xbox. It’s also free. Overall, Kodu looks like a really nice and innovative initiative, with inspiration probably going back as far as Logo. Very nice.

Clojure

Rich didn’t actually talk much about Clojure in general, but decided to focus on a specific problem he is working on solving. His talk title doesn’t really say much about this, though: “Persistent, Transience, Persistents, Transients and Pods – invasion of the value snatchers”. It was a great talk with lots of information coming extremely fast. I found myself focusing harder during this talk than during any other at the conference, just to follow all the threads of thought.

Rich spent some time giving an introduction to persistent data structures so everyone knew how Clojure works with them – including how they are turned into transients – since that’s where the new feature comes in.

An important part of persistent data structures is that you preserve the performance guarantees of the equivalent mutable data structure. Clojure uses bit-partitioned hash tries, originally described by Phil Bagwell. This gives Clojure structural sharing, which means it’s safe to “update” something – the old version is retained. It uses path copying to keep the cost of updates low. There is definitely a cost to doing it, but it works well in a concurrent environment, where other solutions would be more costly to get correct.
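
A tiny illustration of what structural sharing means in practice:

;; "Updating" a persistent vector leaves the original untouched; the new
;; vector shares most of its internal tree with the old one.
(def v1 [1 2 3])
(def v2 (conj v1 4))

v1 ;=> [1 2 3]
v2 ;=> [1 2 3 4]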

Clojure has an epochal time model that lets it view things as values in between “modifications”. State sits one step above that: a mutable change is the creation of a new value from an existing value, which is then put into the same reference the original value lived in. Clojure has four different reference types with various semantics for coordination.
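
As a minimal sketch of that model using one of those reference types (an atom) – my own example, not from the talk:

;; The atom is the identity; 0 and 1 are immutable values it refers to
;; at successive points in time.
(def counter (atom 0))

(swap! counter inc)  ; derive a new value from the old one and install it
@counter             ;=> 1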

To get good performance, some Clojure functions will actually mutate state that is invisible to anyone else in order to build new data structures efficiently. To get performance acceptable to Rich, Clojure’s data structures are not implemented as purely immutable (Okasaki-style) structures on the Java side. Persistent data structures also don’t scale to larger changes – multiple collections, several steps, or other situations where you want functional end points but efficient mutation in between.

Transients are a feature that lets Clojure “give birth” to a data structure. Clojure’s transients accumulate changes in a safe way and can then finally give you a persistent value. Only vectors and hash maps are currently supported. Lists are not, since there is no benefit in doing so. Transients also enforce thread isolation. Composite operations are OK, and so is multi-collection work, and you don’t need any locks for this. This is already in Clojure, but transients might be doing too much: they both handle editing and enforce the constraints on it, such as single-threadedness. Transients can also return new values, even from mutating operations.
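
A minimal sketch of the transient round trip as it exists in Clojure today:

;; Start from a persistent vector, accumulate with the mutating conj!,
;; then freeze the result back into an ordinary persistent value.
(defn build-vector [n]
  (persistent!
    (reduce conj! (transient []) (range n))))

(build-vector 5) ;=> [0 1 2 3 4]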

Pods allow you to split the policy out from transients. Values go in, values come out; the process goes through the pod. Different policies are possible, such as single-threadedness or mutexes. A pod knows how to make a transient version of a value. Functions that modify a pod have to return a new thing (or the same thing). Dereferencing the pod gives you a new value from the pod at that point. This also makes it possible to apply recipes to ordinary Java objects – a good example is String vs StringBuilder. Pods can ensure lock acquisition order, but not lock composition – although pods can at least detect it. There are still a few details in the design that Rich hasn’t decided on yet.

All in all, a very interesting talk, about the kind of concurrency problems you wish your language had.

E/Caja

Mark Miller recapped the interaction models of the Web, starting with static frames and going to the current mess of JavaScript fragments going back and forth using JSONP, AJAX and Comet. He also talked a bit about the adoption curves of languages and why some languages get adopted, positing that a mess of features may be easier to get adopted – which means many languages succeed by adding complexity.

E is an experiment in expressing actors in a persistent way. He used some of the lessons from E, combined with AJAX/JavaScript, to create Caja, a secure language. Some of the features from Caja were then used to start work on EcmaScript 5. They are currently working on a standard for SES, secure JavaScript. Dr. SES is an extension of this, standing for Distributed, Resilient, Secure JavaScript. Object capabilities involve two additions to a regular memory safety and encapsulation model: effects only through held references, and no powerful references by default. This means the reference graph becomes an access graph. Caja can sanitize JavaScript to prevent malicious behavior while preserving the semantic meaning of the program outside of that.

He showed some examples of how Caja can be used to sanitize regular JavaScript and have it running securely. Very interesting stuff, although the generated code didn’t look as amenable to debugging as something like CoffeeScript.

Fancy

Fancy is a language that tries to be friendly to newcomers, with good documentation, a clean implementation and so on. It’s inspired by several languages: Smalltalk (pure message passing, everything’s an object, dynamic, class-based OO, metaprogramming, reflection), Ruby (file based, embraces UNIX, literal syntax, class definitions as executable scripts, fixing some of the inconsistencies with procs/lambdas/blocks) and Erlang (message passing concurrency, lightweight processes – not implemented yet). Fancy takes the opinion that first class is good: classes, methods, documentation and tests should all be first class. FancySpec is a simple version of RSpec, and there are tests for all built-in classes and methods. These tests are not dependent on the implementation. There are plans to port Fancy to a VM. Methods marked with NATIVE will have an equivalent method in Fancy and in the interpreter, to improve performance.

It’s got dynamic scoping and method caching. Logic can be defined based on the sender of a message, which makes it possible to do things like private and public.

Exceptions are taken directly from the implementation (i.e. C++).

The language seems to be pretty similar to Ruby in semantics, but with more Smalltalk-like syntax.

BitC

BitC is geared towards critical systems code – resource constrained in CPU and memory, the kind of area where a single cache miss sometimes counts. Abstraction is fine, but only if it’s the right one. Variance is constrained too. Predictability is very important, so something like a JIT can be a problem. Statically exception free. “Zero” runtime footprint. Non-actuarial risk model. Mean time between failures measured in decades. The problem is to establish confidence. After other failures in this area, the conclusion has been that BitC shouldn’t be a prover.

The language is an imperative functional language with an HM-style parametric type system. You have explicit control over representation. State is handled in a first-class manner. Inference actually infers mutability in many cases. Dependent range checking isn’t there yet, but is coming soon. “The power of ML/Haskell”, “the low-level expressiveness of C”, “near-zero innovation”.

Trylon

Trylon is a small language, indentation based, that compiles through C. It’s object oriented, with prototypes underneath a class-based system. According to the author there is nothing really new in the language – he just did it for his own sake. There are no users so far except the author.

ooc

The language tries to be a high-level low-level language. It mixes paradigms quite substantially and has some nice features. It’s class based and mostly statically typed.

Coherence/Subtext

Jonathan Edwards started this presentation by showing a small example where the ordering of statements in an implementation depends on which representation you use for the data, showing that it’s impossible to handle this case in a general way. From that he claims there is a fundamental tension between styles in a language, and that you can only get two of these three: declarative programming, mutable state, and data structures. I’m not sure I agree with his conclusions, and the initial example didn’t feel like anything I’ve ever had trouble with.

Based on the conclusion that you can only have two of the three, he goes on to claim that the thing that causes all these problems is aliasing. So in order to avoid aliasing, his system uses objects whose instances are always physically contained within another object. This means you can refer to these objects without having actual pointers – and thus cannot alias them either. From that point on, his system allows declarative programming of the flow, where updates never oscillate back out to create more updates.

Lots of interesting ideas in this talk, but I’m not sure I agree with either the premise or the conclusions.

Finch

Finch is a small programming language, bytecode compiled, with fibers, blocks, TCO, objects, prototypes, a REPL and Smalltalk-style message selectors. In the future, the author aims to add metaprogramming, some self-hosting, continuations and concurrency features.

Circa

Circa is a small programming language aimed at giving you immediate feedback. It targets game programming, and achieves the feedback by running the script many times (once per frame, as far as I understood it). You specify what state your program has, and that state is automatically persisted between invocations, so that a specific invocation of a specific function always gets access to the same state it started out with. A very interesting but weird model. It seems to work really well for prototyping smaller games and graphics, but I wonder what can be done to expand it.

Wheeler

Wheeler is a proof of concept presented by Matt Youell. It’s pretty hard to describe, and I’m not even sure there’s a computational model there yet. The project is apparently just a few weeks old, and the ideas are still in progress. The basic tenet of the language seems to be that you work with categories of things and establish transitions between them. A transition pattern matches the things it looks for, which means that syntax and ordering don’t mean as much. The author calls it mutual dispatch, because it uses the types/categories of everything involved to decide which transitions to use. At this point there is no time model, so everything happens in one sweep, but once a time model gets in there it might be very interesting. To me it looked a bit like a cross between neural networks and cellular automata.

Interval arithmetic

Alan (Mr Frink) gave a talk about the problems with floating point numbers, and one way of handling them. Floating point numbers cause problems by silently introducing small rounding errors.

An interval is a new kind of number. It represents a specific number by giving two end points and saying the real value is somewhere within that interval. You can look at it in two ways: “the right value’s in there somewhere, but I’m not sure where”, or “the variable takes on ALL values in the interval simultaneously”.

This was a very interesting discussion, and you can find out more about it on Frink’s web page (just search for Frink and interval arithmetic). At the end of the presentation, Alan gave this challenge to other languages:

for:

x = 77617
y = 33096

calculate:

((333 + 3/4) - x^2) y^6 + x^2 (11x^2 y^2 - 121 y^4 - 2) + (5 + 1/2) y^8 + x/(2y)

Ioke handles it correctly, both using ratios and using decimals.
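
As a sketch of why exact arithmetic handles this where naive floating point does not, here is the expression written in Clojure using ratios and bigints – my own translation of the challenge, not from the talk:

;; The challenge expression, computed exactly with ratios.
;; Naive double arithmetic gives a wildly wrong answer here; the true
;; result is a small negative number, roughly -0.8274.
(defn challenge [x y]
  (let [x2 (* x x)
        y2 (* y y)
        y4 (* y2 y2)
        y6 (* y4 y2)
        y8 (* y4 y4)]
    (+ (* (- 1335/4 x2) y6)                 ; (333 + 3/4) = 1335/4
       (* x2 (- (* 11 x2 y2) (* 121 y4) 2))
       (* 11/2 y8)                          ; (5 + 1/2) = 11/2
       (/ x (* 2 y)))))

(challenge 77617N 33096N) ; an exact ratio, approximately -0.8274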

Stratified JavaScript

Stratified JavaScript adds some concurrency features to JavaScript based on Strata. It looked like a very principled approach to giving JS concurrency primitives that are easy to use and at the same time very powerful. The presenter showed several examples of communication, blocking and coordination working really well.

Factor

Factor is a very high level stack based language created by Slava Pestov. He went through some of the things that Factor does well and other dynamic programming languages handle less well, like reloading code from the REPL, along with lots of other small tidbits about how powerful Factor is and how expressive a stack language can be. At the end of the day I still find it interesting how much Ioke code sometimes resembles Factor, even though the underlying semantics are vastly different.

D

Walter Bright showed off D, his systems-level programming language. He focused on showing that it can do several different paradigms in the same language – all of it looked very, very clean, but I got the impression from these examples that D is an extremely big language. To summarize, D can do inline assembler, class-based OO, generative programming, RAII, procedural, functional and concurrent programming (and I probably missed a few). I liked the approach to immutability, but I must admit the sheer size of the language scares me. It’s impressive that such a big language can have such good compile times.

AmbientTalk

AmbientTalk is a language built on top of Java that puts communication at the center. It is meant for areas where you have bad network connectivity and want to communicate between different devices in a flexible way. Things like network outages aren’t exceptions, because they will happen all the time in the environments AmbientTalk is built for. The language embraces futures to a large degree and also takes a principled approach to Java integration – if you send an AmbientTalk object into Java, it works as if you had sent it to a remote device, and the only way Java can interact with that object is by sending messages to it. Much interesting stuff in this talk.

And that was it. I obviously can’t capture all the interesting hallway and pub conversations that were had, but hopefully this summary will be helpful until the videos come along in two to four weeks. I would call this conference a total success, and I really look forward to next year.



The Clojure meetup and general geekiness


The Bay Area Clojure user group threw a JavaOne special on Wednesday afternoon, with Rich Hickey as special guest. I went, and it turned out there was a large collection of former and current ThoughtWorkers there, among all the other Clojure enthusiasts. The plan was lightning talks for a while and then a general town hall with Rich answering questions. The reality turned out a bit different – firstly because people spent quite a long time on their talks, fielded many questions and so on, and secondly because the projector in the place had some serious problems – which basically resulted in everyone projecting pink-tinted presentations.

There were several interesting talks. The first one took a look at what the Clojure compiler actually generates. This turned a bit funny when Rich chimed in and basically said “that doesn’t look right” – the presenter had simplified some of what was happening. I don’t envy the presenter in this case, but it all turned into good fun, and I think we all learned a bit about what Clojure does during compilation.

There was a longer talk about something called Swarmli, which was a very small distributed computing network, written in about 300 lines of code. I defocused during that talk since I had to hack some stuff in Ioke.

After that, one of the JetBrains guys showed off the new IntelliJ Clojure plugin. It seems to be quite early days for it still, but there is potential for good cross-language refactoring, joint compilation and other goodies there.

Finally, my colleague Bradford Cross did a very cool talk about some of the work he’s currently doing at a startup. The work seems perfectly suited to Clojure, and the code shown was very clear and simple. Very cool stuff, really. ThoughtWorks – actually using Clojure on client projects. Glad to see that.

After that it was time for Rich Hickey. Rich decided to give a lightning talk himself – about chunked sequences. Very cool in concept, but also one of those ideas that seem simple and evident after the fact. Chunked sequences really seem to promise even better Clojure performance in many cases – without requiring any changes to client code.
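
The observable effect is easy to demonstrate; a small sketch (my own, not from Rich’s talk) of how a chunked source is realized 32 elements at a time:

;; Ranges and vectors produce chunked seqs, so lazy operations over them
;; are realized a chunk (32 elements) at a time rather than one by one.
(chunked-seq? (seq (range 100)))
;=> true

(first (map #(do (print "*") %) (range 100)))
;; prints 32 stars, not one – the whole first chunk was realized
;=> 0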

After that there was a general Q&A session, where the questions ranged all over the map, from personal to professional. One of the more contentious things said was about Rich’s attitude to testing. This caused lots of discussion later in the evening.

All in all, this was really a great event. We ended up at a nearby bar/restaurant afterwards and had long discussions about programming languages. A great evening.



First days of JavaOne and CommunityOne


I’ve been spending the last few days in San Francisco, attending CommunityOne and JavaOne. We are now on the second day of JavaOne, so I felt it would be a good idea to look back at what’s been going on during the first two days.

I will not talk about the general sessions here, since I as a rule avoid going to them. So, I started out CommunityOne by seeing Guillaume talk about what is new in Groovy 1.6. Pretty interesting stuff, and many useful things. Although, one of the things I noted was that many of the default usages of AST transformations really just make up for the lack of class-body code. Things like @Singleton, which need an AST transformation in Groovy, are very simple to do by executing code in the class body in Ruby.

After that I saw John Rose talk about the Da Vinci machine project. Pretty nice stuff going on there, really. The JVM will really improve with this technology.

Charles Nutter did a reprise of his Beyond Impossible JRuby talk. It’s a really good talk that focuses on the things that you really wouldn’t think possible to do on the JVM, that we’ve had to do to get JRuby working well.

Guido talked about Python 3000 – much of it was really a look at the history of Python, and as such was really interesting. Unfortunately, my jet lag started to get the better of me at that point, so my focus could have been better.

For me, the first day of JavaOne started out with the Script Bowl. This year the languages represented were Jython, Groovy, Clojure, Scala and JRuby. I think they all did a pretty good job of showcasing the languages, although it’s very hard to do that in such a small timeframe. I sympathized the most with Rich Hickey (the creator of Clojure) – the reason being that the Clojure model is the most dissimilar from the rest of the languages. But this dissimilarity is actually the key to understanding why Clojure is so powerful, so if you don’t understand it, you’re just going to be turned off by Clojure’s weird surface semantics. (Hint: they are not weird, they are necessary and powerful and really cool.) Rich made a valiant effort to convey this by talking a lot about the data structures that are Clojure, but I’m unsure how much of it actually penetrated.

Tom did a great job with the JRuby demos – he had a good flash 3d game running using a JRuby DSL, and then some slides showcasing how much benefit JRuby gets from the Ruby community. Good stuff.

After that I went to Rich’s Clojure talk. I’ve seen him give similar talks several times, but I don’t get tired of seeing this. As usual, Rich did a good job of giving a whirlwind tour of the language.

After lunch I went to Konstantin’s talk about JetBrains MPS. I was curious about MPS since I’ve been spending time with Intentional lately. I actually came away from the talk with a pretty different view of MPS than I went in with. My initial reaction is that MPS seems pretty limited compared to what you can do with Intentional.

Then it was time to see Yehuda Katz talk about Ruby – this was a great intro to Ruby and I think the audience learned a lot there.

The first evening of JavaOne was really crazy, actually. I first went to Brian Goetz and John Rose’s talk about building a Renaissance VM. This was a bit of an expansion of John’s CommunityOne talk, and gave a good overview of the different pieces being looked at in JSR 292, and also of other things that should go into the JDK in some way to make a multi-language future possible.

Tobias Ivarsson gave a BOF about language interoperability on the JVM. This ended up being more about the interface injection feature that Tobias has been hacking on. We had some pretty good discussion, and I think we ended up feeling that we need to discuss this a bit more – especially whether the API should be push or pull based. Good session by Tobias, though.

And then it was finally time for my BOF, called Hacking JRuby. It was a pretty mixed bag, containing lots of small pieces of JRuby knowledge that can be useful if you want to do some weird things with JRuby. The slides can be found here: http://dist.codehaus.org/jruby/talks/HackingJRuby.pdf. I think the talk went pretty well, although it was in a late slot so not many people showed up.

The final session of the day was a BOF called JRuby Experiences in the Real World. This ended up being a conversation between about 10-12 people about their JRuby experiences. Very interesting.

After that I was totally beat, and ended up going home and crashing. So that was my first day at JavaOne.



QCon London – Wednesday (Emerging Languages)


The first day of the QCon conference proper started out with Sir Tony Hoare giving a keynote about the difference and overlap between the science and the engineering of computing. Fairly interesting, but the questions and answers afterwards were even more interesting. One of the more interesting points Hoare made was that, in his view, a full specification is a generalization of testing. After the keynote I started my track, called Emerging Languages in the Enterprise. I introduced the track with 15 minutes on my views on programming languages. The slides for my piece can be found here: http://olabini.com/presentations/ELITE.pdf. My talk was made much more interesting by Tony Hoare being in the front row. That made the whole thing a bit more daunting, obviously… =)

I then spent the rest of the day in my track – which was very good. I am very happy with all the presentations, and felt the track was a great success. First up was Michael Foord, talking about IronPython and how Resolver uses IronPython to create a great product. Some interesting lessons and information there.

After lunch Jonas Bonér talked about Real-world Scala. The presentation gave a good grounding in Scala without looking at all small details – instead Jonas talked about more high level concerns and styles.

After that, Rich Hickey gave a great presentation about Clojure, building the language up from the ground. It was very well received.

Martin Fowler did a fantastic presentation on ThoughtWorks experience with Ruby. The room was packed for this.

The final presentation in my track was Attila Szegedi talking about JavaScript in the Enterprise. This was also a great presentation, and gave me some new insight into what you could achieve with Rhino.

All in all, the full track was excellent, and all the presentations achieved pretty much what I hoped from them. I learned a lot from all of them.

After the final session of my track, Martin Fowler and Zach Exley did the evening keynote, talking about how technology helped the Obama campaign. Very interesting stuff too. All in all, a very good day at QCon.



Clojure


I know I’ve mentioned Clojure now and again in this blog, but I haven’t actually talked that much about it. I feel it’s time to change that right now – Clojure is in the air and it’s looking really interesting. More and more people are talking about it, and after the great presentation Rich gave at the JVM language summit I feel that there might be some more converts in the world.

So what is it? Well, a new Lisp dialect for the JVM. It originally targeted both the JVM and .NET, but Rich ended up not going through with that (a decision I can understand after seeing the effort Fan has to expend to keep providing this feature).

It’s specifically not an implementation of either Common Lisp or Scheme, but a totally new language with some interesting features. The most striking is the way it embraces functional programming. In comparison to Common Lisp, which I would characterize as a multiparadigm language, Clojure has a heavy bent towards functional programming. This includes a focus on immutable data structures and support for good concurrency models. He’s even got an implementation of STM in there, which is really cool.
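
A minimal sketch of what that STM looks like in use – a made-up transfer example, not taken from any of Rich’s material:

;; Two refs updated atomically inside one transaction. If another thread
;; interferes, the transaction simply retries.
(def from-account (ref 100))
(def to-account   (ref 0))

(dosync
  (alter from-account - 10)
  (alter to-account   + 10))

[@from-account @to-account] ;=> [90 10]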

So what do I think about it? First of all, it’s definitely a very interesting language. It also takes the ideas of Lisp and twists them a bit, adding some new ideas and refining some old ones. If I wanted to do concurrent programming on the JVM, I would probably lean more towards Clojure than Scala, for example.

All that said, I am in two minds about the language. It is definitely extremely cool, and it looks very useful. The libraries in particular have a lot going for them. But the other side of it, for me, comes from the point of view of Lisp purity. One of the things I really like about Lisps is that they are very simple: the syntax is extremely small, and in most cases everything is either lists or atoms and nothing else. Common Lisp can handle other syntax with reader macros – which still produce only lists and atoms. This is extremely powerful. Clojure has this to a degree, but adds several basic composite data structures that are not lists, such as sets, vectors and maps. From a pragmatic standpoint I can understand that, but the fact that they are basic syntax instead of reader macros means that if I want to process Clojure code, I have to work with several kinds of composite data structures instead of just one.
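
Concretely, these are the extra literals I mean – each one is its own data structure when you read Clojure code as data:

;; Reading Clojure source gives you more than lists and atoms:
(read-string "(foo 1 2)")      ;=> a list:   (foo 1 2)
(read-string "[1 2 3]")        ;=> a vector: [1 2 3]
(read-string "{:a 1 :b 2}")    ;=> a map:    {:a 1, :b 2}
(read-string "#{1 2 3}")       ;=> a set:    #{1 2 3}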

This might seem like a small thing, and it’s definitely not something that would stop me from using the language. But the Lisp lover in me cringes a bit at this decision.

All in all, Clojure is really cool, and I recommend that people take a look at it. It’s getting lots of attention and people are writing about it. Stu Halloway is currently in the process of porting Practical Common Lisp to Clojure, and I recently saw a blog post about someone porting On Lisp to Clojure, so there is definitely interest in it. The question is how this will continue. As I’ve started saying more and more: these are interesting times for language geeks.



Stu’s Java.next series


If you haven’t already seen it, let me wholeheartedly recommend Stuart Halloway’s series on the languages he calls Java.next. The series looks at several different aspects of these languages (Groovy, Scala, Clojure, JRuby), contrasting them with each other and with Java. Highly recommended if you are in any way interested in the languages that will soon replace Java for much application development.

The three published parts are:

  1. Java.next: Common Ground
  2. Java.next #2: Java Interop
  3. Java.next #3: Dispatch