6 months with Clojure


I have spent the last 6 months on a project where Clojure was the main technology in use. I can’t really say much about the project itself, except that it’s a fairly complicated thing with lots of analytics and different kinds of data involved. We ended up with an environment that had a lot of Ruby and JavaScript/CoffeeScript as well as Clojure. We are using Neo4J for most of our data storage.
In this blog post I basically wanted to talk about a few different things that have worked well, or not so well, with Clojure.

Being on 1.4

When the project started, Clojure 1.4 was in alpha. We still decided to run with it, so we were running Clojure 1.4alpha for about one month, and two different betas for another month or so. I have to say I was pleasantly surprised – we only had one issue during this time (which had to do with toArray of records, when interacting with JRuby) – and that bug had already been fixed in trunk. The alphas and betas were exceptionally stable, and upgrading to the final release of 1.4 didn’t really make any difference from a stability standpoint.

Compojure and Ring

We ended up using Compojure to build a fairly thin front end, with mostly JSON endpoints and serving up a few HTML pages that were the starting points for the JavaScript side of the app. In general, both Compojure and Ring work quite well – the ring server and the uberjar both worked with no major problems. I also like how clean and simple it is to create middleware for Ring, as sketched below. However, it was sometimes hard to find current documentation for Compojure – it seems it used to support many more things than it does right now, and most things people mention about it just aren’t true anymore.
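
To illustrate how small a Ring middleware can be, here is a minimal sketch (the name and the logging are invented for this example, not from our code base):

(defn wrap-request-logging [handler]
  ;; a middleware is just a function from handler to handler
  (fn [request]
    (println "handling" (:request-method request) (:uri request))
    (handler request)))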

Enlive

In order to get some dynamic things into our pages, we used Enlive. I really liked the model, and it was quite well suited for the restricted amount of dynamism we were after.
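
For flavor, filling in a title with Enlive looks something like this minimal sketch (the template file and keys are invented, not from our project):

(require '[net.cgrand.enlive-html :as html])

(html/deftemplate front-page "templates/front.html" [ctx]
  [:h1#title] (html/content (:title ctx)))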

DSL with lots of data

One of my less bright ideas was to create an internal DSL for some of our data. The core part of the DSL was a bunch of macros that knew how to create domain objects of themselves. This ended up being very clean and a nice model to work with. However, since the data amounted to millions of entries, the slowness of actually evaluating that code (and compiling it, and dealing with the permgen issues) ended up becoming unbearable. We recently moved to a model that is quite similar, except we don’t evaluate the code, instead using read-string on the individual entries to parse them.
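
The shape of the new approach is roughly the following sketch (make-domain-object and the entry file name are hypothetical stand-ins for our real factory and data):

(require '[clojure.java.io :as io])

(defn parse-entry [line]
  ;; read-string gives us the data structure without compiling or evaluating code
  (let [[kind & args] (read-string line)]
    (make-domain-object kind args)))   ; stand-in for the real factory

(with-open [r (io/reader "entries.data")]
  (doall (map parse-entry (line-seq r))))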

Dense functions

Clojure makes it really easy to create quite dense functions. I sometimes find myself combining five or six data structure manipulation functions in one go, then taking a step back and looking at the whole thing. It usually makes sense the first time, but coming back to it later, or trying to explain what it does to a pairing partner, is usually quite complicated. Clojure has extraordinarily powerful functions for manipulating data structures, and that makes it very easy to just chain them together into one big mess.
So in order to be nice to my teammates (and myself), I force myself to break up those functions into smaller pieces.
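
An invented example of the kind of thing I mean – first the dense version, then the same logic split into named steps:

(defn summarize-dense [orders]
  (->> orders
       (filter :paid?)
       (group-by :customer)
       (map (fn [[customer os]] [customer (reduce + (map :total os))]))
       (into {})))

(defn paid-orders [orders]
  (filter :paid? orders))

(defn total-per-customer [[customer orders]]
  [customer (reduce + (map :total orders))])

(defn summarize [orders]
  (into {} (map total-per-customer (group-by :customer (paid-orders orders)))))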

Naming

One aspect of breaking up functions as described above is that the operations involved are usually highly abstract and sometimes not very coupled to the domain language. I find naming those kinds of functions very hard, and many times I spend a long time and still don’t come up with something I’m completely comfortable with. I don’t really have a solution to this problem right now.

Concurrency

For some reason, we haven’t used most of the concurrency aspects of Clojure at all. Maybe this is because our problems don’t lend themselves to concurrent processing, but I’m not sure that’s the root cause. Suffice to say, most of our app is currently quite sequential. We will see if that changes going forward.

Summary

I’ve been having a blast with Clojure. It’s clearly exactly the right technology for what I’m currently doing, and it has a lot of features that make it very convenient to use. I’m really looking forward to being able to use it more going forward.


Notes on syntax


The last few years, the expressiveness of programming languages has been on my mind. There are many things that come into consideration for expressiveness, no matter what definition you actually end up using. However, what I’ve been thinking about lately is syntax. There’s a lot of talk about syntax and many opinions. What made me start thinking more about it lately was a few blog posts I read that kind of annoyed me a bit. So I thought it was time to put out some of my thoughts on syntax here.

I guess the first question to answer is whether syntax matters for a programming language. The traditional computer science view is largely that syntax doesn’t matter. And in a reductionist, system level view of the world this is understandable. However, you also have the opposite view which comes strongly into effect especially when talking about learning a new language, but also for reading existing code. At that point many people are of the opinion that syntax is extremely important.

The way I approach the question is based on programming language design. What can I do when designing a language to make it more expressive for as many users as possible? To me, syntax plays a big part in this. I am not saying that a language should be designed with a focus on syntax, or even with syntax first. But the language syntax is the user interface for a programmer, and as such there are many aspects of the syntax that should help a programmer. Help them with what? Well, understanding, for one. Reading. Communicating. I suspect that writing is not something we’re very interested in optimizing for in syntax, but that’s OK. Typing fewer characters doesn’t actually optimize for writing either – the intuition behind that statement is quite simple: imagine you had to write a book. However, instead of writing it in English, you just wrote the gzipped version of the book directly. You would definitely have to type much less – but would that in any way help you write the book? No; it would probably make it harder. So typing I definitely don’t want to optimize. However, I would like to make it easy for a programmer to express an idea as concisely as they can. To me, this is about mentioning all the things that are relevant, without mentioning irrelevant things. But incidentally, a syntax with that property is probably also going to be easier to communicate with, and easier to read, so I don’t think focusing on writing at all is the right thing to do.

Fundamentally, programming is about building abstractions. We are putting together extremely intricate mind castles and then trying to express them in such a way that our computers will realize them. Concepts, abstractions – and manipulating and communicating them – are the pieces underlying programming languages, and it’s really what all languages must do in some way. A syntax that makes it easier to think about hard abstractions is a syntax that will make it easier to write good and robust programs. If we talk about the Sapir-Whorf hypothesis and linguistic relativity, I suspect that programmers have an easier time reasoning about a problem if their language choice makes those abstractions clearer. And syntax is one way of making that process easier. Simply put, the things we manipulate with programming languages are hard to think about, and good syntax can improve that.

Seeing as we are talking about reading – who is this person reading? It makes a huge difference whether we’re trying to design something that should be easy to read for a novice, or trying to design a syntax that makes it easier for an expert to understand what’s going on. Optimally we would like to have both, I guess, but that doesn’t seem very realistic. The things that make syntax useful to an expert are different from what makes it easy to read for a novice.

At this point I need to make a request – Rich Hickey gave a talk at Strange Loop a few months ago. It’s called Simple Made Easy and you can watch it here: http://www.infoq.com/presentations/Simple-Made-Easy – you should watch it now.

Simply put, if you had never learnt any German, should you really expect to be able to read it? Is it such a huge problem that someone who has never studied Prolog will have no idea what’s going on until they study it a bit? Doesn’t it make sense that people who understand German can express all the things they need to say in that language? Even worse, when it comes to programming languages, people expect them to be readable to people who have never programmed before! Why in the world would that ever be a useful goal? It would be like saying German is not readable (and is thus a bad language) because dolphins can’t read it.

A tangential aspect to the simple versus easy of programming languages is how our current syntactic choices echo what’s been done earlier. It’s quite uncommon for a syntax design to become wildly successful while looking completely different from previous languages. This seems to have more to do with how easy a language is to learn than with how good the syntax actually is by itself. As such, it’s suspect. Historical accidents seem to contribute much more to syntax design than I am comfortable with.

Summarizing: when we talk about reading programming languages, it doesn’t make much sense to optimize for someone who doesn’t know the language. In fact, we need to take it as a given that the person knows the language in question. Then we can start talking about what aspects reduce complexity and improve communication for a programmer.

When we are talking about reading languages, one thing that sometimes comes up is the need for redundancy. Specifically, one of the blogs that inspired these thoughts basically claimed that the redundancy in the design of Java was a good thing, because it improved readability. Now, I find this quite interesting – I have never seen any research that explains why this would be the case. In fact, the only argument I’ve heard in support of the idea is that natural languages have highly redundant elements, and thus programming languages should too. First, that’s not actually true for all natural languages – but we must also consider _why_ natural languages have so much redundancy built in. Natural languages are not designed (with a few exceptions) – they grow to have the features they have because those features are useful. But reading, writing, speaking and listening to natural languages are under such different evolutionary pressures from each other that they should be treated differently. The reason we need redundancy is simply that it’s very hard to speak and listen without it. For all intents and purposes, what is considered good and idiomatic in spoken language is very different from written language. I just don’t buy this argument for redundancy. Redundancy in programming language syntax might turn out to be a good thing, but so far I remain unconvinced.

It is sometimes educational to look at mathematical notation. However, mathematical notation is just that – notation. I’m not convinced we can have one single notation for programming languages, and I don’t think it’s something to aspire to. But the useful lesson from math notation is how terse it is. Even so, you still need to spend a long time to digest what it means. That’s because the ideas are deep. The thinking that went into them is deep. If we ever come to a point where programming languages can embody ideas as deep in notation as terse, I suspect we will have figured out how to design programming language syntax that is way better than what we have right now.

I think this covers most of the things I wanted to cover. At some point I would like to talk about why I think Smalltalk, Ruby, Lisp and some others have quite good syntax, and how that syntax is intimately related to why those languages are powerful and expressive. Some other random thoughts I wanted to cover were the evolvability of language syntax, whether a syntax should be designed to be easy to parse, and possibly also how much English specifically has impacted the design of programming languages. But these are thoughts for another time. Suffice to say, syntax matters.



Announcing JesCov – JavaScript code coverage


It seems the JavaScript tool space is not completely saturated yet. As I mentioned in my previous post, I’ve had particular trouble finding a good solution for code coverage. So I decided to build my own. The specific features to notice are transparent translation of source code and support for branch coverage. It also has some limitations at the moment, of course. This is release 0.0.1 and as such is definitely a first release. If you happen to use the Jasmine JUnit runner, it should be possible to drop this in directly and have something working immediately.

You can find information, examples and downloads here: http://jescov.olabini.com



JavaScript in the small


My most recent project was a fairly typical Java web project where we had a component that needed to be written in JavaScript. Nothing fancy, and nothing big. It does seem like people are still not taking JavaScript seriously in these kinds of environments. So I wanted to take a few minutes and talk about how we developed JavaScript on this project. The kind of advice I’ll be giving here is well suited for web projects with small to medium amounts of JavaScript. If you’re writing large parts of your application on the client side, you probably want to go with a full stack framework to help you out, so these things are less relevant.

Of course, most if not all things I’ll cover here can be gleaned from other sources, and probably better. And if you’re an experienced JavaScript developer, you are probably fine without this article.

I had to do two things to get efficient with JavaScript. The first was to learn to ignore the syntax. The syntax is clunky and definitely gets in the way. But with the right habits (such as having a shortcut for function/lambda literals, and making sure to always put the returned value on the same line as the return statement) I’ve been able to see through the syntax and basically use JavaScript in a Scheme-like style. The second was to completely ignore the object system. I use a lot of object literals, but not really any constructors or the this keyword. Both of these features can be used well, but they are also very clunky, and hard to get everyone on a team to understand the same way. I love prototype-based OO as a model, and I’ve used it with success in Ioke and Seph. But with JavaScript I generally shy away from it.

The module pattern

The basic idea of the module pattern is that you encapsulate all your code in anonymous functions that are then immediately evaluated to generate the actual top level object. Since JavaScript has some unfortunate problems with global variables (like, they are there), it’s safest to just put all your code inside of one or more of these modules. You can also make your modules take the dependencies you want to use. A simple module might look like this:

var olaBiniSeriousBanking = (function() {
  var balance = 0;

  function deposit(num) {
    balance += num;
  }

  function checkOverdraft(amount) {
    if(balance - amount < 0) {
      throw "Can't withdraw more than exists in account";
    }
  }

  function withdraw(amount) {
    checkOverdraft(amount);
    balance -= amount;
  }

  return {deposit: deposit, withdraw: withdraw};
})();
In this case the balance variable is completely hidden inside a lexical closure, and can only be accessed by the deposit and withdraw functions. These functions are also not in the global namespace, so there is no risk of clobbering. It’s also possible to have lots and lots of helper functions that no one else can see. That makes it easier to make your functions smaller – and incidentally, the largest problem I’ve seen with JavaScript code quality is that functions tend to be very large. Don’t do that!
A useful variation of the module pattern is to extract the construction function and give it a name. Even though you might use it immediately, it makes it possible to create more than one of these, use different dependencies, or make it accessible from tests so you can inject collaborators:

var olaBiniGreeterModule = (function(greeting) {
  return {greet: function(name) {
    console.log(greeting + ", " + name);
  }};
});
var olaBiniGreeterEng = olaBiniGreeterModule("Hello");
var olaBiniGreeterSwe = olaBiniGreeterModule("Hejsan");

RequireJS

The module pattern is good on its own, but there are some things a loader can do that make it even better. There are several variations of these module loaders, but my favorite so far is RequireJS. I have several reasons for this, but the main one is probably that it is very lightweight, and is actually a net win even for very small web applications. There are lots of benefits to letting RequireJS handle your modules. The main ones are that it takes care of dependencies between modules and loads them automatically. This means you can define one single entry point for your JavaScript, and RequireJS makes sure to load everything else. Another good aspect of RequireJS is that it allows you to avoid any global names at all. Everything is handled by callbacks inside of RequireJS. So how does it look? Well, a simple module with a dependency can look like this:

// in file foo.js
require(["bar", "quux"], function(bar, quux) {
  return {doSomething: function() { 
    return bar.something() + quux.something();
  }};
});
If you have something else that uses foo, then this file will be loaded, bar.js and quux.js will be loaded and the results of loading them (the return value from the module function) will be sent in as arguments to the function that creates the foo module. So RequireJS takes care of all this loading. But how do you kick it off? Well, you should have one single script tag in your HTML, that will point to require.js. You will also add an extra attribute to this script tag that points to the entry point to the JavaScript:

<script data-main="scripts/main" src="scripts/require.js"> </script>
This will do a number of things. It will load require.js. It will set the scripts directory as the base for all module references in your JavaScript. And it will load scripts/main.js as if it’s a RequireJS module. And if you want to use our foo module from earlier, you can create a main.js that looks like this:

// in file main.js
require(["foo"], function(foo) {
  require.ready(function() {
    console.log(foo.doSomething());
  });
});
This will make sure that foo.js and its dependencies bar.js and quux.js are loaded before the function is invoked. However, one aspect of JavaScript that people sometimes get wrong is that you have to wait until the DOM is ready before executing JavaScript that touches it. With RequireJS we use the ready function on the require object to make sure we only do something when everything is ready. Your main module should always wait until the document is ready before doing anything.
In general, RequireJS has helped a lot with structure and dependencies, and it makes it very simple to break JavaScript up into much smaller pieces. I like it a lot. There are a few downsides, though. The main one is that it doesn’t interact well with server side JavaScript (or at least it didn’t when I read up on it a month ago). Also, it doesn’t provide a clean way of getting access to the module functions without executing them, which becomes annoying when testing these things. I’ll talk a bit more about that in the section on testing.

No JavaScript in HTML

I don’t want any JavaScript whatsoever in the HTML, if I can avoid it. The only script tag should be the one that starts your module and loading framework – in my case RequireJS. We don’t have any event handlers embedded in the pages at all. We started out from a place where some of our pages had lots of event handlers, and refactored to a much smaller code base that was much easier to work with by extracting all of these things into separate JavaScript modules. This has the side effect that anything you want to work with should be semantically identifiable, either by CSS classes or data attributes. Try to avoid convoluted paths to find elements. It’s OK to add some extra classes and attributes to make your JavaScript clean and simple.

Init functions on ready

In terms of how we structure modules in a real application, we don’t actually do much work on startup. Instead, most of the work involves setting up event handlers and so on. The way we do that is to have the top level modules expose an init method that is expected to be called by the main module when it starts up. Imagine a system where you have Dojo as the main framework, and you have this code:

// foo.js
require(["bar"], function(bar) {
  function sayHello(node) {
    console.log("hello " + node);
  }

  function attachEventHandlers(dom) {
    dom.query(".fluxCapacitors").onclick(sayHello);
  }

  function init(dom) {
    bar.init(dom);
    attachEventHandlers(dom);
  }

  return {init: init};
});

// main.js
require(["foo"], function(foo) {
  require.ready(function() {
    foo.init(dojo);
  });
});
This will make sure to set up all event handlers and put the application in the right state to be used.

Lots of callbacks

Once you’ve taught yourself to ignore the verbosity of anonymous lambdas in JavaScript, they become very handy tools for creating APIs and helper functions. In general, the code we write uses a lot of callbacks and helper wrapper functions. I also use functions that generate new functions quite liberally, doing things like currying and the like. A fairly typical example is something like this:

function checkForChangesOn(node) {
  return function() {
    if(dojo.query(node).length > 42) {
      console.log("Warning, flux reactor in flax");
    }
  };
}

dojo.query(".clixies").onclick(checkForChangesOn(".fluxes"));
dojo.query(".moxies").onclick(checkForChangesOn(".flexes"));
This kind of abstraction can lead to very readable and clean JavaScript if done well. It can also lead to code where every piece is as small as it can be. In fact, one of the ways we make the syntax a little more bearable is to extract the creation of anonymous functions into factory functions like this.

Lots of anonymous objects

Anonymous objects are great for many things. They work as a substitute for named arguments, and they can be very useful for returning more than one value. In our code base we use anonymous objects a lot, and it definitely helps with code readability.
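
As a small invented illustration of both uses – an options object going in, and a multi-value result coming out:

function connect(options) {
  console.log("connecting to " + options.host + ":" + options.port);
}
connect({host: "example.com", port: 8080});

function parseAmount(str) {
  var parts = str.split(" ");
  return {value: parseFloat(parts[0]), currency: parts[1]};
}
var amount = parseAmount("42.50 SEK");
console.log(amount.value, amount.currency);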

Testing

We use Jasmine for unit testing our JavaScript. This works quite well in general. Since this is a fairly typical Java web application, we wanted to run the tests as part of our regular build process. This means we ended up using the JUnit Jasmine runner, which allows us to run these tests outside of browsers and format the results using all the available JUnit tools. Since we’ve tried to make the scripts as modular and small as possible, and have also extracted most of the DOM behavior, we have avoided using HTML fixtures. This means our tests lean more towards traditional unit tests than BDD style tests – which I’m not sure I’m comfortable with. But with the current size of the application, this is not really a problem.
Seeing as we wanted to test each module in isolation, we wanted to be able to instantiate each RequireJS module with our own mock dependencies. This ended up not being very easy with RequireJS, so instead of trying to fit into that model, we just don’t load RequireJS at all during testing. Instead we have a top-level require function that just saves away the module function under a well defined name. This means we can instantiate the modules as many times as we want and inject different mocks for different purposes.
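
The shim is tiny. Something along these lines (the naming scheme here is invented for this sketch – ours differs in the details):

// test-only stand-in for RequireJS
var moduleFunctions = {};
var currentModuleName; // set by the test harness before each script is loaded

function require(dependencies, moduleFunction) {
  // don't resolve anything; just remember the module function so tests
  // can instantiate it themselves with whatever mocks they want
  moduleFunctions[currentModuleName] = moduleFunction;
}

// in a spec: a fresh foo with stubbed collaborators
var mockBar = {something: function() { return 1; }};
var mockQuux = {something: function() { return 2; }};
var foo = moduleFunctions["foo"](mockBar, mockQuux);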
In general, Jasmine works well for us, but there are some features missing from the mocking/stubbing framework that make certain things a bit complicated. One thing I miss a lot is the capability of having stubs return different values depending on the arguments sent in. Some ugly code has been written to get around this.
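
The workaround usually ends up being a hand-rolled fake, something like this sketch (reactor and levelOf are hypothetical names; andCallFake is the Jasmine 1.x API):

var reactor = {levelOf: function(name) {}};  // hypothetical collaborator
var cannedValues = {flux: 42, flax: 17};
spyOn(reactor, "levelOf").andCallFake(function(name) {
  return cannedValues[name];
});
expect(reactor.levelOf("flux")).toEqual(42);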

Open questions

Our current JavaScript process works well for us, but there are still some open things we haven’t done yet. First among these is to integrate JSLint into our build process. I really think that should be there, so I have no excuse. We don’t have tests running inside of browsers. I’m actually OK with this, since we’re trying to do more unit level coverage with Jasmine. Hopefully our acceptance tests cover some of the browser based testing. We are not doing minification at all, and we probably won’t need it based on the current expected usage. For a different audience we would certainly minify everything – this is something RequireJS can do really well though. We don’t have any coverage tool running on our JavaScript either. This is something I’m also uncomfortable with, but I haven’t really found a good tool that allows us to run coverage as part of our CI process yet. I also care more about branch coverage than line coverage, and no tool seems to give you this at the moment.

Summary

JavaScript can be completely OK to work with, provided you treat it as a real language. It’s quite powerful, but we also have a lot of bad habits from hacking together small things, or just doing whatever works. As we go forward with JavaScript, this needs to stop. But the good news is that if you’re a decent developer, you shouldn’t have any problem picking any of this up.


Injecting loggers using Spring


On my current project we are using Spring MVC, and we try to use autowiring as much as possible. I personally strongly prefer constructor injection, since this gives me the luxury of working with final fields. I also like being able to inject everything a class needs – including loggers. Most of the time I don’t really want to use custom loggers from tests, but sometimes I do want to make sure something gets logged correctly, and being able to inject a logger seems like a natural way of doing that. So, with that preamble out of the way, my problem was that this seemed quite hard to achieve in Spring. Specifically, I use SLF4J, and I want to inject the equivalent of doing LoggerFactory.getLogger(MyBusinessObject.class). Sadly, Spring doesn’t give access to the place where something is going to be injected in any of the hooks available. Most solutions I found to this problem rely on using a BeanPostProcessor to set a field on the object after it’s been created. This defeats three of my purposes/principles – I can’t use the logger in the constructor, the field will be mutable, and I won’t get told by Spring if I’ve made a mistake in my wiring.

There was however one solution I found in a StackOverflow post – sadly it wasn’t complete. Specifically, I needed to use it in a Spring MVC setting and also from inside of tests. So this blog post is mainly to provide the complete solution for something like this. It’s a simple problem, but it was surprisingly tricky to get working correctly. But now that I have it, it will be very convenient. This code is for Spring 3.1, and I haven’t tested it on anything else.

The first part of this injection is to create our own custom BeanFactory – which is what Spring uses internally to manage beans and dependencies. The default one is called DefaultListableBeanFactory and we will just subclass it like this:

public class LoggerInjectingListableBeanFactory 
                extends DefaultListableBeanFactory {
    public LoggerInjectingListableBeanFactory() {
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
           new QualifierAnnotationAutowireCandidateResolver());
    }

    public LoggerInjectingListableBeanFactory(
              BeanFactory parentBeanFactory) {
        super(parentBeanFactory);
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
            new QualifierAnnotationAutowireCandidateResolver());
    }

    @Override
    public Object resolveDependency(
               DependencyDescriptor descriptor, String beanName, 
              Set<String> autowiredBeanNames, TypeConverter typeConverter) 
                     throws BeansException {
        Class<?> declaringClass = null;

        if(descriptor.getMethodParameter() != null) {
            declaringClass = descriptor.getMethodParameter()
                    .getDeclaringClass();
        } else if(descriptor.getField() != null) {
            declaringClass = descriptor.getField()
                    .getDeclaringClass();
        }

        if(Logger.class.isAssignableFrom(descriptor.getDependencyType())) {
            return LoggerFactory.getLogger(declaringClass);
        } else {
            return super.resolveDependency(descriptor, beanName, 
                    autowiredBeanNames, typeConverter);
        }
    }
}

The magic happens inside of resolveDependency where we can figure out the declaring class by checking either the method parameter or the field – and then see whether the thing asked for is a Logger. Otherwise we just delegate to the super implementation.

In order to use this from anything, we need an actual ApplicationContext that uses it. I didn’t find any hook to set the BeanFactory after the application context was created, so I ended up creating two new ApplicationContext implementations – one for tests and one for Spring MVC. They are slightly different, but each tries to do as little as possible while retaining the behavior of the original. The application context for the tests looks like this:

public class LoggerInjectingGenericApplicationContext 
                    extends GenericApplicationContext {
    public LoggerInjectingGenericApplicationContext() {
        super(new LoggerInjectingListableBeanFactory());
    }
}

This one just calls the super constructor with an instance of our custom bean factory. The application context for Spring MVC looks like this:

public class LoggerInjectingXmlWebApplicationContext 
                    extends XmlWebApplicationContext {
    @Override
    protected DefaultListableBeanFactory createBeanFactory() {
        return new LoggerInjectingListableBeanFactory(
                    getInternalParentBeanFactory());
    }
}

The XmlWebApplicationContext doesn’t have a constructor that takes a bean factory, so instead we override the createBeanFactory method to return our custom instance. In order to actually use these implementations, some more things are needed. To get our tests to use it, a test.context.support.ContextLoader implementation is necessary. This code is mostly just copied from the default implementation – sadly it doesn’t provide any extension points, and the places I want to override are in the middle of two final methods. It feels quite ugly to just copy the implementations, but there are no hooks for this…

public class LoggerInjectingApplicationContextLoader 
                        extends AbstractContextLoader {
    public final ApplicationContext loadContext(
     MergedContextConfiguration mergedContextConfiguration) 
                                  throws Exception {
        String[] locations = mergedContextConfiguration.getLocations();
        GenericApplicationContext context = 
                  new LoggerInjectingGenericApplicationContext();
        context.getEnvironment().setActiveProfiles(
               mergedContextConfiguration.getActiveProfiles());
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    public final ConfigurableApplicationContext 
            loadContext(String... locations) throws Exception {
        GenericApplicationContext context = 
              new LoggerInjectingGenericApplicationContext();
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    protected void loadBeanDefinitions(
            GenericApplicationContext context, String... locations) {
        createBeanDefinitionReader(context).
               loadBeanDefinitions(locations);
    }

    protected BeanDefinitionReader createBeanDefinitionReader(
                      final GenericApplicationContext context) {
        return new XmlBeanDefinitionReader(context);
    }

    @Override
    public String getResourceSuffix() {
        return "-context.xml";
    }
}

The final thing necessary to get your tests to use the custom Bean Factory is to specify the loader to use in the ContextConfiguration on your test class, like this:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(value = "file:our-app-config.xml",
          loader = LoggerInjectingApplicationContextLoader.class)
public class SomeTest {
}

In order to get Spring MVC to pick this up, you can edit your web.xml and add a new init-param for the DispatcherServlet, like this:

    <servlet>
        <servlet-name>Spring MVC Dispatcher Servlet</servlet-name>
        <servlet-class>
           org.springframework.web.servlet.DispatcherServlet
        </servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>WEB-INF/our-app-config.xml</param-value>
        </init-param>
        <init-param>
            <param-name>contextClass</param-name>
            <param-value>
               com.example.LoggerInjectingXmlWebApplicationContext
            </param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

This approach seems to work well enough. Some of the code is slightly ugly, and I would definitely love to have a better hook for injection points to know where something will get injected. Having factory methods be able to take the receiver object might be very convenient, for example. Being able to customize the bean factory also seems like it should be much easier than this.



Seph – A Hard Language to Compile


I have recently started work on Seph again. I preannounced it last summer (here), then promptly became extremely busy at work. Busy enough that I didn’t really have any energy to work on this project for a while. Sadly, I’m still as busy, but I’ve managed to find some small slivers of time to start working on the compiler parts of the implementation. This has been made much easier and more fun since JSR 292 is getting near completion, and an ASM 4 branch is available that makes it easier to compile Java bytecode with support for invokedynamic built in.

So that means that the current code in the repository actually goes a fair bit toward where I want it to be. Specifically, the compiler compiles most code except for abstractions that create abstractions, and calls that take keyword arguments. Assignment is not supported right now either. I don’t expect any of these features to be very tricky to implement, so I’m holding off on them and working on other, more complicated things.

This blog post is meant to serve two purposes. The first one is to just tell the world that Seph as an idea and project actually is alive and being worked on – and what progress has been made. The other aspect of this post is to talk about some of the things that make Seph a quite tricky language to compile. I will also include some thoughts I have on how to solve these problems – and suggestions are very welcome if you know of a better approach.

To recap, the constraints Seph is working under are that it has to run on Java 7, it has to be fully compiled (in fact, I haven’t decided if I’ll keep the interpreter at all after the compiler is working), and it has to be fast. Ish. I’m aiming for at least Ruby 1.8 speed. I don’t think that’s unreasonable, considering the dimensions of flexibility Seph will have to allow.

So let’s dive in. These are the major pain points right now – and they are in some cases quite interconnected…

Tail recursion

All Seph code has to be tail recursive, which means a tail call should never grow the stack. In order to make this happen on the JVM, you need to save away information somewhere about where to continue the call. Then anyone using a value has to check for a tail marker token, and if one is found, that caller has to repeatedly call the current tail until a real value is produced. All the information necessary for the tail also has to be saved away somewhere.

The approach I’m currently taking is fairly similar to Erjang’s. I have an SThread object that all Seph calls will have to pass along – this will act as a thread context as soon as I add lightweight threads to Seph. But it also serves as a good place to save away information on where to go next. My current encoding of the tail is simply a MethodHandle that takes no arguments. So the only thing you need to do to pump the tail call is to repeatedly check for the token and call the tail method handle. Still, doing this all over the place might not be that performant. At the moment, the code is not looking up a MethodHandle from scratch in the hot path, but it will have to bind several arguments in order to create the tail method handle. I’m unsure what the performance implications of that will be right now.
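
The pumping loop itself is small. A sketch of the idea (TAIL_MARKER and the tail field are hypothetical names – the real encoding differs in the details):

public static SephObject pump(SThread thread, SephObject result) throws Throwable {
    while(result == SThread.TAIL_MARKER) {
        MethodHandle tail = thread.tail;   // the saved zero-argument continuation
        thread.tail = null;
        result = (SephObject)tail.invokeExact();
    }
    return result;
}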

Argument evaluation from the callee

One aspect of Seph that works the same as in Ioke is that a method invocation will never evaluate the arguments. The responsibility for evaluating arguments lies in the receiving code, not the calling code. And since we don’t know whether something will do a regular evaluation or something macro-like, it’s impossible to pre-evaluate the arguments and push them on the stack.

The approach Ioke and the Seph interpreter take is to just send in the Message object and allow the callee to evaluate it. But that’s exactly what I want to avoid with Seph – everything should be possible to compile, and to run hot if possible. So sending Messages around defeats the purpose.

I’ve found an approach to compiling this that actually works quite well. It also reduces code bloat in most circumstances. Basically, every piece of code that is part of a message send will be compiled to a separate method. So if you have something like foo(bar baz, qux), that will compile into the main activation method and two argument methods. This approach is recursive, of course. What this gives me is a protocol where I can take method handles to the argument methods, push them on the stack, and then allow the callee to evaluate them however it wants. I can provide a standard evaluation path that just calls each of the method handles in turn to generate the values. But it also becomes very easy for me to send them in unevaluated. As an example, this is almost exactly what the current implementation of the built in “if” method looks like. (It’s not exactly like this right now, because of transitional interpreter details.)

public final static SephObject _if(SThread thread, LexicalScope scope,
        MethodHandle condition, MethodHandle then, MethodHandle _else) {
    SephObject result = (SephObject)condition.invokeExact(thread, scope, 
                                                          true, true);
    
    if(result.isTrue()) {
        if(null != then) {
            return (SephObject)then.invokeExact(thread, scope, 
                                                true, true);
        } else {
            return Runtime.NIL;
        }
    } else {
        if(null != _else) {
            return (SephObject)_else.invokeExact(thread, scope, 
                                                 true, true);
        } else {
            return Runtime.NIL;
        }
    }
}

Of course, this approach is not perfect. It’s still a lot of code bloat, I can’t use the stack to pass things to the argument evaluation, and the code to bind the argument method handles takes up most of the generated code at the moment. Still, it seems to work and gives a lot of flexibility. And compiling regular method evaluations will make it possible to bind these argument method handles straight into an invokedynamic call site, which could improve the performance substantially when evaluating arguments (something that will probably happen quite often in real world code… =).

Intrinsics are just regular messages

Many of the things that are syntax elements in other languages are just messages in Seph. Things like “nil”, “true”, “false”, “if” and many others work exactly the same way as a regular message send to something you have defined yourself. In many cases this is totally unnecessary, though – and knowing the implementation at the call site would allow you to improve things substantially. I think it’s going to be fairly uncommon to override any of these standard names, but I still want to make it possible to do so. And I’m fine with programs that do this taking a performance hit from it. So the approach I’ve come up with (but not implemented yet) is this – I will special case the compilation of every place that has the same name as one of the intrinsics. This special casing will bind to a different bootstrap method than regular Seph methods. As a running example, let’s consider compiling a piece of code with “true” in it. This will generate a message send that will be taken care of by a sephTrueBootstrapMethod. We still have to send in all the regular method activation arguments, though. What this bootstrap method will do is set up a call site that points to a very special method handle. This method handle will be a guardWithTest created through a SwitchPoint specific to the true value. The first path of that GWT (guardWithTest) will just return the true value directly without any checks whatsoever. The else path of the GWT will fall back to a regular Seph fallback method that does inline caching and regular lookup. The magic happens with the SwitchPoint – the places that create new bindings will check for these intrinsic names, and if one of those names is rebound anywhere in the client code, the SwitchPoint will be switched over to the slow path.
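
In JSR 292 terms, setting up that fast path might look something like this sketch using the java.lang.invoke types (Runtime.TRUE and the exact signatures are assumptions for illustration, not the real implementation):

static final SwitchPoint TRUE_NOT_REBOUND = new SwitchPoint();

static MethodHandle intrinsicTrue(MethodHandle slowPath) {
    // fast path: ignore the activation arguments and return the true value directly
    MethodHandle fast = MethodHandles.dropArguments(
            MethodHandles.constant(SephObject.class, Runtime.TRUE),
            0, slowPath.type().parameterList());
    // until someone rebinds "true", every call takes the fast path; a rebinding
    // calls SwitchPoint.invalidateAll(...), which flips the guard to the slow path
    return TRUE_NOT_REBOUND.guardWithTest(fast, slowPath);
}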

In summary, I think a fast path is possible for many of these things in most programs. Overriding “if” will still work as expected, but doing so will make the global performance of that program slower for the rest of the execution.

When do lexical scopes escape?

Seph has mutable lexical scopes. But it’s impossible to know which names will escape and which won’t – so as far as I can see, I can’t use the Java stack to represent variables, except in a small number of degenerate cases. I’m not sure it’s worth having that code path yet, so I haven’t thought much about it.

Class based PICs aren’t a good fit

One of the standard optimizations that object oriented languages use is something called a polymorphic inline cache. The basic idea is that looking up a method is the really slow operation. So if you can save away the result of doing that, guarded by a very cheap test, then you can streamline the most common cases. Now, that cheap test is usually a check against the class. As long as you send in an instance with the same class, a new method lookup doesn’t have to happen. Doing a getClass and then an identity comparison on the result is usually quite fast (a pointer comparison on most architectures) – so you can build PICs that don’t actually spend much time in the guard.

But Seph is a prototype based language. So any object in the system can have different methods or values associated with a name, and there is no clear delineation of which objects carry new names and values. Especially since Seph objects are immutable, every new object will most likely have a new set of values in it. And saving away objects and dispatching on them becomes much less performant, since a call site will basically never see the same object twice. Now, there are solutions to this – but most of them are tailored for languages with a class based usage pattern. V8 uses an approach called hidden classes to figure out things like this. I’m considering implementing something similar, but I’m a bit worried that the usage pattern of Seph will be far enough away from the class based world that it might not work well.

Summary

So, Seph is not terribly easy to compile, and I don’t have a good feeling for how fast it can actually be made. I guess we’ll have to wait and see. But it’s also an interesting challenge, coming up with solutions to these problems. I think I might also have to go on a new research binge, investigating how Self and NewtonScript did things.



Safe(r) monkey patching


Ruby makes it possible to change pretty much anything, anywhere. This is obviously very powerful, but it’s also something that can cause a lot of pain if it’s not done in a disciplined manner. The way this is handled on most Ruby projects is by having clear strategies for what to change, how to name it and where to put the source file. The most basic advice is to always use modules for extensions and changes if it is at all possible. There are several good reasons for this, but the main one is that it makes it easier for someone debugging your application to find out where the code is defined.

The one absolute rule that should never be violated in a Rails or Ruby project is this: never modify the original source code. In the worst case, fork the project and make the changes there, but never, never, never change code in vendor/plugins or vendor/gems.

Let’s start with a simple example. Say I want to recreate the presence method I mentioned in a previous blog post. A first version may look like this:

class Object
  def presence
    return self if present?
  end
end

But if I open up IRB and get hold of this method, it’s not immediately obvious where it’s defined:

o = Object.new
p o.method(:presence)  #=> #<Method: Object#presence>

However, if I were to implement it using a module instead, like this:

module Presence
  def presence
    return self if present?
  end
end

Object.send :include, Presence

If I look at the method now, the output is a bit changed:

p o.method(:presence)  #=> #<Method: Object(Presence)#presence>

We can now see that the method actually comes from the Presence module instead of the Object class. In most Ruby projects, these kinds of extensions will be namespaced, using the word extensions or ext as part of the module name. When I add the presence method to code bases, I usually put it in lib/core_ext/object/presence.rb, in a module called CoreExt::Object::Presence. All of this to make it as easy as possible to find these extensions and changes.
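
Following that convention, the file would look something like this (a sketch of the layout just described):

# lib/core_ext/object/presence.rb
module CoreExt
  module Object
    module Presence
      def presence
        return self if present?
      end
    end
  end
end

Object.send :include, CoreExt::Object::Presence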

There are many other benefits to putting an extension like this in a module. It makes your code cleaner, more flexible, and it composes better if you happen to have conflicting definitions. You can also use modules more selectively if you want, including just adding it to selected objects if necessary.

Props to my colleague Brian Guthrie for alerting me to this useful side effect of defining extensions with modules.

There is a slight wrinkle in this scenario, specifically for adding extensions to modules. Sadly, the way the Ruby module system works, you can’t include a new module into Enumerable and have that take effect in places where Enumerable has already been mixed in. Instead you have to define the methods directly on Enumerable. The general problem looks like this:

module X
  def hello
    42
  end
end

class Foo
  include X
end

Foo.new.hello #=> 42

module Y
  def goodbye
    25
  end
end

module X
  include Y
end

Foo.new.goodbye #=> undefined method `goodbye' for #<Foo:0x129f94> (NoMethodError)

This is a bit sad, since it means extensions have to be written in two different ways, depending on where you aim to use them. The general rule still applies – you should put the extensions in well named files that are easy to find. And if you can extract the functionality to a module and then delegate to it, that is preferable.
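
In practice, the delegating variant looks something like this invented example – the logic lives in a module, and Enumerable gets a thin method defined directly on it:

module CoreExt
  module Summing
    def self.sum_of(enumerable)
      enumerable.inject(0) { |acc, x| acc + x }
    end
  end
end

module Enumerable
  def sum_of_elements
    CoreExt::Summing.sum_of(self)
  end
end

[1, 2, 3].sum_of_elements #=> 6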



Comparing times and dates in Ruby


In one of the Rails projects I’m involved with, we do most of the local development against SQLite and then deploy against Oracle. This is a bit annoying for many reasons, but by far the largest cause of trouble is the handling of dates. I haven’t exactly figured out the rules, but for some reason sometimes Oracle returns DateTime in situations where SQLite returns a Date. This usually causes quite subtle problems that have effects in other parts of the application. This brings me to the small piece of advice I wanted to talk about in this column. Always make sure that you know if you are working with a Date, a Time or a DateTime, since these all have slightly different behavior, especially when it comes to comparisons.

The rule is quite simple. If you think you can have a Time object, make sure to turn it into a DateTime object before trying to compare it to a Date object. What happens otherwise? Unfunny things:

Date.today < Time.now #ArgumentError: comparison of Date with Time failed

Time.now > Date.today #true

Time.now == Date.today #false

Date.today == Time.now #nil

Date.today <=> Time.now #nil

Date.today != Time.now #true

The first time I saw some of these results, I was a bit confused. Especially the last three. But they do make a twisted kind of sense. Namely, it’s OK for the <=> operator to return nil if it can’t compare two objects. And != in Ruby is hardcoded to return the inverse of the value returned from ==, and since nil is a falsey value, the inverse of that becomes true.

The point of all this is that you should always make sure you don’t have the Date on the left hand side of a comparison. Or, if you do want to compare, explicitly call to_date to coerce both sides. Finally, if you want to do date and time comparisons, I find the best behavior usually comes from coercing both sides with to_datetime before doing the comparison.
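
Concretely, in a Rails console (ActiveSupport and the date library provide these conversions):

Time.now.to_date == Date.today                   #=> true
Date.today.to_datetime < Time.now.to_datetime    #=> true (unless it's exactly midnight)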



Panel on Internet Freedom


Next week, ThoughtWorks and The Churchill Club are organizing a live panel about Internet freedom and the implications of the recent WikiLeaks events. This is going to be a world class event with very engaging speakers, and it will also be streamed live if you can’t attend in person. Daniel Ellsberg, Peter Thiel, Clay Shirky, Jonathan Zittrain and Roy Singham will discuss various important subjects touching on the WikiLeaks controversy:

WikiLeaks: Why it Matters. Why it Doesn’t?
Wednesday, January 19, 2011
5:30-8:30 PM Pacific Standard Time

The purpose of this discussion is to take an objective look at the WikiLeaks controversy and its potential threats to the future of the free Internet. Notwithstanding people’s varied personal opinions about WikiLeaks and Julian Assange, this issue reaches beyond the actions of one person or one website. Precedents that will determine the very future of the Internet are being set as the world grapples with new social and information models. This is a serious issue worthy of serious discussion and debate.

Paul Jay, CEO and senior editor of The Real News Network, will moderate the panel, which includes:

  • Daniel Ellsberg, Former State and Defense Dept. Official, prosecuted for releasing the Pentagon Papers
  • Clay Shirky, Independent Internet Professional; Adjunct Professor, Interactive Telecommunications Program, New York University
  • Neville Roy Singham, Founder and Chairman, ThoughtWorks
  • Peter Thiel, President, Clarium Capital; Managing Partner, Founder’s Fund
  • Jonathan Zittrain, Professor of Law and Professor of Computer Science, Harvard University; Co-founder, Berkman Center for Internet & Society

Please join us along with other executives from diverse industries and positions, all of whom will gather to listen and engage with these panelists who represent a rich cross section of the communities impacted by the WikiLeaks issue.

Attend in person!

Santa Clara Marriott: 2700 Mission College Boulevard · Santa Clara, California 95054 USA

A buffet dinner will be served.

To register to attend this event, please visit the Churchill Club’s website at http://www.churchillclub.org/eventDetail.jsp?EVT_ID=892. There is a small charge for this event; however, guests of ThoughtWorks may use a special discount code, gtworks25, to receive the Churchill Club’s member rate.

View live-streamed event

The Real News Network: http://bit.ly/fdjoMr

Fora.tv: http://bit.ly/f4cuz6



Named Scopes


One of my favorite features of Rails is named scopes. Maybe it’s because I’ve seen so much Rails code with conditions and finders spread all over the code base, but I feel that named scopes should almost always be used where you would otherwise find yourself writing a find_by or even using the conditions argument directly in consumer code. The basic rule I follow is to always use named scopes outside of models to select a specific subset of a model.

So what is a named scope? It’s really just an easier way of creating a custom finder method on your model, but it gives you some extra benefits I’ll talk about later. So if we have code like:

class Foo < ActiveRecord::Base
  def self.find_all_fluxes
    find(:all, :conditions => {:fluxes => true})    
  end
end

p Foo.find_all_fluxes

we can easily replace that with

class Foo < ActiveRecord::Base
  named_scope :find_all_fluxes, :conditions => {:fluxes => true}
end

p Foo.find_all_fluxes

You can give named_scope any argument that find can take. So you can create named scopes specifically for ordering, specifically for including an association, etc.

The above example is fixed to always use the same conditions. But named_scope can also take arguments. You do that by, instead of fixing the arguments, sending in a lambda that returns the arguments to use:

def self.ordered_inbetween(from, to)
  find(:all, :conditions => {:order_date => from..to})
end

Foo.ordered_inbetween(10.days.ago, Date.today)

can become:

named_scope :ordered_inbetween, lambda {|from, to|
  {:conditions => {:order_date => from..to}}
}  

Foo.ordered_inbetween(10.days.ago, Date.today)

It’s important that you use the curly braces form of block for the lambda – if you happen to use do-end instead, you might not get the right result, since the block will bind to named_scope. The block that binds to named_scope is used to add extensions to the returned collections, and thus will not contribute to the arguments given to find. It’s a mistake I’ve made several times, and it’s very easy to make. So make sure to test your named scopes too!
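
The broken version looks deceptively similar, which is what makes it so easy to get wrong:

# here do-end binds to named_scope itself, so the hash never becomes
# the find arguments - it's treated as an extension block instead
named_scope :ordered_inbetween, lambda do |from, to|
  {:conditions => {:order_date => from..to}}
end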

So what is it that makes named scopes so useful? Several things. First, they are composable. Second, they establish an explicit interface to the model functionality. Third, you can use them on associations. There are other benefits too, but these are the main ones. Simply put, they are a cleaner solution than the alternatives.

What do I mean by composable? Well, you can call one on the result of calling another. So say that we have these named scopes:

class Person < ActiveRecord::Base
  named_scope :by_name, :order => "name ASC"
  named_scope :by_age, :order => "age DESC"
  named_scope :top_ten, :limit => 10
  named_scope :from, lambda {|country|
    {:conditions => {:country => country}}
  }
end

Then you can say:

Person.top_ten.by_name
Person.top_ten.from("Germany")
Person.top_ten.by_age.from("Germany")

I dare you to do that in a clean way by using class method finders.

I hope you have already seen what I mean by offering a clean interface. You can hide some of the implementation details inside your model, and your controller and view code will read more cleanly because of it.

The third point I made was about associations. Simply put, if you have an association, you can use a named scope on that association, just as if it were the model class itself. So if we have:

class ApartmentBuilding < ActiveRecord::Base
  has_many :tenants, :class_name => "Person"
end

a = ApartmentBuilding.first
a.tenants.from("Sweden").by_age

So named scopes are great, and you should use them. Whenever you sit down to write a finder, see if you can’t express it as a named scope instead.

(Note: some of the things I write about here only concern Rails 2. Most of the work I do is still in the old world of 2.3.)