Turn and face the strange


Hi all,

So it’s a new year. A lot has changed, and a lot is staying the same. So I was thinking that maybe it’s time to start writing again. Maybe. Let’s see how it goes.

First things first: I have put different subjects on different blogs before. I might continue doing that for some of them – but I’ll also use this main blog to cover a wider range of subjects. If you come here for programming language geekery, there might not be as much of that anymore. But we will see.

Last year did come with some significant changes. I’ve left ThoughtWorks, and am now working at an NGO on privacy, security, crypto and anonymity. Once we have a public web site and so on, I might tell more of the story here. Suffice it to say that leaving ThoughtWorks was very hard, but also the right thing to do in order to expand the impact I can have working on the things I find important to the world.

That’s really most of it. My mind is very much focused on privacy these days. Some days I’m heads down writing low level code; others I spend specifying cryptographic protocols, or working on the usability of common types of security interactions. This world is full of horrible things, and we need a change.

I have also basically stopped giving talks at conferences. For now, that is probably going to continue – I think most of what I have to say isn’t necessarily relevant to conferences like Goto or QCon anymore – and finding time to write software is hard enough as it is, even without competing against conference travel.

So I think I’ll leave it at that for tonight. My life goes on in interesting directions. Code is as much a part of my daily life as it has ever been. Only the focus has changed a bit over the last ten years.

In my next post I was thinking of talking about one of the projects my team and I have spent a lot of time on over the last year or so. Until then!



JavaScript in the small


My most recent project was a fairly typical Java web project where we had a component that needed to be written in JavaScript. Nothing fancy, and nothing big. It does seem like people are still not taking JavaScript seriously in these kinds of environments, so I wanted to take a few minutes and talk about how we developed JavaScript on this project. The kind of advice I’ll be giving here is well suited for web projects with small to medium amounts of JavaScript. If you’re writing large parts of your application on the client side, you probably want a full stack framework to help you out, and these things become less relevant.

Of course, most if not all of the things I’ll cover here can be gleaned from other sources, and probably explained better there. And if you’re an experienced JavaScript developer, you are probably fine without this article.

I had to do two things to become efficient with JavaScript. The first was to learn to ignore the syntax. The syntax is clunky and definitely gets in the way, but with the right habits (such as having an editor shortcut for function/lambda literals, and always putting the returned value on the same line as the return statement) I’ve been able to see through the syntax and basically use JavaScript in a Scheme-like style. The second was to completely ignore the object system. I use a lot of object literals, but not really any constructors or the this keyword. Both of these features can be used well, but they are also very clunky and hard to get everyone on a team to understand the same way. I love prototype-based OO as a model, and I’ve used it with success in Ioke and Seph. But with JavaScript I generally shy away from it.
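
For instance, here is a quick sketch (function names made up) of why the return-on-the-same-line habit matters:

// Automatic semicolon insertion turns the first version into a bare
// `return;`, so the object literal below it is never returned.
function broken() {
  return
    {answer: 42};   // unreachable – broken() returns undefined
}

function working() {
  return {answer: 42};   // the literal starts on the same line
}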

The module pattern

The basic idea of the module pattern is that you encapsulate all your code in anonymous functions that are immediately invoked to generate the actual top-level object. Since JavaScript has some unfortunate problems with global variables (namely, that they exist), it’s safest to just put all your code inside one or more of these modules. You can also make your modules take the dependencies you want to use. A simple module might look like this:

var olaBiniSeriousBanking = (function() {
  var balance = 0;

  function deposit(num) {
    balance += num;
  }

  function checkOverdraft(amount) {
    if(balance - amount < 0) {
      throw "Can't withdraw more than exists in account";
    }
  }

  function withdraw(amount) {
    checkOverdraft(amount);
    balance -= amount;
  }

  return {deposit: deposit, withdraw: withdraw};
})();

In this case the balance variable is completely hidden inside a lexical closure, and can only be accessed by the deposit and withdraw functions. These functions are also not in the global namespace, so there is no risk of clobbering. It’s also possible to have lots and lots of helper functions that no one else can see. That makes it easier to keep your functions small – and incidentally, the largest problem I’ve seen with JavaScript code quality is that functions tend to be very large. Don’t do that!

A useful variation of the module pattern is to extract the construction function and give it a name. Even though you might use it immediately, this makes it possible to create more than one instance, pass in different dependencies, or reach it from tests so you can inject collaborators:

var olaBiniGreeterModule = (function(greeting) {
  return {greet: function(name) {
    console.log(greeting + ", " + name);
  }};
});
var olaBiniGreeterEng = olaBiniGreeterModule("Hello");
var olaBiniGreeterSwe = olaBiniGreeterModule("Hejsan");

RequireJS

The module pattern is good on its own, but there are some things a loader can do that make it even better. There are several variations of these module loaders, but my favorite so far is RequireJS. I have several reasons for this, but the main one is probably that it is very lightweight, and is actually a net win even for very small web applications. There are lots of benefits to letting RequireJS handle your modules. The main one is that it takes care of dependencies between modules and loads them automatically. This means you can define one single entry point for your JavaScript, and RequireJS makes sure to load everything else. Another good aspect of RequireJS is that it allows you to avoid any global names at all. Everything is handled by callbacks inside of RequireJS. So how does it look? Well, a simple module with a dependency can look like this:

// in file foo.js
// a module that returns a value is registered with define()
define(["bar", "quux"], function(bar, quux) {
  return {doSomething: function() {
    return bar.something() + quux.something();
  }};
});

If something else uses foo, this file will be loaded, bar.js and quux.js will be loaded, and the results of loading them (the return values from their module functions) will be passed as arguments to the function that creates the foo module. So RequireJS takes care of all this loading. But how do you kick it off? Well, you should have one single script tag in your HTML that points to require.js, with an extra attribute that points at the entry point of your JavaScript:

<script data-main="scripts/main" src="scripts/require.js"> </script>

This will do a number of things. It will load require.js. It will set the scripts directory as the base for all module references in your JavaScript. And it will load scripts/main.js as if it were a RequireJS module. If you want to use our foo module from earlier, you can create a main.js that looks like this:

// in file main.js
require(["foo"], function(foo) {
  require.ready(function() {
    console.log(foo.doSomething());
  });
});

This makes sure that foo.js and its dependencies bar.js and quux.js are loaded before the function is invoked. However, one aspect of JavaScript that people sometimes get wrong is that you have to wait until the DOM is ready before executing JavaScript that touches it. With RequireJS we use the ready function on the require object to make sure we only do something when everything is ready. Your main module should always wait until the document is ready before doing anything.

In general, RequireJS has helped a lot with structure and dependencies, and it makes it very simple to break JavaScript up into much smaller pieces. I like it a lot. There are a few downsides, though. The main one is that it doesn’t interact well with server side JavaScript (or at least it didn’t when I read up on it a month ago). Also, it doesn’t provide a clean way of getting access to the module functions without executing them, which becomes annoying when testing these things. I’ll talk a bit more about that in the section on testing.

No JavaScript in HTML

I don’t want any JavaScript whatsoever in the HTML, if I can avoid it. The only script tag should be the one that starts your module and loading framework – in my case RequireJS. We don’t have any event handlers embedded in the pages at all. We started out from a place where some of our pages had lots of embedded event handlers, and refactored to a much smaller code base that was much easier to work with by extracting all of these things into separate JavaScript modules. A side effect of this is that anything you want to work with from JavaScript should be semantically identifiable, either by CSS classes or data attributes. Try to avoid convoluted paths to find elements. It’s OK to add some extra classes and attributes to make your JavaScript clean and simple.
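
As a sketch of what that looks like in practice (the module and class names here are made up), the markup carries only a semantic hook like <button class="saveButton" data-confirm="Are you sure?">Save</button>, and the behavior lives in a module:

// saveButton.js – hypothetical; all names are for illustration only
define([], function() {
  function init(dom) {
    dom.query(".saveButton").onclick(function(event) {
      // the data attribute carries the page-specific text,
      // so no JavaScript needs to live in the HTML
      if (window.confirm(event.target.getAttribute("data-confirm"))) {
        console.log("saving...");
      }
    });
  }

  return {init: init};
});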

Init functions on ready

In terms of how we structure modules in a real application, we don’t actually do much work on startup. Instead, most of the work involves setting up event handlers and so on. The way we do that is to have the top-level modules expose an init method that is expected to be called by the main module when it starts up. Imagine a system where you have dojo as the main framework, and you have this code:

// foo.js
require(["bar"], function(bar) {
  function sayHello(event) {
    console.log("hello " + event.target);
  }

  function attachEventHandlers(dom) {
    dom.query(".fluxCapacitors").onclick(sayHello);
  }

  function init(dom) {
    bar.init(dom);
    attachEventHandlers(dom);
  }

  return {init: init};
});

// main.js
require(["foo"], function(foo) {
  require.ready(function() {
    foo.init(dojo);
  });
});

This makes sure all event handlers are set up and the application is in the right state to be used.

Lots of callbacks

Once you’ve taught yourself to ignore the verbosity of anonymous lambdas in JavaScript, they become very handy tools for creating APIs and helper functions. In general, the code we write uses a lot of callbacks and helper wrapper functions. I also quite liberally use functions that generate new functions, for currying and similar techniques. A fairly typical example is something like this:

function checkForChangesOn(node) {
  return function() {
    if(dojo.query(node).length > 42) {
      console.log("Warning, flux reactor in flax");
    }
  };
}

dojo.query(".clixies").onclick(checkForChangesOn(".fluxes"));
dojo.query(".moxies").onclick(checkForChangesOn(".flexes"));

This kind of abstraction can lead to very readable and clean JavaScript if done well. It can also lead to code where every piece is as small as it can be. In fact, one of the ways we make the syntax a little more bearable is to extract the creation of anonymous functions into factory functions like this.

Lots of anonymous objects

Anonymous objects are great for many things. They work as a substitute for named arguments, and are very useful for returning more than one value. In our code base we use anonymous objects a lot, and it definitely helps with code readability.
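
A small sketch (all names made up) of both uses:

// an object literal as named arguments...
function createAccount(options) {
  var balance = options.initialBalance || 0;
  var currency = options.currency || "USD";
  // ...and another object literal to return more than one value
  return {balance: balance, currency: currency};
}

var account = createAccount({currency: "SEK", initialBalance: 100});
console.log(account.balance + " " + account.currency); // 100 SEK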

Testing

We use Jasmine for unit testing our JavaScript. This works quite well in general. Since this is a fairly typical Java web application, we wanted to run the tests as part of our regular build process. This means we ended up using the JUnit Jasmine runner, which allows us to run these tests outside of browsers and format the results using all the available JUnit tools. Since we’ve tried to make the scripts as modular and small as possible, and have extracted most of the DOM behavior, we have avoided using HTML fixtures. This means our tests lean more towards traditional unit tests than BDD style tests – which I’m not sure I’m comfortable with. But at the current size of the application, this is not really a problem.

Since we wanted to test each module in isolation, we needed to be able to instantiate each RequireJS module with custom mock dependencies. This ended up not being very easy with RequireJS, so instead of trying to fit into that model, we simply don’t load RequireJS at all during testing. Instead we have a top-level define function that just saves away the module function under a well-defined name. This means we can instantiate the modules as many times as we want and inject different mocks for different purposes.
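
A minimal sketch of what that stub could look like (every name here is made up); specs then invoke the saved module function directly with mocks:

// test-define-stub.js – loaded instead of require.js when running specs
var lastModuleFunction = null;

function define(dependencies, moduleFunction) {
  // ignore the dependency list; just save the raw factory for the spec
  lastModuleFunction = moduleFunction;
}

// in a Jasmine spec, after loading foo.js:
describe("foo", function() {
  it("combines its collaborators", function() {
    var barMock = {something: function() { return 1; }};
    var quuxMock = {something: function() { return 2; }};
    var foo = lastModuleFunction(barMock, quuxMock);
    expect(foo.doSomething()).toEqual(3);
  });
});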

In general, Jasmine works well for us, but there are some features missing from the mocking/stubbing framework that make certain things a bit complicated. One thing I miss a lot is the ability to have stubs return different values depending on the arguments passed in. Some ugly code has been written to get around this.
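
One workaround (a hypothetical helper, not part of Jasmine) is to hand-roll such stubs:

// a stub whose return value depends on its first argument
function stubReturning(valuesByArgument) {
  return function(argument) {
    return valuesByArgument[argument];
  };
}

var lookup = stubReturning({foo: 1, bar: 2});
console.log(lookup("foo")); // 1
console.log(lookup("bar")); // 2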

Open questions

Our current JavaScript process works well for us, but there are still some open issues we haven’t addressed yet. First among these is integrating JSLint into our build process. I really think it should be there, so I have no excuse. We don’t have tests running inside browsers. I’m actually OK with this, since we’re trying to get more unit level coverage with Jasmine; hopefully our acceptance tests cover some of the browser based testing. We are not doing minification at all, and we probably won’t need it based on the current expected usage. For a different audience we would certainly minify everything – this is something RequireJS can do really well, though. We don’t have any coverage tool running on our JavaScript either. This is something I’m also uncomfortable with, but I haven’t yet found a good tool that lets us run coverage as part of our CI process. I also care more about branch coverage than line coverage, and no tool seems to give you that at the moment.

Summary

JavaScript can be completely OK to work with, provided you treat it as a real language. It’s quite powerful, but we also have a lot of bad habits from hacking together small things, or just doing whatever works. As we go forward with JavaScript, this needs to stop. The good news is that if you’re a decent developer, you shouldn’t have any problem picking any of this up.


RSpec matchers and regexp comments – a possibly useful hack


A few days back, I was sitting at a client and working on a hacky spec to validate some assumptions in a very dirty data set. I wanted to figure out some limits. The basic idea was that I needed to go through an entry for every day of the last eight years. Getting these entries is potentially expensive, and the validation was based on checking that a specific value never turns up. This was quite easy, and I ended up with something like this:

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not == "MAGIC TOKEN"
    end
  end
end

Here you can see that I simply use a method to calculate the invariant, then use “should_not ==” to check it. Nothing fancy. The problem comes when I want information about a failure. Now, I could insert a print statement, but that means I’d have to look at all the output until I got to the end to see which one failed. I could also rescue all exceptions, print the offending information and then reraise. But the best solution would be to give RSpec a failure message. You can definitely do this with other RSpec matchers, but I couldn’t find a way of doing it with the == matcher. One thing I could have done was to write my own matcher. That also seemed inefficient, since this was a throwaway thing, run once and then deleted.

What I ended up doing was actually quite elegant, in a very disgusting way. It works, and it might be useful for someone else, sometime. But don’t EVER do anything like this in code you will save.

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not =~ /\AMAGIC TOKEN(?#Invariant failed on: #{date})\Z/
    end
  end
end

So why does this work? Well, it turns out that you can have comments in regular expressions. And you can interpolate arbitrary values into regexps, just as with strings. So I can embed the failure information in a comment in the regexp. It will only be displayed when the match fails, since RSpec by default says something like “expected MAGIC TOKEN to not match /\AMAGIC TOKEN(?#Invariant failed on: 2010-06-04)\Z/”, so you get the necessary information. The comment does not contribute to the matching in any way. There’s another subtle point here: I haven’t used ^ and $ to anchor the pattern. Instead I use \A and \Z. The reason is that otherwise my regexp wouldn’t behave the same as comparing against a string, since ^ and $ also match at the beginning and end of lines, not only at the beginning and end of the buffer.

Anyway, I thought I’d share this. In basically all cases, don’t do this. But it’s still a bit funny.



Clojure


I know I’ve mentioned Clojure now and again on this blog, but I haven’t actually talked that much about it. I feel it’s time to change that right now – Clojure is in the air and it’s looking really interesting. More and more people are talking about it, and after the great presentation Rich gave at the JVM Language Summit, I feel there might be some more converts in the world.

So what is it? Well, it’s a new Lisp dialect for the JVM. It originally targeted both the JVM and .NET, but Rich ended up not going through with that (a decision I can understand after seeing the effort Fan has to expend to keep providing this feature).

It’s specifically not an implementation of either Common Lisp or Scheme, but a totally new language with some interesting features. The most striking of these is the way it embraces functional programming. In comparison to Common Lisp, which I’d characterize as a multiparadigm language, Clojure has a heavy bent towards functional programming. This includes a focus on immutable data structures and support for good concurrency models. There’s even an implementation of STM in there, which is really cool.

So what do I think about it? First of all, it’s definitely a very interesting language. It takes the ideas of Lisp and twists them a bit, adding some new ideas and refining some old ones. If I wanted to do concurrent programming on the JVM, I would probably lean more towards Clojure than Scala, for example.

All that said, I am in two minds about the language. It is definitely extremely cool and it looks very useful. The libraries in particular have a lot going for them. But the other side of it for me is Lisp purity. One of the things I really like about Lisps is that they are very simple. The syntax is extremely small, and in most cases everything is just lists and atoms and nothing else. Common Lisp can handle other syntax with reader macros – which still produce results that are only lists and atoms. This is extremely powerful. Clojure has this to a degree, but adds several basic composite data structures that are not lists, such as vectors, maps and sets. From a pragmatic standpoint I can understand that, but the fact that they are basic syntax instead of reader macros means that if I want to process Clojure code, I end up having to work with several kinds of composite data structures instead of just one.

This might seem like a small thing, and it’s definitely not something that would stop me from using the language. But the Lisp lover in me cringes a bit at this decision.

All in all, Clojure is really cool, and I recommend that people take a look at it. It’s getting lots of attention and people are writing about it. Stu Halloway is currently in the process of porting Practical Common Lisp to Clojure, and I recently saw a blog post about someone porting On Lisp to Clojure, so there is absolutely an interest in it. The question is how this will continue. As I’ve started saying more and more: these are interesting times for language geeks.



Applications and libraries


In a recent discussion around one of Steve Yegge’s blog posts, an incidental remark was that it’s OK for a language to make things harder for a library creator than for an application developer. This point was made by David Pollak and Martin Odersky in relation to some of the complications you need to handle when creating a Scala library that people can use intuitively without a full understanding of the Scala type system. Make no mistake, I have lots of respect for both Martin and David; it’s just that in this case I think it’s actually quite a damaging assumption to make. And they are not the only ones who reason like that. Joshua Bloch’s book Effective Java embeds this assumption too, in many places.

So what’s wrong with it then? Isn’t there a difference between developing an application and a library? Yes, there is a difference, but it’s definitely not as large as people make it out to be. And even more importantly: there _shouldn’t_ be that much of a difference. The argument from David was that when creating a library in Scala, he needs to work with quite complicated parts of the type system so that the consumer gets a nice API to use the library through. This process is much harder than just using the library would be.

Effective Java contains much good advice, but most of it is from the perspective of someone who creates libraries for a living, and there are a few places where Josh explicitly says his advice isn’t necessarily applicable when writing an application, since that isn’t his point of view.

Let’s take a look at a fundamental question, then. What actually is a library, and what is an application? In my opinion, a library is a module providing functionality of some kind, restricted to a specific domain. This can be a horizontal or vertical domain – that doesn’t matter – but it’s usually something usable in more than one circumstance. It’s not uncommon for libraries to use other libraries to implement their functionality. An application is usually a collection of libraries that provide functionality to an end user. That end user can be a person, a program or another computer – that doesn’t matter either. But wait, aren’t libraries usually also created to provide functionality to other pieces of code? And even though libraries have a tendency to contain more specific code, and to use fewer other libraries, the line is extremely fuzzy.

The way most applications seem to be built now, most of the work goes into collecting libraries, providing the missing functionality and gluing everything together in some way. But that doesn’t mean the code you write in the application won’t be used as a library by another consumer. In fact, it’s more and more common to try to reuse as much as possible, and especially when you extend an existing application, it’s extremely important that you can consume the existing functionality in a sane way.

So why make the distinction? Doing so seems to me an excuse for writing bad code as long as it’s in an application. Why won’t we as programmers admit that we don’t know whether someone else will need to consume the code later, and write the best code we can, including creating usable and well-thought-out public APIs? Yes, the cost and time will be higher, but that’s true for writing tests too. I don’t see any value in arguing that libraries should be designed with more care than application code. In fact, I think that attitude is actively detrimental to the industry. And adding a complicated feature to a language, then arguing that only “library developers” need to understand it, is definitely not the right way to go. A responsible developer using a language needs to understand how that language works. Otherwise that developer will sooner or later cause a great mess. It’s just a matter of time.



Meta-level thinking


I have been trying to figure out what characterizes some of the best programmers I know, or know about. It’s a frustrating endeavor, of course, since most of these people are very different from each other and have different experiences and ways of thinking and learning. Not to mention that programmers tend to be highly individualistic.

But I think that I’m finally zeroing in on something that is general enough but also specific enough to categorize most of these people, and the general mind needed to be a really good programmer. In this blog post I’ll call that “meta-level thinking”, and I’ll explain more about what I mean as we go along.

When people try to become better programmers, there are generally a few different kinds of advice they will get. The ones that seem prevalent right now are (among others):

  • Learn – and understand – Lisp
  • More generally, learn a new language that is sufficiently different from your current knowledge
  • Work with domain specific languages
  • Understand metaprogramming
  • Read up on the available data structures (Knuth, anyone?)
  • Implement a compiler and runtime system for a language

Of course, these are only a few examples, and the categories are a bit fuzzy. These items are based on my own experiences, so they tend to be a bit language heavy. But I believe that what you practice by choosing any of these routes is the ability to abstract your thinking. In short, being able to abstract and understand what goes on in a programming language is one way to become more proficient in that language – but not only that: by changing your thinking to see this part of your environment, you generally end up programming differently in all languages.

This is why Lisp has a tendency to change the way people write Java code. Lisp teaches you metaprogramming and DSLs, but it doesn’t really do so in a way that needs words. DSLs and metaprogramming are simply part of Lisp – so much a part of its structure that you don’t see them. But when you turn to Java, you need to start working with these abstractions on a different level. Ruby programmers embrace metaprogramming, and this changes the way they think about things.

I’m really happy to see metaprogramming and DSLs getting more and more focus, because I really believe that understanding them is a good way to make programmers better. Of course, you can get the same effect by writing a compiler and runtime system, as Steve Yegge proposes. But I disagree that you really need that experience. There are other ways to acquire the same way of looking at the world.

I call this meta-level thinking. I think it’s mostly a learned ability, but there is also an aptitude component. Some of the best programmers I’ve met just have minds that fit very well. It’s interesting to note that this kind of meta-level thinking is not only applicable to programming. In fact, it’s probably just genetic luck that the same ability works as well for programming as for many other things. I think there is a connection between certain other abilities and a capacity for meta-level thinking too. Take music – it’s interesting how many programmers have artistic leanings, specifically towards music. It would be interesting to see some statistics about this.

In conclusion, this is my somewhat fuzzy view of one of the most important abilities contributing to brilliant programmers. If you have any interesting ideas about other ways to reliably practice this ability, please improve on my list.