Keywords in languages


It’s nice to see how the number of people looking into Scala has really exploded lately, based on the rush of blog posts and discussion.

One of the things I find a bit annoying about Scala is the proliferation of keywords. Actually, this is something I really don’t like in any language. A language should be as keywordless as possible. Of course, such a vision goes against ease of implementation for language implementers, so there always needs to be a balance here. Coming from languages such as Lisp and Io, it’s amazing how clear a language can be with a well-chosen message passing or invocation model. In fact, both of those languages have zero keywords. That makes it incredibly nice to implement whatever you want.

Actually, Java has been quite good at not adding many more keywords than it had from the beginning, so I found it a bit annoying when I tried to build a fluent interface in Scala and found out that the word “with” is a keyword. And it’s a keyword in the strictest sense, meaning you can’t use it even in positions where the “with” keyword could never appear anyway. So there is no way to implement a method named “with” in Scala. Annoying. It’s just that the English connector words are so much more useful for method names, especially when you can use the methods in “operator” position. Then you just want to be able to use all these words.

So. If you design a language, make sure you add every keyword extremely carefully. If you can, make sure that keywords can actually be used for other things where there is no ambiguity. Of course, I’m not proposing the kind of madness you can do in some languages, where a statement such as “IF IF THEN THEN” is valid, since the first IF is a keyword, the next is a variable name, and so on. But be reasonable about keywords. They are sometimes necessary, but not as often as people believe.
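As an illustration of a keyword being usable where there is no ambiguity, Ruby gets this mostly right: “and” is a Ruby keyword, yet with an explicit receiver it works fine as a method name. Here is a made-up fluent-interface sketch (all the names are invented for illustration):

```ruby
# A made-up fluent interface where English connector words are plain
# method names. "and" is a Ruby keyword, but with an explicit receiver
# it can still be called like any other method.
class Pizza
  def initialize
    @toppings = []
  end

  def with(topping)
    @toppings << topping
    self
  end
  alias_method :and, :with

  def to_s
    "pizza with #{@toppings.join(', ')}"
  end
end

Pizza.new.with(:cheese).and(:mushrooms).to_s
# => "pizza with cheese, mushrooms"
```

This is exactly the kind of thing the “with” restriction in Scala rules out.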



Scala testing with specs


So. The story about unit testing is over for now. I will use specs. Eric did an incredible job and got the JUnit support working with JUnit 4 as well, very quickly. Today I reintegrated it, and everything works fine.

So, if you want working Ant integration for your Scala testing, I recommend specs.

Of course, this still doesn’t explain why all the other alternatives failed so miserably. Hopefully there will be a bit more testing in the community soon. Testing frameworks need competition to evolve well.

It’s funny. One of the reactions to my original post about Scala unit testing was this quote: “Now some lovely Ruby people are looking at Scala, and the very first thing they must do (of course) is write the sacred unit tests:”.

I’m not sure about you, but I would say that that’s a good statement about Ruby people in general, if that’s the way people view us. =)



Sweden visit


I will be coming to Sweden on Sunday, and I will stay a full 4 weeks. So if anyone feels like grabbing a beer and talking about anything remotely geeky, I’m up for it. Ping me or leave a comment on this post. I’ll mostly be in Stockholm, but some time will be spent in Gothenburg too.



Language explorations


I blogged about looking at languages a while back. At that point I didn’t know what my next language to explore would be. I got lots of excellent suggestions. In the end I decided to try OCaml, but gave that up quickly when I found out that half of the type system exists to cover up deficiencies in the other half of it. So I went back and decided to learn Scala. I haven’t really had time to start with it though. Until now, that is.

So let’s get back to the motivation here. Why do I want to learn another language? Aren’t I happy with Ruby? Well, yes and no. But that’s not really the point. You can always point to the Prags’ one-language-a-year advice, but that’s not it either. I mean, it’s really good advice, but there is a more urgent reason for me to learn Scala.

I know many people have said this before, but it bears repeating. Not everyone shares this opinion, but I have a firm belief that the end of big languages is very close. There won’t be a next big language. There might be some that are more popular than others, but development will be much more divided into using different languages in the same project, where the different languages are suited to different things. This is the whole Polyglot idea. And my take on it is this: the JVM is the best platform there is for polyglot programming, and I think we will see three language layers emerge in larger applications. Now, the languages won’t necessarily be built on top of each other, but they will all run on the JVM.

The first layer is what I call the stable layer. It’s not a very large part of the application in terms of functionality. But it’s the part that everything else builds on top of, and as such a very important part of it. This layer is the layer where static type safety will really help. Currently, Java is really the only choice for this layer. More about that later, though.

The second layer is the dynamic layer. This is where maybe half the application code resides. The languages here are predominantly dynamic, strongly typed languages running on the JVM, like JRuby, Rhino and Jython. This is also the layer where I have spent most of my time lately, with JRuby and so on. It’s a nice and productive place to be, and obviously, with my fascination for JVM languages, I believe that it’s the interplay between this layer and the stable layer that is really powerful.

The third layer is the domain layer. It should be implemented in DSLs, one or many depending on the needs of the system. In most cases it’s probably enough to implement it as an internal DSL within the dynamic layer, and in those cases the second and third layers are not as easily distinguishable. But in some cases it’s warranted to have an external DSL that can be interacted with. A typical example might be something like a rules engine (like Drools).
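To make the domain layer a bit more concrete, here is a sketch of what a tiny internal DSL might look like when hosted in a dynamic-layer language like Ruby. All the names here (RuleSet, rule, matching) are invented for illustration; a real domain layer would of course speak the domain’s own vocabulary:

```ruby
# A minimal internal DSL for discount rules, evaluated with
# instance_eval so the block can call rule() directly.
class RuleSet
  def initialize(&block)
    @rules = []
    instance_eval(&block)
  end

  def rule(name, &condition)
    @rules << [name, condition]
  end

  # Returns the names of all rules whose condition holds for the order.
  def matching(order)
    @rules.select { |_, cond| cond.call(order) }.map(&:first)
  end
end

rules = RuleSet.new do
  rule(:bulk_discount) { |order| order[:quantity] >= 100 }
  rule(:vip_discount)  { |order| order[:vip] }
end

rules.matching(quantity: 150, vip: false)  # => [:bulk_discount]
```

The point is that the rule definitions read like the domain, while everything under them is ordinary dynamic-layer code.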

I think I realized a long time ago that Java is not a good enough language to implement applications. So I came up with the idea that a dynamic language on top of Java might be enough. But I’m starting to see that Java is not good enough for the stable layer either. In fact, I’m not sure if Java the language is good enough for anything, anymore. So that’s what my language exploration is about. I have a suspicion that Scala might be a good language at the stable layer, but at this point the problem is there aren’t any other potential languages for that layer. So what I’m doing is trying to investigate if Scala is good enough for that.

But I need to make one thing clear – I don’t believe there will be a winner at any of these layers. In fact, I think it would be a clearly bad thing if any one language won at any layer. That means I’m seeing a future where we have Jython and JRuby and Rhino and several other languages coexisting at the same layer. There doesn’t need to be any rivalry or language wars. Similarly, I see even less point in Scala and Ruby being viewed as competing. From my point of view they aren’t even on the same continent. And even if they were, I see no point in competing.

I got accused of being “religious” about languages yesterday. That was an interesting way of putting it, since I have always been incredibly motivated to see lots of languages coexisting, but coexisting on the JVM in a productive way.



Do established tools matter – or: Is Ant support important?


This post is a bit of a follow-up to my rant about unit testing in Scala two days back. First let me tell you that that story actually has a happy ending. Eric Torreborre (creator of the specs framework) immediately stepped up and helped me; the problem turned out to be a lack of support for JUnit 4 test running, which he subsequently implemented. I’m going to reintegrate specs into my test suite later today. So that makes me happy. I might still retain JtestR. Actually, it would be a bit interesting to see the differences between writing tests in Ruby and Scala.

I spent some time on #scala on FreeNode yesterday. Overall it was an interesting experience. We ended up talking a bit about the unit testing bit, and Ant integration in particular. I’ll get back to that conversation later. But this sparked in me the question why I felt that it was really important to have Ant integration. Does it actually matter?

I have kind of assumed that for a tool running on the JVM, that people might need during their build process, integration with Ant and Maven is more or less a must. It doesn’t really matter what I think of these tools. If I want anyone to actually use the tool in question, this needs to work. In many cases that means it’s not enough to just have a Java class that can be called with the Java-task in Ant. The integration part is a quite small step, but important enough. Or at least I think it is.

I am thinking that no matter what you think of Ant, it’s one of those established tools that are here to stay for a long time. I know that I reach for Ant by default when I need something built. I know there are technologically better choices out there. Being a Ruby person, I know that Rake is totally superior, and that there are Raven and Buildr, which both provide lots of support for building my project with Ruby. So why do I still reach for Ant whenever I start a new project?

I guess one of the reasons is that I’m almost always building open source projects. I want people to use my stuff, to do things with it. That means the barrier to entry needs to be as low as possible. For Java, the de facto standard build system is still Ant, so the chance of people having it installed is good. Ant is easy enough to work with. Sure, it can be painful for larger things, but for smaller projects there really is no problem with Ant.

What do you think? Does it matter if you use established tools, that might be technologically inferior? Or should you always go for the best solution?

Sometimes I consider using other tools for building my own personal projects, but I can never say with certainty that I will never release them. That means I have two choices – either I use something else first and then convert it if I release it, or I just go with Ant directly.

Now, heading back to that conversation. It started with a comment about “an ant task for failing unit tests is severely overrated”. Then it went rapidly downhill to “Haskell shits all over Ant for a build script for example”, at which point I totally tuned out. Haskell might be extremely well suited for build scripts, but it’s not an established tool that anyone can use from their tool chain. And further, I got two examples of how these build scripts would look, later in the conversation, and both of them were barely better than shell scripts for building and testing. Now there is a very good reason people aren’t using shell scripts for their standard building and testing tool chain.

Or am I being totally unreasonable about this? (Note, I haven’t in any way defended the technological superiority of Ant in this post, just to make it clear.)



Antlr lexing problem


I should probably post this on a mailing list instead, but for now I want to document my problem here. If anyone has any good suggestions I’d appreciate it.

I’m using Antlr to lex a language. The language is fixed and has some cumbersome features. One in particular is being really annoying and giving me some trouble to handle neatly with Antlr 3.

This problem is about sorting out identifiers. Now, to make things really, really simple, an identifier can consist of the letter “s” and the character “:” in any order, in any quantity. The language also has the three operators “=”, “:=” and “::=”. That is the whole language. It’s really easy to handle with whitespace separation and so on. But these are the requirements that give me trouble. The first three are simple baseline examples:

  • “s” should lex into “s”
  • “s:” should lex into “s:”
  • “s::::” should lex into “s::::”
  • “s:=” should lex into “s” and “:=”
  • “s::=” should lex into “s:” and “:=”
  • etc.

Now, the problem is obviously that any sane way of lexing this will end up eating the last colon too. I can of course use a semantic predicate to make sure this isn’t allowed when the last character is a colon and the next is “=”. This helps for the 4th case, but not for the 5th.
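For what it’s worth, here is a hand-rolled Ruby sketch (not an Antlr solution) that pins down what the lexer has to do: identifiers are greedy runs of “s” and “:”, but an identifier gives one trailing colon back when the next character is “=”, so that “:=” can form. I’m leaving the standalone “::=” operator out of the sketch, since disambiguating it against identifier colons is exactly the open question:

```ruby
# Tokenizes the toy language: identifiers made of 's' and ':',
# plus the operators '=' and ':='. An identifier must not eat a
# ':' that belongs to a following ':='.
def lex(input)
  tokens = []
  i = 0
  while i < input.length
    if input[i] == ':' && input[i + 1] == '='
      tokens << ':='
      i += 2
    elsif input[i] == '='
      tokens << '='
      i += 1
    elsif input[i] == 's' || input[i] == ':'
      j = i
      j += 1 while j < input.length && 's:'.include?(input[j])
      # give one ':' back if it would steal the colon from a ':='
      j -= 1 if input[j - 1] == ':' && input[j] == '='
      tokens << input[i...j]
      i = j
    else
      raise "unexpected character #{input[i].inspect}"
    end
  end
  tokens
end

lex("s::=")  # => ["s:", ":="]
```

The “give one colon back” step is essentially the backtracking that is awkward to express with a semantic predicate alone.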

Anyone care to help? =)



Scala unit testing


I wish this could be a happy story. But it’s really not. If I have made any factual errors in this little rant, don’t hesitate to correct me – I would love to be proved wrong about this.

Actually, I wrote this introduction before I went out to celebrate New Year’s. Now I’m back to finish the story, and the picture has changed a bit. Not enough yet, but we’ll see.

Let’s tell it from the start. I have this project I’ve just started working on. It seemed like a fun and quite large thing that I can tinker on in my own time. It also seemed like a perfect match to implement in Scala. I haven’t done anything real in Scala yet, and wanted to have a chance to do it. I like everything I’ve seen about the language itself. I’ve said so before and I’ll say it again. So I decided to use it.

As you all probably know, the first step in a new project is to set up your basic structure and get all the simple stuff working together. Right, for me that means a simple Ant script that can compile Java and Scala, package it into a jar file, and run unit tests on the code. This was simple. … Well, except for the testing bit, that is.

It seems there are a few options for testing in Scala. The ones I found were SUnit (included in the Scala distribution), ScUnit, Rehersal and specs (which is based on ScalaCheck, another framework). So these are our contestants.

First, take SUnit – a very small project with no real support for anything spectacular. The syntax kinda stinks. One class for each test? No way. Also, no integration with Ant. I haven’t even tried to run it. Testing should be painless, and in this case I feel that using Java would have been an improvement.

ScUnit looked really, really promising. Quite nice syntax, lots of smarts in the framework. I liked what the documentation showed me. It had a custom Ant task and so on. Very nice. It even worked for a simple Hello World test case. I thought that this was it. So I started writing the starting points for the first test. For some reason I needed 20 random numbers for this test. Scala has so many ways of achieving this… I think I’ve tried almost all of them. But all of them failed with a nice class loading exception, just saying InstantiationException on an anonymous function. Lovely. Through some trial and error, I found out that ScUnit fails to run with basically any syntax that causes Scala to generate extra classes. I have no idea why.

So I gave up and started on the next framework, Rehersal. I have no idea what the misspelling is about. Anyway, this was a no-show quite quickly, since the Ant test task didn’t even load (it referenced scala.CaseClass, which doesn’t seem to be in the distribution anymore). Well then.

Finally I found specs and ScalaCheck. Now, these frameworks look mighty good, but they need better Google numbers. Specs also has the problem of being the plural of a quite common word. Not a good recipe for success. So I tried to get it working. Specs is built on top of ScalaCheck, and I much preferred the specs way of doing things (being an RSpec fanboy and all). Now, specs doesn’t have Ant integration at all, but it does have a JUnit compatibility layer. So I followed the documentation exactly and tried to run it with the Ant JUnit task. KABOOM. “No runnable methods”. This is an error message from JUnit 4. But as far as I know, I have been able to run JUnit 3 classes as well as JUnit 4 classes with the same classpath. Hell, JRuby uses JUnit 3 syntax. So obviously I have JUnit 4 somewhere on my classpath. For the life of me I cannot find it, though.

It doesn’t really matter. At that point I had spent several hours getting simple unit testing working. I gave up and integrated JtestR. Lovely. Half of my project will now not be Scala. I imagined I would learn more Scala by writing the tests in it than by writing the implementation. Apparently not. JtestR took less than a minute to get up and working.

I am not saying anything about Scala the language here. What I am saying is that things like this need to work. The integration points need to be there, especially with Ant. Testing is the most important thing a software developer does. I mean, seriously, no matter what code you write, how do you know it works correctly unless you test it in a repeatable way? It’s the only responsible way of coding.

I’m not saying I’m the world’s best coder in any way. I know the Java and Ruby worlds quite well, and I’ve seen lots of other stuff. But the fact that I couldn’t get any sane testing framework in Scala up and running with Ant in several hours tells me that the Scala ecosystem might not be ready for some time.

Now, we’ll see what happens with specs. If I get it working I’ll use it to test my code. I would love that to happen. I would love to help make it happen – except I haven’t learned enough Scala to actually do it yet. One way or another, I’m not giving up on using Scala for this project. I will see where this leads me. And you can probably expect a series of these first-impression posts from me about Scala, since I have a tendency to rant or rave about my experiences.

Happy New Year, people!



JtestR 0.1 released


If people have wondered, this is what I have been working on in my spare time the last few weeks. But now it’s finally released! The first version of JtestR.

So what is it? A library that allows you to easily test your Java code with Ruby libraries.

Homepage: http://jtestr.codehaus.org
Download: http://dist.codehaus.org/jtestr

JtestR 0.1 is the first public release of the JtestR testing tool. JtestR integrates JRuby with several Ruby frameworks to allow painless testing of Java code, using RSpec, Test/Unit, dust and Mocha.

Features:

  • Integrates with Ant and Maven
  • Includes JRuby 1.1, Test/Unit, RSpec, dust, Mocha and ActiveSupport
  • Customizes Mocha so that mocking of any Java class is possible
  • Background testing server for quick startup of tests
  • Automatically runs your JUnit codebase as part of the build

Getting started: http://jtestr.codehaus.org/Getting+Started

Team:
Ola Bini – ola.bini@gmail.com
Anda Abramovici – anda.abramovici@gmail.com



Code size and dynamic languages


I’ve had a fun time the last week noting the reactions to Steve Yegge’s latest post (Code’s Worst Enemy). Now, Yegge always manages to write stuff that generates interesting – and in some cases insane – comments. This time, the reactions are actually quite a bit more aligned. I’m seeing several trends, the largest being that having generated a 500K LOC code base in the first place is a sin against mankind. The second is that you should never have one code base that large; it should be modularized into several hundred smaller projects/modules. The third reaction is that Yegge should be using Scala for the rewrite.

Now, from my perspective I don’t really care that he managed to generate that large a code base. I think any programmer could fall into the same tar pit, especially over a long period of time. Secondly, you don’t need to be one programmer to get this problem. I would wager that there are millions of heinous code bases like this, all over the place. So my reaction is rather the pragmatic one: how do you actually handle the situation if you find yourself in it? Provided you understand the whole project and have the time to rewrite it, how should it be done? The first step, in my opinion, would probably be to not do it alone. The second step would be to do it in small steps, replacing small parts of the system while writing unit tests as you go.

But at the end of the day, maybe a totally new approach is needed. So that’s where Yegge chooses to go with Rhino for implementation language. Now, if I would have tackled the same problem, I would never reimplement the whole application in Rhino – rather, it would be more interesting to try to find the obvious place where the system needs to be dynamic and split it there, keep those parts in Java and then implement the new functionality on top of the stable Java layer. Emacs comes to mind as a typical example, where the base parts are implemented in C, but most of the actual functionality is implemented in Emacs Lisp.

The choice of language is something that Stevey gets a lot of comments about. People just can’t seem to understand why it has to be a dynamic language. (This is another rant, but people who comment on Stevey’s blog seem to have a real hard time distinguishing between static typing and strong typing. Interesting, that.) So, one reason is obviously that Stevey prefers dynamic typing. Another is that hot swapping code is one of those intrinsic features of dynamic languages that are really useful, especially in a game. The compilation stage just gets in the way at that level, especially if we’re talking about something that’s going to live for a long time, and hopefully not have any downtime. I understand why Scala doesn’t cut it in this case. As good as Scala is, it’s good exactly because it has a fair amount of static features. These are things that are extremely nice for certain applications, but they don’t fit the top level of a system that needs to be malleable. In fact, I’m getting more and more certain that Scala needs to replace Java as the semi-stable layer beneath a dynamic language, but that’s yet another rant. At the end of it, something like Java needs to be there – so why not make that thing be a better Java?
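To show what I mean by hot swapping being intrinsic: in Ruby you can redefine a method while the program runs, and live instances pick up the new code immediately. The Monster class here is just an invented example:

```ruby
# Hot swapping in a dynamic language: redefine a method at runtime
# and existing instances see the change immediately.
class Monster
  def attack
    "bite"
  end
end

m = Monster.new
m.attack  # => "bite"

class Monster  # reopen the class while the program is running
  def attack
    "claw"
  end
end

m.attack  # => "claw", the existing instance sees the new method
```

No recompile, no redeploy, no restart – which is exactly what a long-running game server wants.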

I didn’t see too many comments about Stevey’s ideas about refactoring and design patterns. Now, refactoring is a highly useful technique in dynamic languages too. And I believe Stevey is wrong saying that refactorings almost always increase the code size. The standard refactorings tend to cause that in a language like Java, but that’s more because of the language. Refactoring in itself is really just a systematic way of making small, safe changes to a code base. The end result of refactoring is usually a cleaner code base, better understanding of that code base, and easier code to read. As such, they are as applicable to dynamic languages as to static ones.

Design patterns are another matter. I believe they serve two purposes – the first and more important being communication. Patterns make it easier to understand and communicate high-level features of a code base. But the second purpose is to make up for deficiencies in the language, and that’s mostly what people see when talking about design patterns. When you’re working in a language like Lisp, where most design patterns are already in the language, you tend not to need them as much for communication either. Since the language itself provides ways of creating new abstractions, you can use those directly, instead of using design patterns to create “artificial abstractions”.

As a typical example of a case where a design pattern is totally invisible due to language design, take a look at Factory. Now, Ruby has factories. In fact, they are all over the place. Let’s take a very typical example: the new method on Class that you use to create new instances. new is just a factory method. In fact, you can reimplement new yourself:

class Class
  def new(*args)
    object = self.allocate
    object.send :initialize, *args
    object
  end
end

You could drop this code into any Ruby project, and everything would continue to work like before. That’s because the new-method is just a regular method. The behavior of it can be changed. You can create a custom new method that returns different objects based on something:

class Werewolf; end
class Wolf; end
class Man; end

class << Werewolf
  def new(*args)
    object = if $phase_of_the_moon == :full
      Wolf.allocate
    else
      Man.allocate
    end
    object.send :initialize, *args
    object
  end
end

$phase_of_the_moon = :half
p Werewolf.new

$phase_of_the_moon = :full
p Werewolf.new

Here, creating a new Werewolf will give you either an instance of Man or Wolf depending on the phase of the moon. So in this case we are actually creating and returning something from new that isn’t even a subclass of Werewolf. So new is just a factory method. Of course, the one lesson we should all take from Factory is that, if you can, you should name your things better than “new”. And since there is no difference between new and other methods in Ruby, you should definitely make sure that creating objects uses the right name.
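To make that last point concrete, the same moon-phase factory could hide behind an intention-revealing name instead of overriding new. The name for_moon_phase is just made up for illustration:

```ruby
class Wolf; end
class Man; end

class Werewolf
  # A factory method whose name says what it does, instead of
  # surprising callers by overriding new.
  def self.for_moon_phase(phase)
    phase == :full ? Wolf.new : Man.new
  end
end

Werewolf.for_moon_phase(:full)  # => an instance of Wolf
Werewolf.for_moon_phase(:half)  # => an instance of Man
```

Same behavior, but now nobody is surprised that Werewolf “construction” hands back a different class.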



Ruby closures and memory usage


You might have seen the trend – I’ve been spending time looking at memory usage in situations with larger applications. Specifically the things I’ve been looking at is mostly about deployments where a large number of JRuby runtimes is needed – but don’t let that scare you. This information is exactly as applicable for regular Ruby as for JRuby.

One of the things that can really cause unintended high memory usage in Ruby programs is long lived blocks that close over things you might not intend. Remember, a closure actually has to close over all local variables, the surrounding blocks and also the living self at that moment.

Say that you have an object of some kind that has a method that returns a Proc. This proc will get saved somewhere and live for a long time – maybe even becoming a method with define_method:

class Factory
  def create_something
    proc { puts "Hello World" }
  end
end

block = Factory.new.create_something

Notice that this block doesn’t even care about the actual environment it’s created in. But as long as the variable block is still alive, or something else points to the same Proc instance, the Factory instance will also stay alive. Think about a situation where you have an ActiveRecord instance of some kind that returns a Proc. Not an uncommon situation in medium to large applications. But the side effect will be that all the instance variables (and ActiveRecord objects usually have a few) and local variables will never disappear, no matter what you do in the block. Now, as I see it, there are really three different kinds of blocks in Ruby code:

  1. Blocks that process something without needing access to variables outside. (Stuff like [1,2,3,4,5].select {|n| n%2 == 0} doesn’t need a closure at all.)
  2. Blocks that process or do something based on living variables.
  3. Blocks that need to change variables on the outside.
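The three kinds might look like this in practice:

```ruby
# Kind 1: needs no outside variables at all.
evens = [1, 2, 3, 4, 5].select { |n| n % 2 == 0 }

# Kind 2: reads a living variable from the enclosing scope.
limit = 3
small = [1, 2, 3, 4, 5].select { |n| n < limit }

# Kind 3: changes a variable on the outside, the rare case.
total = 0
[1, 2, 3].each { |n| total += n }
total  # => 6
```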

What’s interesting is that 1 and 2 are much more common than 3. I would imagine that this is because number 3 is really bad design in many cases. There are situations where it’s really useful, but you can get really far with the first two alternatives.

So, if you’re seeing yourself using long lived blocks that might leak memory, consider isolating the creation of them in as small of a scope as possible. The best way to do that is something like this:

o = Object.new
class << o
  def create_something
    proc { puts "Hello World" }
  end
end
block = o.create_something

Obviously, this is overkill if you don’t know that the block needs to be long-lived and will capture things it shouldn’t. The way it works is simple – just define a new, clean Object instance, define a singleton method on that instance, and use that singleton method to create the block. The only thing that will be captured is the “o” instance. Since “o” doesn’t have any instance variables, that’s fine, and the only local variables captured will be those in the scope of the create_something method – which in this case doesn’t have any.

Of course, if you actually need values from the outside, you can be selective and scope in only the values you actually need – unless you have to change them, of course:

o = Object.new
class << o
  def create_something(v, v2)
    proc { puts "#{v} #{v2}" }
  end
end
v = "hello"
v2 = "world"
v3 = "foobar" # will not be captured by the block
block = o.create_something(v, v2)

In this case, only “v” and “v2” will be available to the block, through the usage of regular method arguments.

This way of defining blocks is a bit heavyweight, but absolutely necessary in some cases. It’s also the best way to get a blank-slate binding, if you need that. Actually, to get a true blank slate you also need to remove all the Object methods from the “o” instance, and ActiveSupport has a library for blank slates. But this is the idea behind it.

It might seem stupid to care about memory at all in these days, but higher memory usage is one of the prices we pay for higher language abstractions. It’s wasteful to take it too far though.