The contract of IO#read


It’s interesting. After Charlie made an immense effort and rewrote our IO system, basically from scratch, I have started to find bugs. But these are generally not bugs in the IO code, but bugs in Ruby libraries that depend on the way MRI usually works. One of the more annoying ones are IO#read(n), where n is the length you want to read.

This method is not guaranteed to return a string of length n, even if we haven’t hit EOF yet. You can NEVER be sure that what you get back is the length you requested. Ever. If you have code that doesn’t check the length of the returned string from read, you are almost guaranteed to have a bug just waiting to happen.

Of course, it might work perfectly on your machine and every other machine you test it on. The reason for this is that read(n) will usually return n bytes, but that depends on the socket implementation or file reading implementation of the operating system, it depends on the size of the cache in the network interface, it depends on network latency, and many other things. Please, just make sure to check the return values length before going ahead and using it.

Case in point: net/ldap has this exact problem. Look in net/ber.rb and you will see. There are also two – possibly three – bugs (couple of years old too) that reports different failures because of this.

One thing that makes this problem hard to find is the fact that if you insert debug statement, you will affect the timing in such a way that the code might actually work with debug statement but not without them.

Oh, did I mention that StringIO#read works the same way? It has exactly the same guarantees as IO#read.



Language design philosophy: more than one way?


When talking about how dynamic scripting languages are designed, people have a tendency to divide them into “There is more than one way to do it” and “There is one way to do it”. Perl is the quintessential example of “more than one way”, and Python is the opposite, going so far as to make it impossible to have your own indentation.

These different ways of doing things really divide programmers. Some hate the way Python gives you a bondage strait jacket and no scissors, while some programmers love that they don’t need to make any choices and that everyone’s code will be equally readable.

From a pure perspective, the Python philosophy seems to be the right one, but it just doesn’t work for me. In the same way, I agree with Perl’s way of doing things, but the problems that cause in most Perl code is just amazing. Sometimes it feels like the Python way is actually directly a result of someone reacting extremely badly to a Perl code base and decided to never allow that to happen in Python.

So what’s the point? Well, the point is Ruby. In fact, Ruby has almost all of the flexibility of Perl to do things in different ways. But at the end of the day, none of the Perl problems tend to show up. Why is this? And why do I feel so comfortable in Ruby’s “There is more than one way to do it” philosophy, while the Perl one scares me?

I think it comes down to language design. The Python approach is impossible for the simple reason that what the language designer chooses is going to be the “one way”, by fiat. Some people will agree, and some will not. But what I’m seeing in Ruby is that the many ways have been transformed into idioms and guidelines. There are no hard rules, but the community have evolutionary evolved idioms that work and found out many of the ways that doesn’t work. This seems to be the right way – if you do the choice as a language designer, you have actually chosen the people that will use your language: that’s going to be the persons who doesn’t dislike the language designers choices. But if you leave it open enough for evolutionary community design to happen you can actually get the best of both world: both a best way to do things, and something that works for a much larger percentage of the programmer world.

I have come to believe that this is one of the major reasons that Ruby feels so good to me, and is such a good language. And it’s a lesson for language designers. When you choose philosophy, make sure to take the possibility of communities and idiom evolution into account.



Ruby antipattern: Using eval without positioning information


I have noticed that the default way eval, instance_eval(String) and module_eval(String) almost never does what you want unless you supply positioning information. Oh, it will execute all right, and provided you have no bugs in your code, everything is fine. But sooner or later you or someone else will need to debug the code in that eval. In those cases it’s highly annoying when the full error message is “(eval):1: redefining constant Foo”. Sure, in most cases you can still get all the information. But why not just make sure that everything needed is there to trace the call?

I would recommend changing all places you have eval, or where using instance_eval or module_eval that takes a string argument, into the version that takes a file and line number:

eval("puts 'hello world'")
# becomes
eval("puts 'hello world'", binding, __FILE__, __LINE__)

"str".instance_eval("puts self")
# becomes
"str".instance_eval("puts self", __FILE__, __LINE__)

String.module_eval("A=1")
# becomes
String.module_eval("A=1", __FILE__, __LINE__)


Scala testing with specs


So. The story about unit testing is over for now. I will use specs. Eric made an incredible job and got the JUnit support working with JUnit 4 too very quickly. Today I reintegrated it, and everything works fine.

So, if you want working Ant integration for your Scala testing, I recommend specs.

Of course, this still doesn’t explain while all the other alternatives failed so miserably. Hopefully there will be a bit more testing in the community soon. Testing frameworks need competition to evolve well.

It’s funny. One of the points to my original post about Scala unit testing was this quote: “Now some lovely Ruby people are looking at Scala, and the very first thing they must do (of course) is write the sacred unit tests:”.

I’m not sure about you, but I would say that that’s a good statement about Ruby people in general, if that’s the way people view us. =)



JtestR 0.1 released


If people have wondered, this is what I have been working on in my spare time the last few weeks. But now it’s finally released! The first version of JtestR.

So what is it? A library that allows you to easily test your Java code with Ruby libraries.

Homepage: http://jtestr.codehaus.org
Download: http://dist.codehaus.org/jtestr

JtestR 0.1 is the first public release of the JtestR testing tool. JtestR integrates JRuby with several Ruby frameworks to allow painless testing of Java code, using RSpec, Test/Unit, dust and Mocha.

Features:

  • Integrates with Ant and Maven
  • Includes JRuby 1.1, Test/Unit, RSpec, dust, Mocha and ActiveSupport
  • Customizes Mocha so that mocking of any Java class is possible
  • Background testing server for quick startup of tests
  • Automatically runs your JUnit codebase as part of the build

Getting started: http://jtestr.codehaus.org/Getting+Started

Team:
Ola Bini – ola.bini@gmail.com
Anda Abramovici – anda.abramovici@gmail.com



Code size and dynamic languages


I’ve had a fun time the last week noting the reactions to Steve Yegge’s latest post (Code’s Worst Enemy). Now, Yegge always manages to write stuff that generate interesting – and in some cases insane – comments. This time, the results are actually quite a bit more aligned. I’m seeing several trends, the largest being that having generated a 500K LOC code base in the first case is a sin against mankind. The second one being that you should never have one code base that’s so large, it should be modularized into several hundreds of smaller projects/modules. The third reaction is that Yegge should be using Scala for the rewrite.

Now, from my perspective I don’t really care that he managed to generate that large of a code base. I think any programmer could fall down the same tar pit, especially if it’s over a large amount of time. Secondly, you don’t need to be one programmer to get this problem. I would wager that there are millions of heinous code bases like this, all over the place. So my reaction is rather the pragmatic one: how do you actually handle the situation if you find yourself in it? Provided you understand the whole project and have the time to rewrite it, how should it be done? The first step in my opinion, would probably be to not do it alone. The second step would be to do it in small steps, replacing small parts of the system while writing unit tests while going.

But at the end of the day, maybe a totally new approach is needed. So that’s where Yegge chooses to go with Rhino for implementation language. Now, if I would have tackled the same problem, I would never reimplement the whole application in Rhino – rather, it would be more interesting to try to find the obvious place where the system needs to be dynamic and split it there, keep those parts in Java and then implement the new functionality on top of the stable Java layer. Emacs comes to mind as a typical example, where the base parts are implemented in C, but most of the actual functionality is implemented in Emacs Lisp.

The choice of language is something that Stevey gets a lot of comments about. People just can’t seem to understand why it has to be a dynamic language. (This is another rant, but people who comment on Stevey’s blog seems to have a real hard time distinguishing between static typing and strong typing. Interesting that.) So, one reason is obviously that Stevey prefers dynamic typing. Another is that hotswapping code is one of those intrinsic features of dynamic languages that are really useful, especially in a game. The compilation stage just gets in the way at that level, especially if we’re talking something that’s going to live for a long time, and hopefully not have any down time. I understand why Scala doesn’t cut it in this case. As good as Scala is, it’s good exactly because it has a fair amount of static features. These are things that are extremely nice for certain applications, but it doesn’t fit the top level of a system that needs to be malleable. In fact, I’m getting more and more certain that Scala needs to replace Java, as the semi stable layer beneath a dynamic language, but that’s yet another rant. At the end of it, something like Java needs to be there – so why not make that thing be a better Java?

I didn’t see too many comments about Stevey’s ideas about refactoring and design patterns. Now, refactoring is a highly useful technique in dynamic languages too. And I believe Stevey is wrong saying that refactorings almost always increase the code size. The standard refactorings tend to cause that in a language like Java, but that’s more because of the language. Refactoring in itself is really just a systematic way of making small, safe changes to a code base. The end result of refactoring is usually a cleaner code base, better understanding of that code base, and easier code to read. As such, they are as applicable to dynamic languages as to static ones.

Design patterns are another matter. I believe they serve two purposes – the first and more important being communication. Patterns make it easier to to understand and communicate high level features of a code base. But the second purpose is to make up for deficiencies in the language, and that’s mostly what people see when talking about design patterns. When you’re moving in a language like Lisp, where most design patterns are already in the language, you tend to not need them for communication as much either. Since the language itself provides ways of creating new abstractions, you can use those directly, instead of using design patterns to create “artificial abstractions”.

As a typical example of a case where a design pattern is totally invisible due to language design, take a look at the Factory. Now, Ruby has factories. In fact, they are all over the place. Lets take a very typical example. The Class.new method that you use to create new instances of a class. New is just a factory method. In fact, you can reimplement new yourself:

class Class
def new(*args)
object = self.allocate
object.send :initialize, *args
object
end
end

You could drop this code into any Ruby project, and everything would continue to work like before. That’s because the new-method is just a regular method. The behavior of it can be changed. You can create a custom new method that returns different objects based on something:

class Werewolf;end
class Wolf;end
class Man;end

class << Werewolf
def new(*args)
object = if $phase_of_the_moon == :full
Wolf.allocate
else
Man.allocate
end
object.send :initialize, *args
object
end
end

$phase_of_the_moon = :half
p Werewolf.new

$phase_of_the_moon = :full
p Werewolf.new

Here, creating a new Werewolf will give you either an instance of Man or Wolf depending on the phase of the moon. So in this case we are actually creating and returning something from new that is not even sub classes of Werewolf. So new is just a factory method. Of course, the one lesson we should all take from Factory, is that if you can, you should name your things better than “new”. And since there is no difference between new and other methods in Ruby, you should definitely make sure that creating objects uses the right name.



Ruby closures and memory usage


You might have seen the trend – I’ve been spending time looking at memory usage in situations with larger applications. Specifically the things I’ve been looking at is mostly about deployments where a large number of JRuby runtimes is needed – but don’t let that scare you. This information is exactly as applicable for regular Ruby as for JRuby.

One of the things that can really cause unintended high memory usage in Ruby programs is long lived blocks that close over things you might not intend. Remember, a closure actually has to close over all local variables, the surrounding blocks and also the living self at that moment.

Say that you have an object of some kind that has a method that returns a Proc. This proc will get saved somewhere and live for a long time – maybe even becoming a method with define_method:

class Factory
def create_something
proc { puts "Hello World" }
end
end

block = Factory.new.create_something

Notice that this block doesn’t even care about the actual environment it’s created in. But as long as the variable block is still live, or something else points to the same Proc instance, the Factory instance will also stay alive. Think about a situation where you have an ActiveRecord instance of some kind that returns a Proc. Not an uncommon situation in medium to large applications. But the side effect will be that all the instance variables (and ActiveRecord objects usually have a few) and local variables will never disappear. No matter what you do in the block. Now, as I see it, there are really three different kinds of blocks in Ruby code:

  1. Blocks that process something without needing access to variables outside. (Stuff like [1,2,3,4,5].select {|n| n%2 == 0} doesn’t need closure at all)
  2. Blocks that process or does something based on living variables.
  3. Blocks that need to change variables on the outside.

What’s interesting is that 1 and 2 are much more common than 3. I would imagine that this is because number 3 is really bad design in many cases. There are situations where it’s really useful, but you can get really far with the first two alternatives.

So, if you’re seeing yourself using long lived blocks that might leak memory, consider isolating the creation of them in as small of a scope as possible. The best way to do that is something like this:

o = Object.new
class << o
def create_something
proc { puts "Hello World" }
end
end
block = o.create_something

Obviously, this is overkill if you don’t know that the block needs to be long lived and it will capture things it shouldn’t. The way it works is simple – just define a new clean Object instance, define a singleton method in that instance, and use that singleton method to create the block. The only things that will be captured will be the “o” instance. Since “o” doesn’t have any instance variables that’s fine, and the only local variables captured will be the one in the scope of the create_something method – which in this case doesn’t have any.

Of course, if you actually need values from the outside, you can be selective and onle scope the values you actually need – unless you have to change them, of course:

o = Object.new
class << o
def create_something(v, v2)
proc { puts "#{v} #{v2}" }
end
end
v = "hello"
v2 = "world"
v3 = "foobar" #will not be captured by the block
block = o.create_something(v, v2)

In this case, only “v” and “v2” will be available to the block, through the usage of regular method arguments.

This way of defining blocks is a bit heavy weight, but absolutely necessary in some cases. It’s also the best way to get a blank slate binding, if you need that. Actually, to get a blank slate, you also need to remove all the Object methods from the “o” instance, and ActiveSupport have a library for blank slates. But this is the idea behind it.

It might seem stupid to care about memory at all in these days, but higher memory usage is one of the prices we pay for higher language abstractions. It’s wasteful to take it too far though.



AspectJ and JRuby?


This is one of those idea-posts. There is no implementation and no code. But if someone wants to take the idea and do something with it, go ahead.

The gist of it is this: what if you could implement the actions for AspectJ in Ruby? You could define an on-load pointcut that matches everything and dispatches to Ruby. From there on you could do basically anything from the Ruby side – including dynamically changing the stuff happening. Of course, there would be a performance cost, but it could be incredibly useful for debugging, when you don’t really want to restart your application and recompile every time you want to change the implementation of the aspect code.



ThoughtWorks calling Ruby developers in San Francisco


Friends! ThoughtWorks is hiring Ruby developers all over the world, but right now San Francisco is the hottest place to be. So if you’re located in the Bay Area and want to work with Ruby and Rails, don’t hesitate to make contact.

The time since I joined ThoughtWorks about 6 months ago have been the best of my life, and of all our offices around the world, I like the San Francisco one best. ThoughtWorks is really the home for people passionate about development and people who love Ruby.

If you have been following my blog, you know about Oracle Mix and other interesting things we have been doing out of the SF office. And there’s more to come – exciting times!

Could ThoughtWorks in San Francisco be your home? Take contact with recruiting here: http://www.thoughtworks.com/work-for-us/apply-online.html, or email me directly and I’ll see to it that your information gets to the right place!



Accumulators in Ruby


So, me and Ben Butler-Cole discussed the fact that accumulators in Ruby isn’t really done in the obvious way. This is due to the somewhat annoying feature of Ruby that nested method definitions with the def-keyword isn’t lexically scoped, so you can’t implement an internal accumulator like you would in Python, Lisp, Haskell or any other languages like that.

I’ve seen three different ways to handle this in Ruby code. To illustrate, let’s take the classic example of reversing a list. The functional way of doing this is to define an internal accumulator, this takes care of making the implementation tail recursive, and very efficient on linked list.

So, the task is to reverse a list in a functional, recursive approach. First version, using optional arguments:

class Reverser
def reverse(list, index = list.length-1, result = [])
return result if index == -1
result << list[index]
reverse(list, index - 1, result)
end
end

So, this one uses two default arguments, which makes it very easy to reuse the same method in the recursive case. The problem here is that the optional arguments expose an implementation detail which the caller really has no need of knowing. The implementation is simple but it puts more burden on the caller. This is also the pattern I see in most places in Ruby code. From a design perspective it’s not really that great.

So, the next solution is to just define a private accumulator method:

class Reverser
def reverse(list)
reverse_accumulator(list, list.length-1, [])
end

private
def reverse_accumulator(list, index, result)
return result if index == -1
result << list[index]
reverse_accumulator(list, index - 1, result)
end
end

This is probably in many cases the preferable solution. It makes the interface easier, but adds to the class namespace. To be sure, the responsibility for the implementation of an algorithm should ideally belong at the same place. With this solution you might have it spread out all over the place. Which brings us to the original problem – you can’t define lexically scoped methods within another method. So, in the third solution I make use of the fact that you can actually have recursive block invocations:

class Reverser
def reverse(list)
(rec = lambda do |index, result|
return result if index == -1
result << list[index]
rec[index - 1, result]
end)[list.length-1, []]
end
end

The good thing about this implementation is that we avoid the added burden of both a divided implementation and an exposed implementation. It might seem a bit more complex to read if you’re not familiar with the pattern. Remember that [] is an alias for the call method on Procs. Also, since the assignment to rec happens in a static scope we can actually refer to it from inside the block and get the right value. Finally, all assignments return the assigned value which means that we can just enclose everything in parens and apply it directly. Another neat aspect of this is that since the block is a closure, we don’t need to pass the list variable around anymore.

Does anyone have a better solution on how to handle this? Accumulators aren’t really that common in Ruby – is this a result of Ruby making functional programming unneat, or is it just don’t needed?