Safe(r) monkey patching


Ruby makes it possible to change pretty much anything, anywhere. This is obviously very powerful, but it can also cause a lot of pain if it’s not done in a disciplined manner. Most Ruby projects handle this by having clear strategies for what to change, how to name it and where to put the source file. The most basic advice is to always use modules for extensions and changes whenever possible. There are several good reasons for this, but the main one is that it makes it easier for someone debugging your application to find out where the code is defined.

The one absolute rule that should never be violated in a Rails or Ruby project is this: never modify the original source code. In the worst case, fork the project and make the changes there, but never, never, never change code in vendor/plugins or vendor/gems.

Let’s start with a simple example. Say I want to recreate the presence method I mentioned in a previous blog post. A first version may look like this:

class Object
  def presence
    return self if present?
  end
end

But if I open up IRb and get hold of this method, it’s not immediately obvious where it’s defined:

o = Object.new
p o.method(:presence)  #=> #<Method: Object#presence>

However, if I were to implement it using a module instead, like this:

module Presence
  def presence
    return self if present?
  end
end

Object.send :include, Presence

If I look at the method now, the output has changed:

p o.method(:presence)  #=> #<Method: Object(Presence)#presence>

We can now see that the method actually comes from the Presence module rather than the Object class. In most Ruby projects, these kinds of extensions are namespaced, using the word extensions or ext as part of the module name. When I add the presence method to a code base, I usually put it in lib/core_ext/object/presence.rb, in a module called CoreExt::Object::Presence. All of this is to make it as easy as possible to find these extensions and changes.
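The same idea can be checked from code as well as from IRB. Below is a minimal sketch that follows the CoreExt::Object::Presence naming convention. To keep it runnable without Rails, it uses a simplified stand-in for present?: nil, false and empty strings or collections count as not present. That simplification is an assumption and does not reproduce ActiveSupport’s rules. Method#owner reports the defining module, and on Ruby 1.9+ Method#source_location also gives the file and line.

```ruby
# Hypothetical namespaced extension, following the
# lib/core_ext/object/presence.rb / CoreExt::Object::Presence convention.
module CoreExt
  module Object
    module Presence
      # Simplified stand-in for ActiveSupport's present? check (assumption):
      # nil, false and empty strings/collections count as "not present".
      def presence
        return nil if nil? || self == false
        return nil if respond_to?(:empty?) && empty?
        self
      end
    end
  end
end

# The leading :: makes it explicit that this is the core Object class,
# not CoreExt::Object.
::Object.send :include, CoreExt::Object::Presence

# Method#owner tells you which module a method was defined in.
puts "hello".method(:presence).owner         # prints CoreExt::Object::Presence
# On Ruby 1.9+ this returns [file, line] for the definition.
p "hello".method(:presence).source_location
```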

There are many other benefits to putting an extension like this in a module. It makes your code cleaner and more flexible, and it composes better if you happen to have conflicting definitions. You can also use modules more selectively if you want, for example by adding them only to selected objects with extend.

Props to my colleague Brian Guthrie for alerting me to this useful side effect of defining extensions with modules.

There is a slight wrinkle in this scenario, specifically when adding extensions to modules. Sadly, because of the way the Ruby module system works, you can’t include a new module into Enumerable and have that take effect in places where Enumerable has already been mixed in. Instead you have to define the methods directly on Enumerable. (This is how Ruby 1.8 and 1.9 behave; since Ruby 3.0, including a module into an already mixed-in module does propagate to the classes that include it.) The general problem looks like this:

module X
  def hello
    42
  end
end

class Foo
  include X
end

Foo.new.hello #=> 42

module Y
  def goodbye
    25
  end
end

module X
  include Y
end

Foo.new.goodbye #=> undefined method `goodbye' for #<Foo:0x129f94> (NoMethodError)

This is a bit sad, since it means extensions have to be written in two different ways, depending on where you aim to use them. The general rules still apply: put the extensions in well-named files that are easy to find. And if you can extract the functionality to a module and then delegate to that, that is preferable.
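Here is a minimal sketch of that advice. The logic lives in a hypothetical CoreExt::EnumerableStats module, and a thin method defined directly on Enumerable delegates to it. Because the method is defined on Enumerable itself, and not in a module included into it, classes that had already mixed Enumerable in (such as Array and Range) pick it up on any Ruby version. The average method is purely illustrative.

```ruby
module CoreExt
  module EnumerableStats
    # The actual logic lives here, in a well-named module that is easy to find.
    # Returns nil for an empty collection.
    def self.average(enum)
      n = 0
      sum = enum.inject(0) { |acc, x| n += 1; acc + x }
      n.zero? ? nil : sum.to_f / n
    end
  end
end

# Defined directly on Enumerable, so Array, Range and other classes that
# already include Enumerable see it; the method just delegates.
module Enumerable
  def average
    CoreExt::EnumerableStats.average(self)
  end
end

p [1, 2, 3, 4].average  # 2.5
p (1..3).average        # 2.0
p [].average            # nil
```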



Comparing times and dates in Ruby


In one of the Rails projects I’m involved with, we do most of the local development against SQLite and then deploy against Oracle. This is a bit annoying for many reasons, but by far the largest cause of trouble is the handling of dates. I haven’t exactly figured out the rules, but for some reason Oracle sometimes returns a DateTime in situations where SQLite returns a Date. This usually causes subtle problems that surface in other parts of the application. This brings me to the small piece of advice I wanted to talk about in this column. Always make sure that you know whether you are working with a Date, a Time or a DateTime, since these all behave slightly differently, especially when it comes to comparisons.

The rule is quite simple. If you think you might have a Time object, make sure to turn it into a DateTime object before comparing it to a Date object. What happens otherwise? Unfunny things:

Date.today < Time.now #ArgumentError: comparison of Date with Time failed

Time.now > Date.today #true

Time.now == Date.today #false

Date.today == Time.now #nil

Date.today <=> Time.now #nil

Date.today != Time.now #true

The first time I saw some of these results, I was a bit confused, especially by the last three. But they do make a twisted kind of sense. It’s OK for the <=> operator to return nil if it can’t compare two objects. And != in Ruby is hardcoded to return the inverse of the value returned from ==, and since nil is falsey, the inverse of that becomes true.

So my advice is this: make sure you never have the Date on the left-hand side of a comparison, and if you do need to compare against a Date, explicitly call to_date on the other side to coerce it. Finally, if you want to do date and time comparisons, I find the best behavior usually comes from coercing both sides with to_datetime before doing the comparison.
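Here is a minimal sketch of the to_datetime advice in plain Ruby, using only the standard date library and no Rails. The dates are arbitrary examples. UTC is used so that Date#to_datetime, which assumes an offset of zero, and Time#to_datetime agree on the time zone. With local times you would also need to think about offsets. ActiveSupport changes some of these comparisons, so results inside a Rails app can differ from plain Ruby.

```ruby
require 'date'

# Arbitrary example values. UTC is used so that Date#to_datetime (offset 0)
# and Time#to_datetime describe the same time zone.
today = Date.new(2010, 6, 4)             # a Date: the whole day, no time part
noon  = Time.utc(2010, 6, 4, 12, 0, 0)   # a Time later on that same day

# Comparing a Date directly with a Time fails in plain Ruby.
direct_comparison_raised =
  begin
    today < noon
    false
  rescue ArgumentError
    true
  end

# Coerce both sides to DateTime before comparing.
def compare_moments(a, b)
  a.to_datetime <=> b.to_datetime
end

p direct_comparison_raised      # true
p compare_moments(today, noon)  # -1: midnight comes before noon
p today == noon.to_date         # true: same calendar day
```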



Named Scopes


One of my favorite features of Rails is named scopes. Maybe it’s because I’ve seen so much Rails code with conditions and finders spread all over the code base, but I feel that named scopes should be used almost everywhere you would otherwise write a find_by or pass the conditions argument directly in consumer code. The basic rule I follow is to always use named scopes when selecting a specific subset of a model from outside that model.

So what is a named scope? It’s really just an easier way of creating a custom finder method on your model, but it gives you some extra benefits I’ll talk about later. So if we have code like:

class Foo < ActiveRecord::Base
  def self.find_all_fluxes
    find(:all, :conditions => {:fluxes => true})    
  end
end

p Foo.find_all_fluxes

we can easily replace that with

class Foo < ActiveRecord::Base
  named_scope :find_all_fluxes, :conditions => {:fluxes => true}
end

p Foo.find_all_fluxes

You can give named_scope any argument that find can take. So you can create named scopes specifically for ordering, specifically for including an association, etc.

The above example always uses the same conditions. But named_scope can also take arguments: instead of passing fixed options, you pass a lambda that returns the options to use:

def self.ordered_inbetween(from, to)
  find(:all, :conditions => {:order_date => from..to})
end

Foo.ordered_inbetween(10.days.ago, Date.today)

can become:

named_scope :ordered_inbetween, lambda {|from, to|
  {:conditions => {:order_date => from..to}}
}  

Foo.ordered_inbetween(10.days.ago, Date.today)

It’s important to use the curly-brace form of block for the lambda. If you use do-end instead, you might not get the right result, since the block will bind to named_scope. The block that binds to named_scope is used to add extensions to the returned collections, and thus does not contribute to the arguments given to find. It’s a mistake I’ve made several times, and it’s very easy to make. So make sure to test your named scopes too!
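The binding rule is plain Ruby and can be seen without Rails. In this sketch, outer_call and inner_call are hypothetical methods standing in for named_scope and lambda. Each one simply reports whether it received a block.

```ruby
# Hypothetical stand-ins: outer_call plays the role of named_scope and
# inner_call plays the role of lambda. Each reports whether it got a block.
def outer_call(arg)
  block_given? ? :outer_got_block : arg
end

def inner_call
  block_given? ? :inner_got_block : :inner_got_no_block
end

# Braces bind to the nearest call, so inner_call receives the block.
with_braces = outer_call inner_call { 42 }

# do...end binds to the outermost call, so outer_call receives the block
# and inner_call is called without one.
with_do_end = outer_call inner_call do
  42
end

p with_braces  # :inner_got_block
p with_do_end  # :outer_got_block
```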

So what is it that makes named scopes so useful? Several things. First, they are composable. Second, they establish an explicit interface to the model functionality. Third, you can use them on associations. There are other benefits too, but these are the main ones. Simply put, they are a cleaner solution than the alternatives.

What do I mean by composable? Well, you can call one on the result of calling another. So say that we have these named scopes:

class Person < ActiveRecord::Base
  named_scope :by_name, :order => "name ASC"
  named_scope :by_age, :order => "age DESC"
  named_scope :top_ten, :limit => 10
  named_scope :from, lambda {|country|
    {:conditions => {:country => country}}
  }
end

Then you can say:

Person.top_ten.by_name
Person.top_ten.from("Germany")
Person.top_ten.by_age.from("Germany")

I dare you to do that in a clean way by using class method finders.
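To see why chaining works regardless of order, here is a toy, in-memory sketch in plain Ruby. PersonRelation and its options hash are hypothetical and are not how ActiveRecord implements named scopes. The idea it tries to mimic is that each scope returns a new object with merged options, and the result is only computed at the end, applying filter, sort and limit in a fixed order, much like SQL does.

```ruby
Person = Struct.new(:name, :age, :country)

# Toy model of chainable scopes: every scope returns a new relation with
# merged options; to_a applies them in a fixed order (filter, sort, limit).
class PersonRelation
  attr_reader :options

  def initialize(records, options = {})
    @records = records
    @options = options
  end

  def by_name
    merge(:order => :name, :descending => false)
  end

  def by_age
    merge(:order => :age, :descending => true)
  end

  def top_ten
    merge(:limit => 10)
  end

  def from(country)
    merge(:country => country)
  end

  def to_a
    result = @records
    result = result.select { |person| person.country == options[:country] } if options[:country]
    if options[:order]
      result = result.sort_by { |person| person.send(options[:order]) }
      result = result.reverse if options[:descending]
    end
    result = result.first(options[:limit]) if options[:limit]
    result
  end

  private

  def merge(extra)
    PersonRelation.new(@records, options.merge(extra))
  end
end

people = [
  Person.new("Ola", 28, "Sweden"),
  Person.new("Anna", 35, "Germany"),
  Person.new("Bertil", 41, "Germany")
]
people_scope = PersonRelation.new(people)

p people_scope.top_ten.by_age.from("Germany").to_a.map(&:name)  # ["Bertil", "Anna"]
p people_scope.from("Germany").by_age.top_ten.to_a.map(&:name)  # same, in any order
```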

I hope you have already seen what I mean by offering a clean interface. You can hide some of the implementation details inside your model, and your controller and view code will read more cleanly because of it.

The third point I made was about associations. Simply put, if you have an association, you can use a named scope on that association, just as if it were the model class itself. So if we have:

class ApartmentBuilding < ActiveRecord::Base
  has_many :tenants, :class_name => "Person"
end

a = ApartmentBuilding.first
a.tenants.from("Sweden").by_age

So named scopes are great, and you should use them. Whenever you sit down to write a finder, see if you can’t express it as a named scope instead.

(Note: some of the things I write about here only concern Rails 2; in Rails 3, named_scope is replaced by scope. Most of the work I do is still in the old world of 2.3.)



Use presence


One of the things you quite often see in Rails code bases is code like this:

do_something if !foo.blank?

or

unless foo.blank?
  do_something
end

Sometimes it’s useful to check for blankness, but in my experience it’s much more useful to check for presence. It reads better and avoids negations; with blank? it’s far too easy to end up with complicated negations.

For some reason it seems to have escaped people that Rails already defines the opposite of blank? as present?:

do_something if foo.present?

There is also a very common pattern that you see when working with parameters. The first iteration of it looks like this:

name = params[:name] || "Unknown"

This is actually almost always wrong, since it will accept a blank string as a name. In most cases what you really want is something like this:

name = !params[:name].blank? ? params[:name] : "Unknown"

Using our newly learnt trick, it instead becomes:

name = params[:name].present? ? params[:name] : "Unknown"

Rails 3 introduces a new method to deal with this, and you should back port it to any Rails 2 application. It’s called presence, and the definition looks like this:

def presence
  self if present?
end

With this in place, we can finally say

name = params[:name].presence || "Unknown"
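To try the idiom outside Rails, here is a plain-Ruby sketch with simplified stand-ins for blank? and present?, plus the presence method. The semantics are an assumption: nil, false, whitespace-only strings and empty collections count as blank, while ActiveSupport’s real blank? handles more cases. In a Rails 2 app you would only add presence, since blank? and present? already exist there. The module name PresenceBackport is hypothetical.

```ruby
# Simplified, hypothetical backport; not ActiveSupport's code.
module PresenceBackport
  # nil, false, whitespace-only strings and empty collections are blank.
  def blank?
    return true if nil? || self == false
    return strip.empty? if is_a?(String)
    respond_to?(:empty?) ? empty? : false
  end

  def present?
    !blank?
  end

  # Returns the receiver if it is present, otherwise nil.
  def presence
    self if present?
  end
end

Object.send :include, PresenceBackport

params = { :name => "   " }
name = params[:name].presence || "Unknown"
p name                                              # "Unknown"
p({ :name => "Ola" }[:name].presence || "Unknown")  # "Ola"
p(params[:missing].presence || "Unknown")           # "Unknown"
```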

These kinds of stylistic details make a huge difference in the small. Once you have idiomatic patterns like these in place in your code base, it’s easier to refactor the larger parts. So any time you reach for blank?, make sure you don’t really mean present? or even presence.



RSpec matchers and regexp comments – a possibly useful hack


A few days back, I was sitting at a client’s site, working on a hacky spec to validate some assumptions in a very dirty data set. I wanted to figure out some limits. The basic idea was that I needed to go through an entry for every day of the last 8 years. Getting these entries is potentially expensive, and the validation was based on checking that a specific value never turns up. This was quite easy, and I ended up with something like this:

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not == "MAGIC TOKEN"
    end
  end
end

Here I simply use a method to calculate the invariant, then use “should_not ==” to check that it holds. Nothing fancy. The problem comes when I want information about a failure. I could insert a print statement, but then I’d have to look through all the output until the end to see which one failed. I could also rescue all exceptions, print the offending information and then reraise. But the best solution would be to give RSpec a failure message. RSpec lets you do this with other matchers, but I couldn’t find a way of doing it with the == matcher. I could have written my own matcher, but that also seemed inefficient: this was a throwaway thing, run once and then deleted.

What I ended up doing was actually quite elegant, in a very disgusting way. It works, and it might be useful for someone else, sometime. But don’t EVER do anything like this in code you will save.

require 'spec_helper'

describe "Data invariant" do
  it "holds" do
    (8*365).times do |n|
      date = n.days.ago
      calculate_token_at(date).should_not =~ /\AMAGIC TOKEN(?#Invariant failed on: #{date})\z/
    end
  end
end

So why does this work? Well, it turns out that you can have comments in regular expressions, and you can interpolate arbitrary values into regexps, just like with strings. So I can embed the failure information in a comment in the regexp. It will only be displayed when the match fails, since RSpec by default says something like “expected MAGIC TOKEN to not match /\AMAGIC TOKEN(?#Invariant failed on: 2010-06-04)\z/”, so you get the information necessary. The comment does not contribute to the matching in any way. There’s another subtle point here. I haven’t used ^ and $ for anchoring the pattern. Instead I use \A and \z. The reason is that otherwise my regexp wouldn’t behave the same as comparing against a string, since ^ and $ match the beginning and end of lines too, not only the beginning and end of the string. (\Z would not quite do either, since it also matches just before a trailing newline.)
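Both regexp facts the hack relies on can be checked in plain Ruby, without RSpec. The date string below is an arbitrary example value. The last two lines show the difference between the \Z and \z anchors.

```ruby
date = "2010-06-04"  # arbitrary example; the spec interpolates n.days.ago

# (?#...) is a comment group: it stays in the pattern's source (and so in
# failure messages), but it is ignored when matching.
pattern = /\AMAGIC TOKEN(?#Invariant failed on: #{date})\z/

p("MAGIC TOKEN" =~ pattern)  # 0, a match
p("OTHER TOKEN" =~ pattern)  # nil
puts pattern.inspect         # the comment, including the date, is visible here

# \Z also matches just before a trailing newline; \z only at the very end.
p("MAGIC TOKEN\n" =~ /\AMAGIC TOKEN\Z/)  # 0
p("MAGIC TOKEN\n" =~ /\AMAGIC TOKEN\z/)  # nil
```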

Anyway, I thought I’d share this. In basically all cases, don’t do this. But it’s still a bit funny.



Patterns of method missing


One of the more dynamic features of Ruby is method_missing, a way of intercepting method calls that would otherwise cause a NoMethodError. This feature is by no means unique to Ruby. It exists in Smalltalk, Python, Groovy and some JavaScript implementations, and even most CLOS extensions have it. But Ruby being what it is, for some reason this feature seems to be used more heavily in Ruby than anywhere else. It’s also a feature most Ruby developers seem to know about. Is this because Ruby people are power-hungry, crazy monkey patchers? Maybe, but method_missing is also potentially very useful, if used correctly. Of course, it’s also exceedingly easy to misuse. In almost all cases where you think you need method_missing, you actually don’t.

The purpose of this post is to take a look at a few ways people use method_missing in the wild, what the consequences are and what you can do to mitigate them. I’m bound to have missed a few use cases here, so please feel free to add more in the comments.

Adding better debug information on failure

One of the simplest but still very powerful ways of using method_missing is to include more information in the error message than you would usually get. A simple example could look like this:

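require 'yaml'

class MyFoo
  # Raise a NoMethodError that includes the method, its arguments and a
  # YAML dump of the receiver.
  def method_missing(method, *args, &block)
    raise NoMethodError, <<ERRORINFO
method: #{method}
args: #{args.inspect}
on: #{self.to_yaml}
ERRORINFO
  end
end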

This usage is pretty common, and in my opinion a very valid use of the functionality. The only thing you have to be careful about is not to introduce any recursive calls to method_missing. For example, if you forget to require YAML in the above example, the call to to_yaml itself ends up in method_missing, and the error becomes a stack overflow.

One place where you’ve almost certainly seen this used is Rails, where the feature is called whiny nils. The idea is that nil gets a method_missing that gives some extra information: based on the method name, it guesses what kind of object you were expecting. This could be a typical message from a Rails whiny nil:

Loading development environment (Rails 2.2.2)
>> nil.last
NoMethodError: You have a nil object when you didn't expect it!
You might have expected an instance of Array.
The error occurred while evaluating nil.last
	from (irb):2

This functionality is exceedingly simple to implement, but gives you lots of leverage to find and debug your problem quicker and easier.

Encode parameters in method name

Another common pattern is to use the name of the method to encode parameters, instead of passing them in as explicit parameters. In some cases this can be used to good effect, but if possible it is better to define methods for the possible names beforehand, or to pass the parameters as actual parameters instead. Contrast a Rails-style find expression:

Person.find_by_name_and_age("Ola", 28)

With another way of creating the same API:

Person.find_by(:name => "Ola", :age => 28)

The difference here isn’t that large, and in the case of Rails I do think they are harmless, but APIs like these make an application much harder to debug and maintain, so care should be taken.
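To make the trade-off concrete, here is a toy finder in plain Ruby. PersonFinder is hypothetical and is not ActiveRecord. The dynamic find_by_* names are decoded in method_missing into a call to an explicit find_by(hash). respond_to_missing?, the hook behind respond_to? on Ruby 1.9.2 and later, is kept consistent with what method_missing accepts. Restricting the names to known attributes is a design choice, so that typos still raise NoMethodError.

```ruby
# Toy finder, not ActiveRecord: a dynamic find_by_* API built on
# method_missing, next to the explicit find_by(hash) it decodes into.
class PersonFinder
  Person = Struct.new(:name, :age)

  def initialize(people)
    @people = people
  end

  # Explicit API: conditions passed as real parameters.
  def find_by(conditions)
    @people.find { |person| conditions.all? { |attr, value| person.send(attr) == value } }
  end

  # Dynamic API: find_by_name_and_age("Ola", 28) is decoded from the name.
  def method_missing(name, *args, &block)
    attrs = finder_attributes(name)
    if attrs && attrs.size == args.size
      find_by(Hash[attrs.zip(args)])
    else
      super
    end
  end

  # Keep respond_to? honest about the dynamic methods.
  def respond_to_missing?(name, include_private = false)
    !finder_attributes(name).nil? || super
  end

  private

  # Returns e.g. [:name, :age] for :find_by_name_and_age, or nil when the
  # name is not a finder over known attributes.
  def finder_attributes(name)
    match = /\Afind_by_(\w+)\z/.match(name.to_s)
    return nil unless match
    attrs = match[1].split("_and_").map(&:to_sym)
    attrs.all? { |attr| Person.members.include?(attr) } ? attrs : nil
  end
end

finder = PersonFinder.new([PersonFinder::Person.new("Ola", 28)])
p finder.find_by_name_and_age("Ola", 28).name      # "Ola"
p finder.find_by(:name => "Ola", :age => 28).name  # "Ola"
p finder.respond_to?(:find_by_name)                # true
p finder.respond_to?(:find_by_shoe_size)           # false
```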

Builders

XML, HTML, graphical UIs and other hierarchical data structures lend themselves very well to the builder pattern. The idea of a builder is that you use Ruby’s blocks and method_missing to make it easy to create any kind of output structure. The canonical example in Ruby is Jim Weirich’s Builder, which can be used to easily create complicated XML structures. A small example:

require 'builder'

builder = Builder::XmlMarkup.new
xml = builder.books { |b|
  b.book :isbn => "124" do
    b.title "The Prefect"
    b.author "Alastair Reynolds"
  end

  b.book :isbn => "65565" do
    b.title "Against a Dark Background"
    b.author "Iain M Banks"
  end
}

The result of this code will be a properly formatted and escaped XML document. Most notably, all the finicky details of closing tags and escaping rules are taken care of for us.

In general, this approach is very pleasant to work with. It’s easy to test (you don’t even have to generate the real XML to make sure it’s correct), and it works well with your existing Ruby tools. A basic version is also quite easy to implement, although for the fully general case you need to use a blank slate object.
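To show how little a basic version needs, here is a minimal, hypothetical builder in plain Ruby. It is not Jim Weirich’s Builder and skips much of what the real library handles, such as indentation and special tags. It subclasses BasicObject (Ruby 1.9+) as its blank slate, so nearly any method name can become a tag, and it escapes text and attribute values. The xml_string! accessor name is made up for this sketch.

```ruby
# A minimal, hypothetical XML builder; not Jim Weirich's Builder.
# Subclassing BasicObject gives a nearly blank slate, so almost any
# method name can become a tag. Top-level constants need a leading ::.
class TinyXmlBuilder < BasicObject
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }

  def initialize
    @out = ::String.new
  end

  # Every unknown call becomes an element named after the method. A String
  # argument becomes the text, a Hash becomes the attributes, and a block
  # produces nested elements.
  def method_missing(tag, *args, &block)
    text  = args.find { |arg| arg.is_a?(::String) }
    attrs = args.find { |arg| arg.is_a?(::Hash) } || {}
    attr_string = attrs.map { |key, value| %( #{key}="#{escape(value.to_s)}") }.join
    @out << "<#{tag}#{attr_string}>"
    @out << escape(text) if text
    block.call(self) if block
    @out << "</#{tag}>"
    self
  end

  # Bang-suffixed so it is unlikely to clash with a tag name.
  def xml_string!
    @out
  end

  private

  def escape(string)
    string.gsub(/[&<>"]/, ESCAPES)
  end
end

builder = TinyXmlBuilder.new
builder.books do |b|
  b.book :isbn => "124" do
    b.title "The Prefect"
    b.author "Alastair Reynolds"
  end
end
puts builder.xml_string!
# <books><book isbn="124"><title>The Prefect</title><author>Alastair Reynolds</author></book></books>
```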

Accessors

The inverse of the builder pattern is to use a parser that slurps in an XML document (or YAML, a database or anything else, really), and then lets you access its elements using regular Ruby method calls, intercepting these calls with method_missing and looking them up. Usage could look something like this:

slurper = Slurp <<XML
<books>
  <book isbn="14134">
    <title>Revelation Space</title>
    <author>Alastair Reynolds</author>
  </book>
  <book isbn="53534">
    <title>Accelerando</title>
    <author>Charles Stross</author>
  </book>
</books>
XML

puts slurper.books.book[1].author

I’m not much of a fan of this approach. In almost all cases there are better ways of doing it than using method_missing. The only valid use case for something like this would be a really hacky, throwaway one-off. In general, Ruby allows you to define methods dynamically anyway, so you can do that instead.
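Here is a sketch of the dynamic-definition alternative in plain Ruby. Record is a hypothetical class. It assumes the document has already been parsed into nested Hashes and Arrays by whatever XML or YAML library you prefer. It defines real singleton methods up front with define_singleton_method (Ruby 1.9+), so respond_to? works and misspelled names still raise NoMethodError.

```ruby
# Sketch: define real accessor methods up front instead of resolving
# names in method_missing. Assumes already-parsed nested Hashes/Arrays.
class Record
  # Wraps Hashes as Records and Arrays element-wise; leaves other values alone.
  def self.wrap(value)
    case value
    when Hash  then new(value)
    when Array then value.map { |item| wrap(item) }
    else value
    end
  end

  def initialize(attributes)
    attributes.each do |key, value|
      wrapped = Record.wrap(value)
      define_singleton_method(key) { wrapped }
    end
  end
end

data = {
  "books" => {
    "book" => [
      { "isbn" => "14134", "title" => "Revelation Space", "author" => "Alastair Reynolds" },
      { "isbn" => "53534", "title" => "Accelerando", "author" => "Charles Stross" }
    ]
  }
}

slurper = Record.wrap(data)
puts slurper.books.book[1].author  # prints Charles Stross
p slurper.respond_to?(:books)      # true, with no method_missing involved
```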

Proxy/delegation

When you want to insert a proxy that resends method calls somewhere else, method_missing can be an easy way to get that to work. You can resend method calls to another object or to several objects, or send them over the wire to implement a crude RMI system. You can also record method calls and write them to disk. All of these can be achieved with just a few lines of code. But in many cases there are better options, especially if you want to do delegation. One of the dangers of method_missing (and also, of course, its power) is that it accepts any method call. So if you misspell something, method_missing will happily treat it the same way.

But when delegating, you generally want to be explicit about what you delegate, to avoid this problem. The standard library has facilities for this, such as Forwardable, which lets you list exactly which methods to forward and to which object; if you can, use these instead. Proxying and delegation should be explicit if possible.
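Here is a minimal sketch of explicit delegation using Forwardable from the standard library. The Playlist class is hypothetical. Only the listed methods are forwarded, so a misspelled call fails with a NoMethodError instead of being silently passed on.

```ruby
require 'forwardable'

# Explicit delegation: only size, each and << are forwarded to @songs.
class Playlist
  extend Forwardable
  include Enumerable  # map, select, etc. are built on the delegated each

  def_delegators :@songs, :size, :each, :<<

  def initialize
    @songs = []
  end
end

list = Playlist.new
list << "Song A"
list << "Song B"
p list.size           # 2
p list.map(&:upcase)  # ["SONG A", "SONG B"]

begin
  list.sizee          # misspelled: fails loudly instead of being forwarded
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}"
end
```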

Making parts of an API extensible and optional

In some cases you might want to create a base class for an API but allow subclasses to add additional API methods. It can then make sense to ignore calls to these subclass API methods on objects that don’t support them. By definition, the superclass can’t know which API methods the subclasses might add, so it makes sense to use method_missing to open up the API and make it more convenient. This is not very common, and in most cases it should probably not be done, but sometimes it can be a useful technique.
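Here is one way this might look, as a sketch. The on_ prefix that marks optional hooks is an assumed convention, not a Ruby or library standard, and Renderer and HtmlRenderer are hypothetical. Restricting method_missing to that prefix keeps ordinary typos failing loudly, and respond_to_missing? is kept consistent with it.

```ruby
# Base class: calls to optional hooks (by assumption, methods starting
# with "on_") that a subclass doesn't implement are silently ignored.
# Any other unknown call still raises NoMethodError.
class Renderer
  def method_missing(name, *args, &block)
    optional_hook?(name) ? nil : super
  end

  def respond_to_missing?(name, include_private = false)
    optional_hook?(name) || super
  end

  private

  def optional_hook?(name)
    name.to_s.start_with?("on_")
  end
end

class HtmlRenderer < Renderer
  def on_heading(text)
    "<h1>#{text}</h1>"
  end
end

renderer = HtmlRenderer.new
p renderer.on_heading("Hello")     # "<h1>Hello</h1>"
p renderer.on_footnote("ignored")  # nil: not implemented by this subclass
```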

Test helpers

All kinds of test helpers can be created using method_missing: factories, delegation helpers and much more. If you take a look at any open source Ruby project, the tests are where you are most likely to find implementations of method_missing. I can’t say that these implementations follow any specific patterns, either.

Summary

Finally, remember: method_missing is a very powerful feature, but in almost all cases it should not be used. If you do want to use it, don’t forget to implement respond_to? correctly (on Ruby 1.9.2 and later, by defining respond_to_missing?). And if you’re designing your class for subclassing, it’s also important to design your method_missing usage for inheritance. The Liskov Substitution Principle applies here.



RubyConf India


I am part of a team at ThoughtWorks helping to organize the very first RubyConf in India, and I’m very excited about it. The event will be held in Bangalore on March 20 and 21, so if you have the possibility to come, please do.

We already have some solid speakers lined up. Chad Fowler will keynote, and so will I, and we have a number of other people coming in, including a few of my colleagues from ThoughtWorks, such as Sarah Taraporewalla, Sidu Ponnappa and Aman King. Other speakers include Hemant Kumar, Pradeep Elankumaran, Arun Gupta and others. Finally, Nick Sieger will also come to Bangalore for this event!

So as you can see, this is gearing up to be a great event! Hope to see you there.



RubyFoo


I spent this Friday and Saturday in London at the RubyFoo conference, organized by Trifork. RubyFoo is a small pre-conference to the larger JAOO conference. As you might expect, it’s focused on Ruby, and it’s quite small: on Friday we were about 50 people, and on Saturday about 40. The small number of people and the fact that all presentations were in the same track made it much easier to network and communicate with people. I liked the focus this gave to the conference, and it was also an excellent opportunity to meet new people and get new ideas.

On Friday there were five presentations, and Saturday was given over to open space sessions. The five presentations all focused on the area of communicative programming. I talked about JRuby and did several demonstrations of how JRuby can be used to call out to different languages. My examples included talking to Clojure, Erlang and Haskell.

After me, Aslak Hellesøy talked about Cucumber and how it supports lots of different programming languages. Very cool. Aslak always gives good presentations.

We then had lunch, after which Sam Aaron gave an interesting talk about communicative programming and the essence of what we are doing. Very cerebral, and definitely something that sparked lots of thoughts in people’s minds.

Adam Wiggins gave a talk about Heroku. I haven’t actually tried Heroku yet, but it looks very cool.

Finally, Matz gave a talk about the different styles of programming in Ruby, tied in with his history of creating Ruby and what the inspirations were. Very nice.

On Saturday my colleague Dan North facilitated the open space discussions. I gave a 30-minute talk about Ioke, and people seemed to enjoy it. After that, Dan North, Aslak, a few others and I had a discussion about static versus dynamic typing.

After lunch I held a discussion about Ruby 1.9, to get some ideas about why people weren’t using it, and what problems those who were using it had encountered.

Finally, Aslak, Sam and I sat down to add Ioke support to Cucumber. This went really well, and I liked pairing with Aslak. Sadly I couldn’t stay until we were done, but Aslak and the others continued while I headed out to the airport.

All in all, RubyFoo was a great conference, and I hope they can keep the same size next time. 50 people was a really great size, and I liked the discussions we had.



Charles, Tom and Nick to EngineYard – and the future of JRuby


Most people have already heard the news that Charles, Tom and Nick are going to Engine Yard to work on JRuby. A few people have asked for my opinion, and I’ve also seen some common reactions that I would like to comment on. Of course I only speak for myself, not for Charles, Tom or Nick, and definitely not for Sun, Oracle or Engine Yard.

Let’s get the congratulations out of the way first. This is great news for Charles, Tom and Nick, and I wish them well at their new work. I totally understand their move and would have done the same thing in their situation.

This is also good news for the JRuby project. The overriding concern for Charles and company in this decision has been to ensure that the JRuby project doesn’t suffer. Of course, having Nick able to work on JRuby proper will also be great: another full-time resource.

Now for some of the comments and worries. Tim Anderson writes about it in his blog: http://www.itjoblog.co.uk/2009/07/jruby.html. Some of the conclusions in that post, especially that Oracle should have done a better job of reassuring Charles & co about the future of JRuby, ignore what a company in this situation is even allowed to do. I’ve heard this comment from several different places, so let me make this very plain. It would have been grossly illegal for any representative of Oracle to give ANY indication to Charles, Tom or Nick about what their intentions for JRuby were, and it will stay this way until the buyout is done. For all we know, Charles, Tom and Nick might have asked every Oracle person they could find what would happen, but they wouldn’t have been able to get an answer they could rely on. That’s how these things work.

Since that uncertainty would be around for quite some time, and since this merger is pretty big, it was reasonable from the JRuby team’s perspective to expect that Oracle wouldn’t give any indication for quite some time. During that time JRuby development would be in jeopardy. So they made a decision to ensure the safety of the project. (By safety of the project, I of course mean continued full-time resources for working on it.) From this perspective they didn’t really have any choice. This is no indication whatsoever of anything else. It is no indication of Oracle’s future Java strategy, and it is no indication of what will happen with languages on the JVM in the future. It is just a rational decision based on what can be known right now.

Many in the Ruby and JRuby community have expressed concerns that Engine Yard is primarily a Rails company, and that Rails bugs will take priority over Java integration or other pieces of the JRuby story. This is simply not true. Read any interview with Charles or any of the official announcements. Engine Yard’s JRuby work will definitely not be dominated by Rails concerns.

Another worry I’ve heard is that Engine Yard now “owns” core developers for MRI, Rubinius and JRuby, and as such can use this power to control the future of Ruby. <insert evil laugh here>.

Yes. Engine Yard does have lots of power over the future of Ruby right now. Is that a bad thing? All the above projects are proper open source projects, and nothing EY can do will stop that. EY is a next generation company. They understand open source and they swear by it. Just look at how much internal infrastructure they have opened up and released for general consumption. There can be no doubt that EY believes in open source.

If you’re really worried though… This is your chance to influence things. Submit patches to MRI, Rubinius or JRuby. Contribute enough and you will become a core developer, and you will have as much power as Engine Yard or any of the other core developers. (Remember that only 3 of the 8ish JRuby core developers work for Engine Yard). Once again – if you’re worried, do something about it. Don’t spread FUD.

Personally, I think the future of Ruby is looking bright.



Static type thinking in dynamically typed languages


A few days back I said something on Twitter that caused some discussion. I thought I’d spend some time explaining a bit more what I meant here. The originating tweet came from Debasish Ghosh, who wrote this:

“greenspunning typechecking into ruby code” .. isn’t that what u expect when u implement a big project in a dynamically typed language ?

My answer was:

@debasishg really don’t agree about that. if you handle a dynamically typed project correctly, you will not end up greenspunning types.

Let’s take this from the beginning. The whole point of duck typing as a philosophy when writing Ruby code is that you shouldn’t care about the actual class of an object you get as an argument. You should expect the operations you need to actually be there, and assume they are. That means anyone can send in any kind of object as long as it fulfills the expected protocol.

One of the problems in these kinds of discussions is that people conflate classes with types in dynamic languages. In well-written Ruby code you will usually end up with a type for every argument: that type is a set of constraints and protocols that will vary depending on what you do with the objects. But equating classes with these types will generally make things more complicated, and you will end up designing classes without any real purpose. Since you don’t have static checking, you don’t need overarching classes that act as type specifiers. The types will instead be implied by the contract of a method.

So far so good. Should you design these types explicitly when writing a method? In most cases, no. If you do, I believe you will end up conflating classes and types; that’s what I’ve seen on several projects, at least. The first warning sign is generally kind_of? checks. Of course, you can do things this way, but you will give up quite a lot of the power of dynamic typing by doing so. One of the key benefits of a dynamically typed language is the added flexibility. If you end up greenspunning a type system, you have just negated a large part of the benefit of your language.
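Here is a small plain-Ruby illustration of the difference, using hypothetical methods. The duck-typed version only needs its argument to respond to <<, so a String, an Array and a StringIO all work. The kind_of? version turns that protocol into a class requirement and rejects the other two.

```ruby
require 'stringio'

# Duck typed: the only requirement on `out` is that it responds to <<.
def write_report(out, lines)
  lines.each { |line| out << line << "\n" }
  out
end

# Class-checked version: the kind_of? test turns a protocol into a class.
def write_report_strict(out, lines)
  raise ArgumentError, "out must be a String" unless out.kind_of?(String)
  write_report(out, lines)
end

p write_report(String.new, ["a", "b"])           # "a\nb\n"
p write_report([], ["a", "b"])                   # ["a", "\n", "b", "\n"]
p write_report(StringIO.new, ["a", "b"]).string  # "a\nb\n"
```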

The types in a well defined system will be implicitly defined by what the method actually does – and specified by the tests for that method. But if you try to design explicit types, you will end up writing a static type system into your tests, which is not the best way to develop in these languages.