December 21st, 2007
Ruby closures and memory usage
You might have seen the trend – I’ve been spending time looking at memory usage in situations with larger applications. Specifically, what I’ve been looking at is mostly deployments where a large number of JRuby runtimes are needed – but don’t let that scare you. This information is just as applicable to regular Ruby as to JRuby.
One of the things that can really cause unintended high memory usage in Ruby programs is long-lived blocks that close over things you might not intend. Remember, a closure actually has to close over all local variables, the surrounding blocks, and also the self that is live at that moment.
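To see what that means in practice, here’s a small sketch (the class name Holder and the variable names are mine, just for illustration). Even a block that never mentions its surroundings still drags its locals and its self along via its binding:

class Holder
  def initialize
    @big = "x" * 10_000_000   # some large instance state
  end

  def make_block
    note = "hello"
    proc { puts "done" }      # mentions neither note nor @big
  end
end

blk = Holder.new.make_block
eval("self", blk.binding)     # => the Holder instance, @big and all
eval("note", blk.binding)     # => "hello" – the local is still reachable

As long as blk is reachable, so is everything its binding refers to.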
Say that you have an object of some kind that has a method that returns a Proc. This proc will get saved somewhere and live for a long time – maybe even becoming a method with define_method:
class Factory
  def create_something
    proc { puts "Hello World" }
  end
end

block = Factory.new.create_something
Notice that this block doesn’t even care about the actual environment it’s created in. But as long as the variable block is still live, or something else points to the same Proc instance, the Factory instance will also stay alive. Think about a situation where you have an ActiveRecord instance of some kind that returns a Proc – not an uncommon situation in medium to large applications. The side effect will be that all the instance variables (and ActiveRecord objects usually have a few) and local variables will never disappear, no matter what you do in the block.

Now, as I see it, there are really three different kinds of blocks in Ruby code:
- Blocks that process something without needing access to variables outside. (Stuff like [1,2,3,4,5].select {|n| n%2 == 0} doesn’t need a closure at all.)
- Blocks that process or do something based on variables that are live in the surrounding scope.
- Blocks that need to change variables on the outside.
What’s interesting is that 1 and 2 are much more common than 3. I would imagine that this is because number 3 is really bad design in many cases. There are situations where it’s really useful, but you can get really far with the first two alternatives.
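To make the three kinds concrete, here’s a small sketch (the variable names are mine):

# 1. No closure needed – the block only uses its own parameter:
[1, 2, 3, 4, 5].select { |n| n % 2 == 0 }

# 2. Reads a live variable from the surrounding scope:
threshold = 3
[1, 2, 3, 4, 5].select { |n| n > threshold }

# 3. Changes a variable on the outside – the rarest kind, and often a design smell:
sum = 0
[1, 2, 3, 4, 5].each { |n| sum += n }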
So, if you’re seeing yourself using long-lived blocks that might leak memory, consider isolating their creation in as small a scope as possible. The best way to do that is something like this:
o = Object.new

class << o
  def create_something
    proc { puts "Hello World" }
  end
end

block = o.create_something
Obviously, this is overkill unless you know that the block needs to be long lived and would otherwise capture things it shouldn’t. The way it works is simple – just create a new, clean Object instance, define a singleton method on that instance, and use that singleton method to create the block. The only thing that will be captured is the “o” instance. Since “o” doesn’t have any instance variables that’s fine, and the only local variables captured will be the ones in the scope of the create_something method – which in this case doesn’t have any.
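You can convince yourself of that with a quick check against the block’s binding (a sketch, using Proc#binding and eval):

eval("self", block.binding)               # => o, the empty object
eval("instance_variables", block.binding) # => [] – nothing big hiding in there
eval("local_variables", block.binding)    # => [] – create_something has no locals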
Of course, if you actually need values from the outside, you can be selective and pass in only the values you actually need – unless you have to change them, of course:
o = Object.new

class << o
  def create_something(v, v2)
    proc { puts "#{v} #{v2}" }
  end
end

v = "hello"
v2 = "world"
v3 = "foobar" # will not be captured by the block

block = o.create_something(v, v2)
In this case, only “v” and “v2” will be available to the block, through the use of regular method arguments.
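And again, a quick sketch against the binding shows that “v3” never makes it in:

eval("local_variables", block.binding)  # => only v and v2, the method arguments
eval("defined?(v3)", block.binding)     # => nil – v3 is simply not in scope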
This way of defining blocks is a bit heavyweight, but absolutely necessary in some cases. It’s also the best way to get a blank slate binding, if you need that. Actually, to get a real blank slate, you also need to remove all the Object methods from the “o” instance, and ActiveSupport has a library for blank slates. But this is the idea behind it.
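As a rough sketch of that idea (the regexp filter below is my own approximation, not ActiveSupport’s implementation):

o = Object.new

class << o
  # Undefine everything inherited from Object, keeping only the
  # __-prefixed methods that Ruby itself relies on.
  instance_methods.each do |m|
    undef_method(m) unless m.to_s =~ /^__/
  end
end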
It might seem stupid to care about memory at all these days, but higher memory usage is one of the prices we pay for higher-level language abstractions. It’s wasteful to take it too far, though.