Ruby closures and memory usage


You might have seen the trend – I’ve been spending time looking at memory usage in situations with larger applications. Specifically the things I’ve been looking at is mostly about deployments where a large number of JRuby runtimes is needed – but don’t let that scare you. This information is exactly as applicable for regular Ruby as for JRuby.

One of the things that can really cause unintended high memory usage in Ruby programs is long lived blocks that close over things you might not intend. Remember, a closure actually has to close over all local variables, the surrounding blocks and also the living self at that moment.

Say that you have an object of some kind that has a method that returns a Proc. This proc will get saved somewhere and live for a long time – maybe even becoming a method with define_method:

class Factory
def create_something
proc { puts "Hello World" }
end
end

block = Factory.new.create_something

Notice that this block doesn’t even care about the actual environment it’s created in. But as long as the variable block is still live, or something else points to the same Proc instance, the Factory instance will also stay alive. Think about a situation where you have an ActiveRecord instance of some kind that returns a Proc. Not an uncommon situation in medium to large applications. But the side effect will be that all the instance variables (and ActiveRecord objects usually have a few) and local variables will never disappear. No matter what you do in the block. Now, as I see it, there are really three different kinds of blocks in Ruby code:

  1. Blocks that process something without needing access to variables outside. (Stuff like [1,2,3,4,5].select {|n| n%2 == 0} doesn’t need closure at all)
  2. Blocks that process or does something based on living variables.
  3. Blocks that need to change variables on the outside.

What’s interesting is that 1 and 2 are much more common than 3. I would imagine that this is because number 3 is really bad design in many cases. There are situations where it’s really useful, but you can get really far with the first two alternatives.

So, if you’re seeing yourself using long lived blocks that might leak memory, consider isolating the creation of them in as small of a scope as possible. The best way to do that is something like this:

o = Object.new
class << o
def create_something
proc { puts "Hello World" }
end
end
block = o.create_something

Obviously, this is overkill if you don’t know that the block needs to be long lived and it will capture things it shouldn’t. The way it works is simple – just define a new clean Object instance, define a singleton method in that instance, and use that singleton method to create the block. The only things that will be captured will be the “o” instance. Since “o” doesn’t have any instance variables that’s fine, and the only local variables captured will be the one in the scope of the create_something method – which in this case doesn’t have any.

Of course, if you actually need values from the outside, you can be selective and onle scope the values you actually need – unless you have to change them, of course:

o = Object.new
class << o
def create_something(v, v2)
proc { puts "#{v} #{v2}" }
end
end
v = "hello"
v2 = "world"
v3 = "foobar" #will not be captured by the block
block = o.create_something(v, v2)

In this case, only “v” and “v2” will be available to the block, through the usage of regular method arguments.

This way of defining blocks is a bit heavy weight, but absolutely necessary in some cases. It’s also the best way to get a blank slate binding, if you need that. Actually, to get a blank slate, you also need to remove all the Object methods from the “o” instance, and ActiveSupport have a library for blank slates. But this is the idea behind it.

It might seem stupid to care about memory at all in these days, but higher memory usage is one of the prices we pay for higher language abstractions. It’s wasteful to take it too far though.



Ruby memory leaks


They aren’t really common, but they do exist. As with any other garbage collected language, you can still be susceptible to memory leaks. In many cases they can also be very insidious. Say that you have a really large Rails application. After some time it grinds to a halt, CPU bound in GC. It may not even be a leak, it could just be something that creates so much garbage that the collector cannot take care of it.

I gotta admit, I’m not sure how to find such a problem. After getting histograms of objects, and trying to profile it, maybe run with ruby-debug, I would be out of options. Maybe some kind of shotgun technique – shutting down parts of the application, trying to pinpoint the location of the problem.

Now, ordinarily, that would have been the end of my search. A failure. Or maybe several weeks of trying to read through the sources.

The alternative? Run the application in JRuby. See if the same memory leak shows up (remember, it might be a bad interaction with MRI’s runtime system that gives you grief. Or maybe even a bug in MRI Garbage Collector). But if it doesn’t go away, you’re in luck. Wait until the CPU starts chugging for real, and then take a heap dump using the jmap Java SDK tool. Once that’s done, you’ll be sitting with a large honking binary file that you can’t do much with. The standard way of reading it is through jhat, but that don’t give much to go on.

But then I found this wonderful tool called SAP Memory Analyzer. Google it and download it. It’s marvelous. Easily the best heap analyzer I’ve run across in a long time. It’s only flaw is that it runs in Eclipse… But well, it can’t be everything, right?

Once you’ve opened up the file in SAP, you can do pretty much everything. It’s quite self explanatory. The way I usually go about things is to use the core option, and then choose “find_leak”. That almost always gives me some good suspects that I can continue investigating. From there on it’s just to drill down and find out exactly what’s going on.

Tell me if you can do that in any way as easy as that with MRI. I would love to know. But right now, JRuby is kicking butt in this regard.



An interesting memory leak in JRuby


The last two days I had lots of fun with the interesting task of finding a major memory leak in JRuby. The only way I could reliably reproduce it was by running Mingle’s test suite and see memory being eaten. I tried several approaches, the first being using jhat to analyze the heap dumps. That didn’t really help me much, since all the interesting queries I tried to run with OQL had a tendency to just cause out of memory errors. Not nice.

Next step was to install SAP Memory Analyzer which actually worked really well, even though it’s built on top of Eclipse. After several false starts, including one where I thought we had found the memory leak I finally got somewhere. Actually, I did find a memory leak in our method cache implementation. But alas, after fixing that it was obvious there was another leak in there.

I finally got SAP to tell me that RubyClasses was being retained. But when I tried to find the root chain to see how that happened I couldn’t see anything strange. In fact, what I saw what the normal chaining of frames, blocks, classes and other interesting parts. And this is really the problem when debugging this kind of problem in JRuby. Since a leak almost always be leaking several different objects, it can be hard to pinpoint the exact problem. In this case I guess that the problem was in a large branch that Bill merged a few weeks back, so I tried going back to it and checking. Alas, the branch was good. In fact, since I went back 200 revisions I finally knew within which range the problem had to be. Since I couldn’t find anything more from the heap dumps I resorted to the venerable tradition of binary search. Namely going through the revisions and finding the faulty one. According to log2, I would find the bad revision in less than 8 tries, so I started out.

After I while I actually found the problem. Let me show it to you here:

def __jtrap(*args, &block)
sig = args.first
sig = SIGNALS[sig] if sig.kind_of?(Fixnum)
sig = sig.to_s.sub(/^SIG(.+)/,'\1')
signal_class = Java::sun.misc.Signal
signal_class.send :attr_accessor, :prev_handler
signal_object = signal_class.new(sig) rescue nil
return unless signal_object
signal_handler = Java::sun.misc.SignalHandler.impl do
begin
block.call
rescue Exception => e
Thread.main.raise(e) rescue nil
ensure
# re-register the handler
signal_class.handle(signal_object, signal_handler)
end
end
signal_object.prev_handler = signal_class.handle(signal_object, signal_handler)
end

This is part of our signal handling code. Interestingly enough, I was nonplussed. How could trap leak? I mean, noone actually calls trap enough times to make it leak, right?

Well, wrong. Actually, it seems that ActiveRecord traps abort in transactions and then restore the original handler. So each transaction created new trap handlers. That would have been fine, except for the last line. In effect, in the current signal handler we save a reference to the previous signal handler. After a few iterations we will have a long chain of signal handlers, all pointing back, all holding a hard reference from one of the single static root sets in the JVM (namely, the list of all signal handlers). That isn’t so bad though. Except a saved block has references to dynamic scopes (which reference variables). It has a reference to the Frame, and the Frame has references to RubyClass. RubyClass has references to method objects, and method objects have in some cases references to RubyProcs, which in turn have more references to Blocks. At the end, we have a massive leak.

The solution? To simple remove saving of the previous handler and simplify the signal handler.