Inspecting a live Ruby process
So, there you are. Logged into one of your production machines, staring at a rogue Ruby process, and wondering why it has been running away with 90% of the CPU for the last half hour.
Sure, you can kill it, but you’d really like to know why it is stuck there. This isn’t the first time you’ve noticed this problem, and you’re getting a little tired of manually patching things up. But how do you figure out where the process is stuck?
We were faced with this same issue not long ago, periodically noticing a Backpack process hanging there in midair, sucking up as much CPU as it could. I finally took an hour and learned just enough GDB to eke a Ruby stack-trace out of a running Ruby process.
Here’s what you do. First, get the process id, and attach GDB to that process:
sudo gdb /usr/local/bin/ruby <pid>
(Depending on your own setup, you may or may not need to use sudo.)
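(If you don’t already know the pid, the usual ps-and-grep incantation will turn it up:)

ps auxww | grep [r]uby   # the [r] keeps grep from matching itself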
That should open up GDB, spit out a bunch of information, halt the Ruby process, and then tell you what C function the process was halted in:
Attaching to program: `/opt/local/bin/ruby', process 17090.
...
0x9001aafc in select ()
(gdb)
In the above example, knowing that it was stuck in “select” is only marginally helpful. What was Ruby doing? That’s the question I want answered. To get that, we have to take advantage of a feature of GDB that lets you invoke C functions from the console. Essentially, we’re going to use the Ruby C API to get the answers we need:
(gdb) set $ary = (int)backtrace(-1)
(gdb) set $count = *($ary+8)
(gdb) set $index = 0
(gdb) while $index < $count
> x/1s *((int)rb_ary_entry($ary, $index)+12)
> set $index = $index + 1
>end
First, we call Ruby’s backtrace function to get a (Ruby) array of strings. Then we determine how many elements are in the array and loop over them in order. We call rb_ary_entry to return the element at each index, and do some pointer arithmetic to get at the actual char* pointer. We display that, increment the index, and go again. The result:
0x37c0790: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:36:in `readline'"
0x37380f0: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:36:in `run!'"
0x3745e60: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:35:in `run!'"
0x35ed2c0: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:15:in `run!'"
0x37a82d0: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/recipes/standard.rb:269:in `load'"
0x11712c0: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/actor.rb:159:in `shell'"
0x2564230: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:256:in `execute_recipes!'"
0x2f83750: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:256:in `execute_recipes!'"
0x2d1b170: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:233:in `execute!'"
0x2f438a0: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:12:in `execute!'"
0x2f43900: "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/bin/cap:11"
0x37a8340: "/opt/local/bin/cap:18"
(gdb)
Well, there’s the stack-trace! For this (contrived) example, you can see that I simply had cap shell running, but this works just as well with a live FCGI process.
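If you’re wondering where those magic numbers come from, they’re just byte offsets into Ruby 1.8’s object structs. Roughly, on a 32-bit build (this is a simplified sketch for illustration, not the verbatim ruby.h):

typedef unsigned long VALUE;

struct RBasic {           /* 8 bytes on a 32-bit build */
    unsigned long flags;  /* offset 0 */
    VALUE klass;          /* offset 4 */
};

struct RArray {
    struct RBasic basic;
    long len;             /* offset 8: the *($ary+8) read above */
    /* ... */
};

struct RString {
    struct RBasic basic;
    long len;             /* offset 8 */
    char *ptr;            /* offset 12: the +12 handed to x/1s */
    /* ... */
};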
(Note: the above GDB script assumes a 32-bit architecture. For 64-bit architectures, simply substitute +16 for +8 and +24 for +12.)
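Spelled out, that substitution would look like the following. (I haven’t run this on a 64-bit box myself, and I’ve also widened the int casts to long so pointers don’t get truncated; consider it a sketch.)

(gdb) set $ary = (long)backtrace(-1)
(gdb) set $count = *($ary+16)
(gdb) set $index = 0
(gdb) while $index < $count
> x/1s *((long)rb_ary_entry($ary, $index)+24)
> set $index = $index + 1
>end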
One last tip. The stack-trace we got from the running Backpack process was almost enough to help us (or Sam, rather) solve the problem. In addition to the stack-trace, we needed to know what HTTP environment had triggered the problem. I could have done a bit more dumpster-diving in the stack and heap of the Ruby process in question, but (in our case) there was an easier way.
We use the exception notification plugin to track errors that arise in our apps. Fortunately, the emails include the HTTP environment for each request…all we needed, then, was a way to force an exception to be raised:
(gdb) call rb_raise(rb_eException, "raising an exception")
Isn’t that lovely? GDB even knows about the rb_eException constant, so I can reference it directly in the call to rb_raise. The result? An exception that bubbles all the way to the top and fires off an exception email.
In fact, all we really needed was that last trick, since the exception email includes the stack-trace, but getting the stack-trace can be handy for those situations where the exceptions aren’t emailed to you (like batch processes, perhaps).
Are there any GDB gurus out there who could share some other tasty tips? It’d be really nice, for instance, to inspect the current Ruby environment and glean things like environment variables and such, but my GDB-fu is not quite there.
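For what it’s worth, one avenue that might get partway there (I haven’t pushed on it hard): rb_eval_string and rb_p are both part of the standard Ruby C API, so in principle you can evaluate a snippet of Ruby and print the result right from the GDB console:

(gdb) call (void)rb_p(rb_eval_string("ENV.inspect"))

Just be aware that anything the eval raises will propagate into the live process, so save that one for processes you were about to kill anyway.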
Update: it looks like some versions of GDB (like on the Mac) require you to be more explicit about types. To make the rb_raise example work on those platforms, try the following variation:
(gdb) call (void)rb_raise((int)rb_eException, "raising an exception")
Reader Comments
22 Sep 2006
You can dump the process’s environment the same way, by walking the C environ global:

(gdb) set $index = 0
(gdb) while environ[$index]
> p environ[$index]
> set $index = $index + 1
> end

You’ll always be surprised at the cruft that builds up in there. Hence, one of my more-used shell scripts:

exec env - \
  LOGNAME="$LOGNAME" \
  HOME="$HOME" \
  PATH="/bin:/usr/bin:/usr/local/bin" \
  SHELL="$SHELL" \
  "$@"
22 Sep 2006
I blogged about this at eigenclass.org.