Inspecting a live Ruby process

Posted by Jamis on September 22, 2006 @ 08:43 AM

So, there are you. Logged into one of your production machines, staring at a rogue Ruby process, and wondering why it has been running away with 90% of the CPU for the last half hour.

Sure, you can kill it, but you’d really like to know why it is stuck there. This isn’t the first time you’ve noticed this problem, and you’re getting a little tired of manually patching things up. But how do you figure out where the process is stuck?

We were faced with this same issue not long ago, periodically noticing a Backpack process hanging there in midair, sucking up as much CPU as it could. I finally took an hour and learned just enough GDB to eke a Ruby stack-trace from a running Ruby process.

Here’s what you do. First, get the process id, and attach GDB to that process:


sudo gdb /usr/local/bin/ruby <pid>

(Depending on your own setup, you may or may not need to use sudo.)

That should open up GDB, spit out a bunch of information, halt the Ruby process, and then tell you what C function the process was halted in:

1
2
3
4
Attaching to program: `/opt/local/bin/ruby', process 17090.
...
0x9001aafc in select ()
(gdb) 

In the above example, knowing that it was stuck in “select” is only marginally helpful. What was Ruby doing? That’s the question I want answered. To get that, we have to take advantage of a feature of GDB that lets you invoke C functions from the console. Essentially, we’re going to use the Ruby C API to get the answers we need:

1
2
3
4
5
6
7
(gdb) set $ary = (int)backtrace(-1)
(gdb) set $count = *($ary+8)
(gdb) set $index = 0
(gdb) while $index < $count
>  x/1s *((int)rb_ary_entry($ary, $index)+12)
>  set $index = $index + 1
>end

First, we call Ruby’s backtrace function to get a (ruby) array of strings. Then, we determine how many elements there are in the array, and loop over them in order. We call rb_ary_entry to return the elements at each index, and do some pointer arithmetic to get the actual char* pointer. We display that, increment the index, and go again. The result:

1
2
3
4
5
6
7
8
9
10
11
12
13
0x37c0790:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:36:in `readline'"
0x37380f0:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:36:in `run!'"
0x3745e60:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:35:in `run!'"
0x35ed2c0:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/shell.rb:15:in `run!'"
0x37a82d0:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/recipes/standard.rb:269:in `load'"
0x11712c0:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/actor.rb:159:in `shell'"
0x2564230:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:256:in `execute_recipes!'"
0x2f83750:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:256:in `execute_recipes!'"
0x2d1b170:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:233:in `execute!'"
0x2f438a0:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/lib/capistrano/cli.rb:12:in `execute!'"
0x2f43900:       "/opt/local/lib/ruby/gems/1.8/gems/capistrano-1.1.9.5085/bin/cap:11"
0x37a8340:       "/opt/local/bin/cap:18"
(gdb)

Well, there’s the stack-trace! For this (contrived) example, you can see that I simply had cap shell running, but this works just as well with a live FCGI process.

(Note: the above GDB script assumes a 32-bit architecture. For 64-bit architectures, simply substitute +16 for +8 and +24 for +12.)

One last tip. The stack-trace we got from the running Backpack process was almost enough to help us (or Sam, rather) to solve the problem. In addition to the stack-trace, we needed to know what the HTTP environment was that triggered the problem. I could try and do a bit more dumpster diving in the stack and heap of the Ruby process in question, but (in our case) there was an easier way to do it.

We use the exception notification plugin to track errors that arise in our apps. Fortunately, the emails include the HTTP environment for each request…all we needed, then, was a way to force an exception to be raised:


(gdb) call rb_raise(rb_eException, "raising an exception")

Isn’t that lovely? GDB even knows about the rb_eException constant, so I can simply reference it with the rb_raise function. The result? An exception that bubbles all the way to the top, and fires off an exception email.

In fact, all we really needed was that last trick, since the exception email includes the stack-trace, but getting the stack-trace can be handy for those situations where the exceptions aren’t emailed to you (like batch processes, perhaps).

Are there any GDB gurus out there that could share some other tasty tips? It’d be really nice, for instance, to inspect the current Ruby environment and glean things like environment variables and such, but my GDB-fu is not quite there.

Update: it looks like some versions of GDB (like on the Mac) require you to be more explicit about types. To make the rb_raise example work on those platforms, try the following variation:


(gdb) call (void)rb_raise((int)rb_eException, "raising an exception")

Posted in Tips & Tricks

Comments

Have something to add? Click here to leave a comment.

22 Sep 2006

1. Kent said...

This is very cool! I didn't know that gdb is so powerful.

2. _why said...

Well done, Jamis! Very adept. I wonder if this could be wrapped up into an extension. Does libgdb work?

3. Jamis said...

Now that'd be slick--a ruby interface to GDB. I'm not even certain what libgdb exposes, but it'd be a fun exercise. Any takers? Shall we see what we can come up with?

4. Thomas said...

Damn, that was enough to make me go download the manual for gdb. I have to give a big "hell, yeah!" to _why's suggestion about seeing if this could be wrapped up in an extension. I'm not even doing much ruby programming lately, but this would be very sweet to have for when I get back on track.

5. Jamis said...

Well, I (for one) am having some trouble locating libgdb anywhere...all the docs I can find for it are ancient (like 10+ years old). Anyone know if it is still alive, and if so, how to get it?

6. Dominic Mitchell said...

Wow -- that's really neat! In the past, when I've come up against similar troubles in my Perl app, I've resorted to installing a signal handler that spits out a stack trace to the error log.
$SIG{USR2} = sub { Carp::cluck("Got SIGUSR2 in $$") };
I've now ensured that's standard in all Perl apps I deploy. SIGUSR2 is a good choice, as it's not usually in use (unlike SIGUSR1, which Apache uses). I'm sure that writing something similar for ruby wouldn't be too hard. Of course, the bonus of using gdb against the live process is that it doesn't require any code changes up front.

7. _why said...

So, I started poking around with ddd, the graphical cover for gdb. It's a gnu tool itself. And it looks like it opens a gdb process and drives everything through the prompt. kdbg does the same. On the other hand, the tcl/tk insight seems to bundle a modified gdb with its distribution.

8. Joe Ruby said...

Good stuff! I get those cpu-hogging Ruby processes once in a while too.

9. Jamis said...

Thanks for the tip, Dominic. That's another useful technique. I've done the same in the past, but I didn't want to have to redeploy Backpack just to get some debugging info--that quickly turns into a "dang, I need more info, let me tweak and redeploy AGAIN" kind of cycle (at least for me). The gdb approach was nice because it didn't require any code changes to do.

10. Dominic Mitchell said...

Looking at environ(7), it's pretty easy to get out the environment as well. (gdb) set $index = 0
(gdb) while environ[$index]
>p environ[$index]
>set $index = $index + 1
>end
You'll always be surprised at the cruft that builds up in there. Hence, one of my more used shell scripts: exec env - \
LOGNAME="$LOGNAME" \
HOME="$HOME" \
PATH="/bin:/usr/bin:/usr/local/bin" \
SHELL="$SHELL" \
"$@"

11. Jamis said...

_why, if such venerable apps as ddd are using pipes to communicate with gdb, perhaps it's not such a bad strategy. We could do the same from ruby, no? I might try and play with that a bit tonight, unless someone beats me to it.

12. _why said...

Yeah, it looks like there are a few Perl modules that do this. The closest I can find to what we're looking for is Devel::GDB -- http://search.cpan.org/src/JEZRA/Devel-GDB-1.23/GDB.pm. Also, a Pythonista has done a similiar thing and shows how to wrap it in a gdb function. So, alternatively, we could come up with a set of gdb functions for Ruby. http://www.python.org/~jeremy/weblog/031003.html

13. Jamis said...

Well, I've got the following working--it's been a fun exercise:
1
2
3
4
5
6
7
8
gdb = GDB::Ruby.new(ARGV.shift)

gdb.backtrace.each do |line|
  puts line
end

gdb.raise Exception, "go boom!"
gdb.quit
My focus isn't a general interface for GDB, though--it's a way of interacting with Ruby processes via GDB, via Ruby. Anyway, as I said, a fun exercise. I'll play with it some more this weekend.
23 Sep 2006

14. mfp said...

Using the C API is way too much work :-) I have an eval command that allows

(gdb) eval "caller"
...
(gdb) rb_locals
a = 1
foo = "dsfsf"
...

Once you have #eval, you pretty much get everything else for free.

I blogged about this at eigenclass.org.

15. mfp said...

I forgot to say that #eval allows arbitrary Ruby expressions and tries to protect you from typos and other errors (so that you don't kill your process with a NameError/NoMethodError).

16. Jamis said...

Oh, that's brilliant! Thanks for sharing that. That's a lot cleaner than mucking about with the pointer offsets and stuff.

17. Nils said...

Don't quite sure if this tip exactly fits the bill, but you can do some nifty Objective-C introspection using nlist. http://unixjunkie.blogspot.com/2006/08/tracing-objective-c-messages.html And maybe something similar could be done for Ruby by using libffi ??? http://chopine.be/lrz/diary/2006-09-15_RubyCocoa-and-libffi.html