Buckblog: Introducing the Capistrano Shell

20 September 2006 — A new Capistrano extension is introduced, allowing commands to be run interactively and in parallel on multiple servers at once — 5-minute read

Our cluster at 37signals currently consists of 12 machines. The first few weeks of running them live were rather bumpy, and I took to using Capistrano for a bit of ad-hoc monitoring, running the uptime task (from the capistrano-ext gem) to keep an eye on things:

cap -v uptime

However, with 12 machines, the wait for that command to run each time was about 20 to 30 seconds, which is far too long for my impatient self.

Part of that seemingly interminable wait was alleviated by changing Capistrano so that it connects to servers in parallel, rather than serially. Prior to 1.2, when you executed a task that needed to connect to multiple servers, the connections were established one at a time, in a single thread. Now, they are established in parallel, one per thread. The connection overhead, which used to be 18 seconds on my laptop, dropped to just over 6 seconds. Not a bad savings!
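To make the serial-versus-parallel difference concrete, here is a minimal Ruby sketch. It is not Capistrano's actual code: fake_connect is a hypothetical stand-in for an SSH handshake (it just sleeps), and the host names and the 0.1 second delay are made up for illustration.

```ruby
# Hypothetical stand-in for an SSH handshake: sleeps to mimic network latency.
def fake_connect(host, delay = 0.1)
  sleep(delay)
  "session:#{host}"
end

# Serial: connections are established one at a time, in a single thread.
def connect_serially(hosts)
  hosts.map { |host| [host, fake_connect(host)] }.to_h
end

# Parallel: one thread per host. Thread#value waits for the thread to finish
# and returns the value of its block.
def connect_in_parallel(hosts)
  threads = hosts.map { |host| [host, Thread.new(host) { |h| fake_connect(h) }] }
  threads.map { |host, thread| [host, thread.value] }.to_h
end

# Runs the block and returns [result, elapsed_seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

hosts = %w[app1 app2 app3 web1 web2 db1]
_, serial_time = timed { connect_serially(hosts) }
_, parallel_time = timed { connect_in_parallel(hosts) }
puts format("serial: %.2fs, parallel: %.2fs", serial_time, parallel_time)
```

With simulated latency the serial time grows with the number of hosts, while the parallel time stays close to that of the slowest single connection.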

However, there was more that could be done. Why, when I am running this command every few minutes, should I need to reestablish the connection with every request? What if I could cache the connection somehow?

This line of thinking led to what I feel is the most exciting new feature in Capistrano 1.2: cap shell.

jamis> cap -v shell
====================================================================
Welcome to the interactive Capistrano shell! This is an experimental
feature, and is liable to change in future releases.
--------------------------------------------------------------------
cap> !uptime
[establishing connection(s) to db2, db1, db3, file1, app1, app2, app3, app4, app5, web1, web2, web3]
querying servers...
[app1 ] up   15 days, 7:40, load 3.53, 1.36, 0.83
[app2 ] up  15 days, 8 hrs, load 0.23, 0.38, 0.44
[app3 ] up  19 days, 11:13, load 0.29, 0.40, 0.43
[app4 ] up  19 days, 12:02, load 0.58, 0.49, 0.51
[app5 ] up   20 days, 9:50, load 0.50, 0.45, 0.43

[db2  ] up  28 days, 22:29, load 0.05, 0.06, 0.07

[file1] up  60 days, 12:42, load 0.06, 0.17, 0.17

[web1 ] up  19 days, 22:31, load 0.17, 0.12, 0.13
[web2 ] up  36 days, 22:18, load 0.06, 0.10, 0.11

[db1  ] up  28 days, 21:07, load 1.27, 0.90, 0.82
[db3  ] up  28 days, 22:53, load 1.19, 0.94, 0.85

[web3 ] up 188 days, 19:36, load 0.09, 0.06, 0.01
cap>

Subsequent invocations of the uptime task from within that shell will reuse the existing connections. The result? It now takes less than two seconds to give me my answers.
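The connection reuse can be pictured as a small cache keyed by host name. This is only a sketch of the idea, under the assumption that a session can be kept open between commands; ConnectionCache and its connector block are hypothetical names, not part of Capistrano.

```ruby
# Hypothetical cache: opens a session the first time a host is requested,
# and hands back the same session on every later request.
class ConnectionCache
  attr_reader :opened   # how many sessions were actually established

  def initialize(&connector)
    @connector = connector   # block that knows how to open a session
    @sessions = {}
    @opened = 0
  end

  def fetch(host)
    @sessions[host] ||= begin
      @opened += 1
      @connector.call(host)
    end
  end
end

cache = ConnectionCache.new { |host| "session:#{host}" }

# Three rounds of "uptime" against the same two hosts...
3.times { %w[app1 web1].each { |host| cache.fetch(host) } }

# ...but only two connections were ever established.
puts cache.opened   # prints 2
```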

But the shell is good for more than just executing tasks. You can execute arbitrary commands and have them run on your servers:

cap> uname -srp
[db2] FreeBSD 6.1-STABLE amd64
[web2] FreeBSD 6.1-STABLE amd64
[web3] FreeBSD 6.0-RELEASE amd64
[file1] Linux 2.6.9-34.0.2.ELsmp x86_64
[app1] FreeBSD 6.0-RELEASE amd64
[app2] FreeBSD 6.1-STABLE amd64
[app3] FreeBSD 6.1-STABLE amd64
[app4] FreeBSD 6.1-STABLE amd64
[app5] FreeBSD 6.1-STABLE amd64
[web1] FreeBSD 6.1-STABLE amd64
[db1] FreeBSD 6.1-STABLE amd64
[db3] FreeBSD 6.1-STABLE amd64
cap>

(Note that prefixing the command with an exclamation point causes Capistrano to interpret it as the name of a task, and to execute it as such. Without the exclamation point, it is treated as a bare command, and is simply executed verbatim on each host.)
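The rule is simple enough to sketch in a few lines of Ruby. The classify method below is a hypothetical illustration of the dispatch described above, not the shell's real parser:

```ruby
# Decide whether a line typed at the prompt names a task or is a bare command.
# Returns [:task, name] for "!name" and [:command, text] for anything else.
def classify(line)
  line = line.strip
  if line.start_with?("!")
    [:task, line[1..-1].strip]
  else
    [:command, line]
  end
end

p classify("!uptime")      # => [:task, "uptime"]
p classify("uname -srp")   # => [:command, "uname -srp"]
```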

Also, you can focus the task or command so that it only executes on a specific role, or host:

cap> with app uname -srp
[app1] FreeBSD 6.0-RELEASE amd64
[app2] FreeBSD 6.1-STABLE amd64
[app3] FreeBSD 6.1-STABLE amd64
[app4] FreeBSD 6.1-STABLE amd64
[app5] FreeBSD 6.1-STABLE amd64
cap> on web1,web2 uname -srp
[web1] FreeBSD 6.1-STABLE amd64
[web2] FreeBSD 6.1-STABLE amd64
cap>

You can use with role to focus the command on all machines answering to the named role. Give a comma-delimited list of roles to execute on the machines in any of them.

If that is too general, you can get as specific as you wish using on host. This lets you execute only on the named host or hosts. In fact, you can name hosts this way that aren’t even defined in your deploy.rb; it will establish connections on the fly to any machines it doesn’t recognize.
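As a rough model of how the two kinds of scope resolve to machines: a with scope expands each role to its hosts, while an on scope takes the host names as given. The ROLES table and the hosts_for method are assumptions made up for this sketch (including the host backup9, which belongs to no role); they do not reflect Capistrano's internals.

```ruby
# Hypothetical role table, standing in for the roles defined in deploy.rb.
ROLES = {
  "app" => %w[app1 app2],
  "web" => %w[web1 web2],
  "db"  => %w[db1]
}.freeze

# Resolve a scope to the list of hosts a command should run on.
def hosts_for(kind, targets)
  case kind
  when :with
    # Union of the hosts in every named role; unknown roles raise KeyError.
    targets.flat_map { |role| ROLES.fetch(role) }.uniq
  when :on
    # Hosts are used as given, even if no role mentions them.
    targets
  else
    raise ArgumentError, "unknown scope #{kind.inspect}"
  end
end

p hosts_for(:with, %w[app web])    # => ["app1", "app2", "web1", "web2"]
p hosts_for(:on, %w[web1 backup9]) # => ["web1", "backup9"]
```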

Lastly, you can use either with or on by itself to set the scope for all subsequent commands. Just leave the command off:

cap> with app
scoping with app
cap> uname -srp
[app1] FreeBSD 6.0-RELEASE amd64
[app2] FreeBSD 6.1-STABLE amd64
[app4] FreeBSD 6.1-STABLE amd64
[app3] FreeBSD 6.1-STABLE amd64
[app5] FreeBSD 6.1-STABLE amd64
cap> hostname -s
[app1] 82095-app1
[app4] 82098-app4
[app5] 82099-app5
[app2] 82096-app2
[app3] 82097-app3
cap> with all
scoping with all
cap> hostname -s
[db2] 82101-db2
[web3] 82094-web3
[app1] 82095-app1
[app2] 82096-app2
[db1] 82100-db1
[file1] 82103-file1
[app3] 82097-app3
[app4] 82098-app4
[app5] 82099-app5
[web1] 82092-web1
[web2] 82093-web2
[db3] 82102-db3
cap>

(Note the use of the special all keyword, to return the scope to all roles and machines.)
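The difference between a one-off scope and a sticky one can be modeled with a little state. The sketch below is an assumption about how such a prompt could behave, based only on the transcript above; ShellState and its handle method are hypothetical names.

```ruby
# Hypothetical model of the prompt's scope handling.
class ShellState
  attr_reader :scope   # nil means "all roles and machines"

  def initialize
    @scope = nil
  end

  # Returns [scope, command] when there is something to run, or nil when the
  # line only changed the sticky scope.
  def handle(line)
    line = line.strip
    match = line.match(/\A(with|on)\s+(\S+)\s*(.*)\z/)
    return [@scope, line] unless match   # plain command: use the sticky scope

    kind = match[1].to_sym
    targets = match[2].split(",")
    command = match[3]
    scope = (kind == :with && targets == ["all"]) ? nil : [kind, targets]

    if command.empty?
      @scope = scope     # no command given: the scope sticks
      nil
    else
      [scope, command]   # command given: the scope applies to this line only
    end
  end
end

shell = ShellState.new
shell.handle("with app")
p shell.handle("uname -srp")                 # => [[:with, ["app"]], "uname -srp"]
p shell.handle("on web1,web2 hostname -s")   # => [[:on, ["web1", "web2"]], "hostname -s"]
shell.handle("with all")
p shell.handle("hostname -s")                # => [nil, "hostname -s"]
```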

So, armed with this new tool, I was able to keep an eye on the load of each machine in the cluster using the uptime task. When I noticed something anomalous, I focused on the box in question and looked for rogue processes:

cap> on app3 ps waux | head -n 5
...
cap> on app3 sudo kill 12345
Password:
...
cap>

It worked great. It’s definitely not a substitute for a real SSH shell, but it’s perfect for quick-and-dirty tasks that require you to switch between hosts frequently.

Which raises the question: why isn’t it a substitute for a real SSH shell?

Well, firstly, cap shell is stateless. Each command is executed in a new shell on the remote host. This means that commands like cd and export are pretty useless, since they won’t stick.
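You can see the same effect locally, without any servers at all. The Ruby sketch below runs each command in a fresh sh process, which is only an approximation of what happens on the remote host; run_in_fresh_shell and the variable name CAP_SHELL_DEMO are made up for the example. It also shows the usual workaround, which is to chain everything into a single command line.

```ruby
require "open3"

# Run a command in a brand-new shell process and return its trimmed output.
def run_in_fresh_shell(command)
  output, status = Open3.capture2("sh", "-c", command)
  raise "command failed: #{command}" unless status.success?
  output.strip
end

before = run_in_fresh_shell("pwd")
run_in_fresh_shell("cd /")                 # changes directory, then the shell exits
after = run_in_fresh_shell("pwd")
puts before == after                       # the cd did not stick

run_in_fresh_shell("export CAP_SHELL_DEMO=1")
puts run_in_fresh_shell('echo "${CAP_SHELL_DEMO:-unset}"')   # the export did not stick either

# Workaround: keep dependent steps in one command, so they share one shell.
puts run_in_fresh_shell("cd / && pwd")
```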

Secondly, cap shell isn’t intended to deal with interactive commands. You can’t, for instance, run IRB on multiple hosts simultaneously using cap shell. It does manage to deal with commands like tail -f, but that’s about the limit.

Thirdly, if you thought rm -rf / was dangerous when connected to a single host, imagine the damage you could do with cap shell! This is probably one of the biggest reasons it is still experimental. Until I can find a way to make it less likely to accidentally wipe an entire cluster with a single command, you ought to use it with caution.

Still, even with all those caveats, cap shell has become an irreplaceable tool in my toolbox. I’d love to hear from you with ideas for how to make it better, and safer.

The Buckblog

assorted ramblings by Jamis Buck