11 Apr 2008

Net::SSH::Multi + Rake == Tasty Potential

Posted by Jamis on Friday, April 11

Last night I released the first preview of Net::SSH::Multi (gem install --source http://gems.jamisbuck.org net-ssh-multi). Today, let me show you a tasty hint of what you can do with it.

Consider the following Rakefile:

def remote
  @remote ||= begin
    require 'net/ssh/multi'

    session = Net::SSH::Multi.start

    session.via 'gateway.host', session.default_user

    session.group :web => session.use('web1', 'web2')
    session.group :app => session.use(*(1..8).map { |n| "app%02d" % n })
    session.group :db  => session.use('db1', :properties => { :primary => true })

    session
  end
end

namespace :remote do
  task :hostnames do
    remote.exec("hostname").wait
  end

  task :app_hostnames do
    remote.with(:app).exec("hostname").wait
  end

  task :web_hostnames do
    remote.with(:web).exec("hostname").wait
  end
end

You can now do things like this:

$ rake remote:hostnames
(in /home/jamis)
[web1] web1.host
[app02] app02.host
[app05] app05.host
[app03] app03.host
[app06] app06.host
[app08] app08.host
[db1] db1.host
[app01] app01.host
[web2] web2.host
[app04] app04.host
[app07] app07.host

The Net::SSH::Multi library is still experimental, but it is stable and full-featured enough that I seriously considered implementing the next release of Capistrano on top of it. (I’ll probably put that off until cap3, though, due to the magnitude of the change.) If you do something cool with it, let me know!

Posted in Tips & Tricks | 5 comments

06 Apr 2007

Faking cursors in ActiveRecord

Posted by Jamis on Friday, April 6

There are times (in a migration, say, or a cron job) when I want to operate on a large number of rows in the database. Billing is one example, where you want to select every account that is due for automatic renewal. Another is adding a new column to a table that you need to prepopulate with computed data.

One way to do that is just to brute force it:

Account.find(:all).each do |account|
  # ...
end

The drawback here is obvious: when you’re dealing with hundreds of thousands or even millions of rows, selecting them all into memory at once is brutal. And since ActiveRecord doesn’t support cursor-based operations, you can’t just ask ActiveRecord to return the rows as it reads them.

Here’s a trick I’ve been using recently to query large result sets while being friendly to the computer:

class <<ActiveRecord::Base
  def each(limit=1000)
    rows = find(:all, :conditions => ["id > ?", 0], :limit => limit)
    while rows.any?
      rows.each { |record| yield record }
      rows = find(:all, :conditions => ["id > ?", rows.last.id], :limit => limit)
    end
    self
  end
end

Account.each do |account|
  # ...
end

Sadly, this won’t work on every DBMS, or with every query; it exploits several idiosyncrasies of MySQL which might not be present on other DBMSs:

  • MySQL sorts indexes.
  • The primary key is an index.
  • Queries which MySQL determines can be best satisfied by the primary key, then, will be returned in sorted order.

This means that if you try to add additional conditions to the query, you’ll also need to add an :order clause to sort by the id…and this will more than likely cause the performance of the query to go down the tubes. But for those queries where you just want to select every row anyway, it works quite well. You could use OFFSET and LIMIT, but OFFSET begins to be really, really slow when the OFFSET is in the tens of thousands or higher because it has to count through that many rows before finding where to begin returning data. Basing the query on id, like this, has the advantage of speed, because the database can use indexes like it was meant to.
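The batching loop above can be tried out without a database. What follows is only a minimal sketch: an in-memory array of hashes stands in for the table, and FakeTable, find_after, and each_in_batches are names I made up for illustration, not anything ActiveRecord provides. Unlike the MySQL trick, the stand-in query sorts by id explicitly, so it does not depend on index ordering.

```ruby
# A stand-in for a table: rows are hashes with an :id key.
# FakeTable, find_after and each_in_batches are illustrative names only.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Mimics "WHERE id > ? ORDER BY id LIMIT ?".
  def find_after(last_id, limit)
    @query_count += 1
    @rows.select { |row| row[:id] > last_id }
         .sort_by { |row| row[:id] }
         .first(limit)
  end

  # Walks the whole table one batch at a time, keyed on the last id seen.
  def each_in_batches(limit = 1000)
    rows = find_after(0, limit)
    while rows.any?
      rows.each { |row| yield row }
      rows = find_after(rows.last[:id], limit)
    end
    self
  end
end

table = FakeTable.new((1..10).map { |n| { :id => n } }.shuffle)
seen = []
table.each_in_batches(4) { |row| seen << row[:id] }
```

Only one batch of rows is held at a time, and each "query" starts from the last id seen rather than counting past an offset.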

Posted in Tips & Tricks | 20 comments

04 Apr 2007

ActiveRecord::Base#find shortcut

Posted by Jamis on Wednesday, April 4

I use ActiveRecord::Base#find a lot in the Rails console. A lot. As a result, I’ve started doing the following:

class <<ActiveRecord::Base
  alias_method :[], :find
end

That little snippet saves me up to five entire keystrokes, every time I need to do a find!

prs = Person[5]
prs = Person[:first]
prs = Person[:all, :conditions => { :name => "Jamis" }]

And, thanks to the existing hash and array semantics, it loses none of find’s readability for most cases. Good stuff!
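If you want to see the aliasing on its own, outside of Rails, here is a small sketch. Directory and its RECORDS hash are hypothetical stand-ins for a model and its table; the only part taken from the snippet above is opening the singleton class and aliasing :[] to :find.

```ruby
# Directory is a hypothetical stand-in for an ActiveRecord model.
class Directory
  RECORDS = { 5 => "Jamis", 7 => "Marcel" }

  def self.find(id)
    RECORDS.fetch(id)
  end

  # Open the singleton class so the alias applies to the class method.
  class << self
    alias_method :[], :find
  end
end

name = Directory[5]
```

Because the alias lives on the singleton class, Directory[5] and Directory.find(5) run exactly the same method.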

Posted in Tips & Tricks | 25 comments

07 Mar 2007

Raising the right exception

Posted by Jamis on Wednesday, March 7

Ruby makes it very easy to raise exceptions:

def finagle_something
  raise "need a block" unless block_given?
  ...
end

Using raise like that (with just the exception message) is really handy for quick-and-dirty solutions. But for Serious Business Stuff™ you may often find the ambiguity of the default RuntimeError a little frustrating.

Ruby has a bunch of predefined exception classes that you can use. For the above, you’d do better to raise an ArgumentError, specifically (raise ArgumentError, "need a block"). It can pay big dividends to become familiar with the standard exception hierarchy in Ruby.

Sometimes, though, the standard exception classes just aren’t enough. For example, in Capistrano 2.0, I’d like any exception that Capistrano itself raises to be immediately recognizable as a Capistrano exception. The solution?

module Capistrano
  class Error < RuntimeError; end
  class ConnectionError < Error; end
  ...
end

Now, clients of Capistrano only need to look for Capistrano::Error to safely catch any exceptional conditions within Capistrano:

def use_capistrano
  ...
rescue Capistrano::Error => error
  warn "couldn't do a capistrano thing: #{error.message}"
end

Furthermore, the benefits of having more specific exception classes (like Capistrano::ConnectionError) are manifold; you can write code to easily detect and retry certain errors, or report some problems differently than others. When you start using specific exception classes, instead of the default RuntimeError, you’ll find you can handle your exceptions much more gracefully, and write much more robust programs.
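As one illustration of "detect and retry certain errors", here is a rough sketch. The Deployer module and the with_retries helper are hypothetical names of my own, not part of Capistrano; the point is only that a specific subclass lets you retry connection problems while letting everything else propagate.

```ruby
# Deployer is a hypothetical namespace for this sketch, not Capistrano's code.
module Deployer
  class Error < RuntimeError; end
  class ConnectionError < Error; end
  class CommandError < Error; end
end

# Retries the block on ConnectionError only; other errors propagate at once.
def with_retries(max_attempts)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue Deployer::ConnectionError
    retry if attempts < max_attempts
    raise
  end
end

result = with_retries(3) do |attempt|
  raise Deployer::ConnectionError, "host unreachable" if attempt < 3
  "connected on attempt #{attempt}"
end
```

A caller that only cares whether something went wrong can still rescue Deployer::Error and catch both subclasses.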

Posted in Tips & Tricks | 8 comments

05 Mar 2007

Rendering empty responses

Posted by Jamis on Monday, March 5

Sometimes (and especially once you start dealing with writing web services) you’ll find yourself wanting to return an empty response, with only a status code and (possibly) a few headers set.

You can do this easily enough using the render method:

headers['Location'] = person_url(@person)
render :nothing => true, :status => "201 Created"

That, however, is unbearably verbose, especially when you find yourself needing to do it in multiple places.

Enter the head method:

head :created, :location => person_url(@person)

There, isn’t that beautiful?

Posted in Tips & Tricks | 12 comments

28 Feb 2007

Poor-man's pagination

Posted by Jamis on Wednesday, February 28

Here’s a really simple little tip, related to displaying paginated results. Using offset/limit, it’s pretty trivial to pull back just the page of data you want, as long as you know what the last offset/limit values were:

rows = Person.find(:all, :conditions => { ...},
  :limit => page_size, :offset => last_offset + page_size)
more_results = (last_offset + page_size + rows.length) < Person.count

However, it’d be nice to do this in a single query, especially since Person.count can get spendy if there are a lot of rows in the database. Here’s a simple way to do it:

rows = Person.find(:all, :conditions => { ...},
  :limit => page_size+1, :offset => last_offset + page_size)
more_results, rows = rows.length > page_size, rows[0,page_size]

You query the database for one more row than you actually want (page_size+1). If you get that many rows back, then you know there is at least one more page of data after the current page.
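Here is the same idea as a sketch you can run without a database. PEOPLE and fetch_page are hypothetical; slicing the array stands in for a query with :limit and :offset.

```ruby
# PEOPLE and fetch_page are hypothetical; the array slice stands in for
# a query with :limit and :offset.
PEOPLE = (1..25).map { |n| "person#{n}" }

def fetch_page(offset, page_size)
  rows = PEOPLE[offset, page_size + 1] || []   # ask for one extra row
  more_results = rows.length > page_size
  [rows.first(page_size), more_results]
end

first_page, more_after_first = fetch_page(0, 10)
last_page, more_after_last = fetch_page(20, 10)
```

The extra row is never shown to the user; it only answers the question "is there another page?"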

Posted in Tips & Tricks | 10 comments

26 Feb 2007

Dereferencing fixtures

Posted by Jamis on Monday, February 26

I frequently find myself writing a lot of helper methods for my tests. (I think I picked up that habit from Marcel Molina, Jr., actually.) These helper methods encapsulate tasks that I wind up doing all over the place, things like logging in a user, or uploading a file.

However, sometimes I want to pass in a symbol identifying the fixture that should be used in the helper, and sometimes I want to pass a record in directly. To accommodate this, I’ve been using the following idiom in my helpers:

def login!(user)
  user = users(user) if Symbol === user
  ...
end

It just checks the argument, and if it is a Symbol, it dereferences it using the “users” method provided by the fixtures.

To make this idiom reusable, I’ve got this little gem in my test_helper.rb file:

def dereference(argument, collection)
  if Symbol === argument
    return send(collection, argument)
  else
    return argument
  end
end

The login! method would then be written like this:

def login!(user)
  user = dereference(user, :users)
  ...
end

It’s a little thing, but it sure makes those test helpers a lot more flexible!
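To show both calling styles side by side, here is a self-contained sketch. FakeFixtures, its User struct, and the body of login! are hypothetical stand-ins for a real test case and its fixture accessor; only dereference follows the helper above.

```ruby
# FakeFixtures is a hypothetical stand-in for a test case that has a
# "users" fixture accessor.
class FakeFixtures
  User = Struct.new(:name)

  def users(name)
    { :jamis => User.new("Jamis") }.fetch(name)
  end

  def dereference(argument, collection)
    Symbol === argument ? send(collection, argument) : argument
  end

  def login!(user)
    user = dereference(user, :users)
    "logged in #{user.name}"
  end
end

helper = FakeFixtures.new
by_symbol = helper.login!(:jamis)
by_record = helper.login!(FakeFixtures::User.new("Marcel"))
```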

Posted in Tips & Tricks | 7 comments

19 Feb 2007

Route#to_s

Posted by Jamis on Monday, February 19

Here’s a nifty trick. Route#to_s exists. Why is this cool?

ActionController::Routing::Routes.routes.each do |r|
  puts r
end

This will give you a list of all of the routes you have defined, in a very human-consumable format. It’s great if you are trying to figure out why Rails is having problems accepting a URL that you think it should be accepting.

Similarly, you can list all of your named routes:

ActionController::Routing::Routes.named_routes.routes.each do |name, route|
  puts "%20s: %s" % [name, route]
end

This is especially handy if you are using map.resources, where there are lots of named routes being generated for you behind the scenes.

Posted in Tips & Tricks | 11 comments

08 Feb 2007

begin + else

Posted by Jamis on Thursday, February 8

Even after 6+ years, Ruby still continues to surprise and delight me. My latest discovery (thanks to Mauricio’s Happy 2007 challenge) is that begin/end blocks accept (in addition to rescue and ensure) an else clause:

begin
  # main code here
rescue SomeException
  # ...
rescue AnotherException
  # ..
else
  # stuff you want to happen AFTER the main code,
  # but BEFORE the ensure block, but only if there
  # were no exceptions raised. Note, too, that
  # exceptions raised here won't be rescued by the
  # rescue clauses above.
ensure
  # stuff that should happen dead last, and
  # regardless of whether any exceptions were
  # raised or not.
end

If you don’t have an ensure clause, else is pretty much the same as just putting code immediately after the end, but if the order matters (something should happen before ensure, and only if the main code succeeded, and should not be subject to being rescued if something goes wrong), then else is your man.
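A quick way to convince yourself of the ordering is to record which clauses run. This is just a sketch; run_steps and raise_in_else are hypothetical helpers written for the demonstration.

```ruby
# run_steps is a hypothetical helper that records which clauses execute.
def run_steps(fail_in_main)
  steps = []
  begin
    steps << :main
    raise ArgumentError, "boom" if fail_in_main
  rescue ArgumentError
    steps << :rescue
  else
    steps << :else      # only when the main code raised nothing
  ensure
    steps << :ensure    # always last
  end
  steps
end

# An exception raised inside else is not caught by the rescue clause above it.
def raise_in_else
  begin
    Integer("42")       # main code that succeeds
  rescue RuntimeError
    :rescued
  else
    raise "raised from else"
  end
end

success_order = run_steps(false)
failure_order = run_steps(true)
```

Calling raise_in_else lets the RuntimeError escape to the caller, even though a rescue RuntimeError clause sits right there in the same block.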

Posted in Tips & Tricks | 14 comments

07 Feb 2007

Infinity

Posted by Jamis on Wednesday, February 7

This is hardly an original trick (it’s been mentioned many times before, on countless other blogs) but it is useful enough that it deserves mention yet again.

Ruby won’t let you divide an integer by zero—you’ll get an exception. However, thanks to the IEEE 754 standard for floating point numbers, when you try to divide a float by zero you get a rather special value back:

puts 1.0/0
#-> Infinity

It’s not a constant, though; it’s just how that floating point result is represented as a string. However, you can easily assign that value to a constant:

Infinity = 1.0/0

Once you have that, you can use it for all kinds of nifty things; throw it in ranges, use it in comparisons, whatever suits your fancy:

# a rather useless range
everything = -Infinity..Infinity
puts everything.include?(5) #-> true

# use it for representing an unbounded value
storage_limits = { :demo => 0,
  :standard => 250.megabytes,
  :expert => 1.gigabyte,
  :unlimited => Infinity }

if bytes_used < storage_limits[account_level]
  # add another file or something
else
  # display "out of space" message
end

Like I said earlier, it’s old news, but no less handy for that.
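For what it's worth, Ruby 1.9.2 and later ship the value as Float::INFINITY, so you no longer have to define the constant yourself. Here is a sketch of the storage-limit check in plain Ruby. MEGABYTE, STORAGE_LIMITS, and room_for? are hypothetical names, and the sizes are plain byte counts rather than the Rails numeric helpers.

```ruby
# Float::INFINITY is built in on Ruby 1.9.2 and later.
# MEGABYTE, STORAGE_LIMITS and room_for? are hypothetical names for this sketch.
MEGABYTE = 1024 * 1024

STORAGE_LIMITS = {
  :demo      => 0,
  :standard  => 250 * MEGABYTE,
  :unlimited => Float::INFINITY
}

# True when the account may store at least one more byte.
def room_for?(account_level, bytes_used)
  bytes_used < STORAGE_LIMITS.fetch(account_level)
end

everything = -Float::INFINITY..Float::INFINITY
```

The nice part is that the unlimited plan needs no special case: the same comparison works for every level.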

Posted in Tips & Tricks | 9 comments

06 Feb 2007

Overriding attributes in ActiveRecord

Posted by Jamis on Tuesday, February 6

I love that super calls method_missing if the method is not defined on the superclass.

Consider this case. You have some ActiveRecord named Account, which has an associated email_address. However, an account owner may optionally give a special “notification” email address, which will be used for things like newsletter emails and security issues and such. If no notification address has been explicitly given, it should fall back to the account’s primary email address. It’s as simple as this:

class Account < ActiveRecord::Base
  def notification_address
    super || email_address
  end
end

Calling super forces the superclass, ActiveRecord::Base, to be sent the notification_address message, which it won’t understand. This causes method_missing to be called on AR::Base, which looks for the notification_address attribute in the record’s attribute set. If that has not been set, it will be nil, in which case we then default to the email_address value.

Just as you’d expect.
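You can watch the same mechanism work in plain Ruby. In this sketch AttributeRecord is a hypothetical stand-in for ActiveRecord::Base: it defines no notification_address method, only a method_missing that looks names up in a hash of attributes. It is not how ActiveRecord is actually implemented.

```ruby
# AttributeRecord is a hypothetical stand-in for ActiveRecord::Base: it has no
# notification_address method, only a method_missing that reads attributes.
class AttributeRecord
  def initialize(attributes)
    @attributes = attributes
  end

  def method_missing(name, *args)
    return @attributes[name] if @attributes.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name) || super
  end
end

class Account < AttributeRecord
  def notification_address
    # No superclass method exists, so Ruby falls back to method_missing.
    super || email_address
  end
end

plain = Account.new(:email_address => "a@example.com",
                    :notification_address => nil)
custom = Account.new(:email_address => "a@example.com",
                     :notification_address => "n@example.com")
```

Calling plain.notification_address falls back to the primary address, while custom.notification_address returns the one that was explicitly set.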

Posted in Tips & Tricks | 17 comments

05 Feb 2007

Nesting resources

Posted by Jamis on Monday, February 5

The RESTful routes feature in Rails makes it really, really simple to nest resources within each other. Just give a block to the “map.resources” call, and define further resources on the value yielded to that block:

map.resources :accounts do |accounts|
  accounts.resources :people do |people|
    people.resources :notes do |notes|
      notes.resources :comments
    end
  end
end

That monstrosity would give you named routes like:

accounts_url         #-> /accounts
account_url(1)       #-> /accounts/1
people_url(1)        #-> /accounts/1/people
person_url(1,2)      #-> /accounts/1/people/2
notes_url(1,2)       #-> /accounts/1/people/2/notes
note_url(1,2,3)      #-> /accounts/1/people/2/notes/3
comments_url(1,2,3)  #-> /accounts/1/people/2/notes/3/comments
comment_url(1,2,3,4) #-> /accounts/1/people/2/notes/3/comments/4

Simple! However, in using RESTful routes more and more, I’m coming to realize that this is not a best practice. Rule of thumb: resources should never be nested more than 1 level deep. A collection may need to be scoped by its parent, but a specific member can always be accessed directly by an id, and shouldn’t need scoping (unless the id is not unique, for some reason).

Think about it. If you only want to view a specific comment, you shouldn’t have to specify the account, person, and note for the comment in the URL. (Permission concerns can come into this, to some degree, but even then I’d argue that judicious use of the session is better than complicating your URLs.) However, if you want to view all comments for a particular note, then you do need to scope the request by that note. Given the above nesting of routes, I’m finding the following a better (if slightly more verbose) method:

map.resources :accounts do |accounts|
  accounts.resources :people, :name_prefix => "account_"
end

map.resources :people do |people|
  people.resources :notes, :name_prefix => "person_"
end

map.resources :notes do |notes|
  notes.resources :comments, :name_prefix => "note_"
end

map.resources :comments

You’ll notice that I define each resource (except accounts) twice: once at the top level, and once nested within another resource. For the nested resources, I also give a “name_prefix”—this gets tacked onto the front of the named routes that are generated.

So, the above mappings give you the following named routes:

accounts_url          #-> /accounts
account_url(1)        #-> /accounts/1
account_people_url(1) #-> /accounts/1/people
person_url(2)         #-> /people/2
person_notes_url(2)   #-> /people/2/notes
note_url(3)           #-> /notes/3
note_comments_url(3)  #-> /notes/3/comments
comment_url(4)        #-> /comments/4

The URLs are shorter, and the parameters to the named routes are much simpler. It’s an all-around win! I won’t go so far as to say that resources should never be deeply nested, but I will say that you should think long and hard before you go that route.

Posted in Tips & Tricks | 22 comments

01 Feb 2007

Per-developer configuration

Posted by Jamis on Thursday, February 1

Here’s another trick you can do with your environment.rb file. It’s of dubious value, but if you’re working on a team with conflicting opinions, it might be a handy way to set your own defaults without tyrannically checking them into the team’s source repository.

Just throw the following bit into the bottom of your environment.rb file:

if RAILS_ENV != "production" 
  railsrc = "#{ENV['HOME']}/.railsrc" 
  load(railsrc) if File.exist?(railsrc)
end

Then, if you put a file named “.railsrc” in your home directory, your application will look for and load it every time the environment.rb file is loaded (unless you’re in production mode). Your .railsrc might look something like this:

ActiveRecord::Base.colorize_logging = false
# other stuff your apps configure

What other stuff would you put in a per-developer configuration file?

Posted in Tips & Tricks | 12 comments

31 Jan 2007

More on watching ActiveRecord

Posted by Jamis on Wednesday, January 31

Remember Watching ActiveRecord Do Its Thing, where I talked about redirecting the log to STDOUT when using the console? I’ve got a new trick based on this that I’ve found quite helpful. Simply put the following snippet in your config/environment.rb:

def log_to(stream)
  ActiveRecord::Base.logger = Logger.new(stream)
  ActiveRecord::Base.clear_active_connections!
end

Now, when you’re at the console, you can just do:

>> log_to STDOUT
=> ...
>> Post.find(:first)
  Post Load (0.000138)   SELECT * FROM posts LIMIT 1
=> #<Post:0x1234 ...>
>>

The best part is, by clearing the active connections after setting the logger, you can change the logger at any time, even after you’ve made any number of find calls.

And, you can pass your own stream objects into it:

>> buffer = StringIO.new
=> ...
>> log_to buffer
=> ...
>> Post.find(:first)
=> #<Post:0x1234 ...>
>> p buffer.string
=> "  \e[4;35;1mPost Load (0.000138)\e[0m   \e[0mSELECT * FROM posts LIMIT 1\e[0m\n"
>>

Why would you want to do this? Well, for one thing, you can use log_to in your tests, and make sure that sensitive things like credit card numbers aren’t being written to your logs. Or, you can use this in tests to make sure that your latest optimization really does reduce the number of queries made to the database.
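Here is roughly what such a check could look like, sketched without Rails. FakeModel is a hypothetical stand-in for a model that logs one line per query, and this log_to only swaps the stand-in's logger; Logger and StringIO come from Ruby's standard library.

```ruby
require 'logger'
require 'stringio'

# FakeModel is a hypothetical stand-in for a model that logs each query.
class FakeModel
  class << self
    attr_accessor :logger
  end

  def self.find(id)
    logger.debug("SELECT * FROM posts WHERE id = #{id.to_i}")
    { :id => id }
  end
end

# Points the stand-in model's logger at any IO-like stream.
def log_to(stream)
  FakeModel.logger = Logger.new(stream)
end

buffer = StringIO.new
log_to buffer
3.times { |n| FakeModel.find(n + 1) }

# Count the queries, and check that a sensitive value never reached the log.
query_count = buffer.string.scan("SELECT").length
leaked = buffer.string.include?("4111111111111111")
```

In a real test you would assert on query_count and leaked instead of just computing them.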

Good fun!

Posted in Tips & Tricks | 8 comments

30 Jan 2007

Unit vs. Functional vs. Integration

Posted by Jamis on Tuesday, January 30

Unit tests. Functional tests. Integration tests. Rails draws a lot of circles around your tests, and it does a good job (in general) of helping you know what kinds of tests belong in each, but there are still some gray areas (and areas that I think it categorizes incorrectly).

For example: when do you use a functional test, and when do you use an integration test? Googling will point you at a variety of different opinions, but here’s my take on it.

Unit tests are for testing models and pseudo-models. Basically, they are the simplest of your tests, exercising very specific functionality. Rails also throws your ActionMailer tests in this group, but I think that’s wrong. ActionMailer objects are more like controllers than like models, so I generally move my mailer tests to the functionals.

Functional tests bypass a lot of the start-up processes of Rails: they don’t try to recognize any routes, they ignore your instructions regarding your sessions, and they don’t do any request parsing. They depend heavily on the TestRequest and TestResponse classes, which stub out much of the basic functionality of the request and response objects.

As a result, functional tests are fast (since they skip so much initialization), and they are excellent at testing the meat of your controllers. However, because they require you to explicitly instantiate the controller you want to test (take a look at the setup method that Rails generates for you), they are harder to use in cross-controller scenarios. Also, if you want to make sure your routes are processed by the correct controllers and actions, functional tests don’t make that very easy, either.

Integration tests, on the other hand, test the entire Rails stack. Each request in an integration test mimics a real web request and exercises routing recognition, actually parses incoming requests, uses real sessions, and so forth. As a result, integration tests are significantly slower than functional tests, but they are excellent at testing cross-controller stories. Want to make sure the flash you set in the “create” action is being properly displayed in the “index” action? Sounds like you need an integration test. You can even use integration tests to exercise entire stories: “user logs in, views the catalog, views a product, adds it to their cart, checks out, enters credit card, submits payment, sees invoice.”

Integration tests are also good for grouping a bunch of related tests that cut across controllers, like permissions and access control. Even though each individual test might only test a single controller, each one is testing a different controller, and rather than have all your access control tests spread across several files in the “functional” directory, it is more convenient (and maintainable) to group them into a single integration test suite.

Naturally, your application may have some classes that don’t fit cleanly into any of the above three categories. What about a service that runs via cron? What about code that processes incoming emails? As a rule of thumb, if your test focuses on very specific functionality and tests only a single model, put it in the “unit” directory. If it tests something that depends on your models (like a controller, mailer, or other service), put it in the “functional” directory. And if you are testing something that cuts across multiple controllers or services, or if you want to aggregate tests across multiple controllers, then those belong in an integration test.

Posted in Tips & Tricks | 17 comments