Don't be afraid of harnessing SQL

Posted by Jamis on November 07, 2006 @ 01:55 PM

Even after ten years of working with SQL, I still find myself tickled by how powerful it is, in spite of its warts.

In Basecamp, users can create to-do list “templates”. Each template is essentially just a name, an optional description, and a bunch of items. Once defined, users can create new to-do lists based on one of these templates.

We used to do this entirely via the ActiveRecord helper methods. First, we’d create a new list, and then creating the items for the list one at a time, for each item in the template. It looked something like this:

1
2
3
4
5
6
7
8
9
10
11
class TodoListTemplate < ActiveRecord::Base
  has_many :todo_item_templates

  def instantiate
    list = TodoList.create(:name => name, :description => description)
    todo_item_templates.each do |item|
      list.todo_items.create :content => item.content
    end
    list
  end
end

This worked, but was very inefficient. It results in a lot of SQL statements being sent down the pipe, mostly because we’ve got some before_create hooks and observers set up that perform work for each new to-do item that is created. As our traffic grew, we started running into deadlock issues. All those hooks and observers, so convenient at the time, were now wreaking havoc on the database.

The problem was easily solved. First of all, a little thought helped me see that those hooks and observers were either not needed in this case, or could be done slightly differently. Secondly, instead of copying each item template to an item, one at a time, we could do it all in SQL, as a single statement. Here’s more or less how we rewrote it:

1
2
3
4
5
6
7
8
9
10
11
12
13
def instantiate
  list = TodoList.create(:name => name, :description => description)

  TodoItem.connection.insert <<-SQL, "Populating items"
    INSERT INTO todo_items (todo_list_id, content, position, created_at)
      SELECT #{list.id}, content, position, UTC_TIMESTAMP()
        FROM todo_item_templates
       WHERE todo_list_template_id = #{id}
  SQL

  list.todo_items.reset
  list
end

Basically, the INSERT takes the associated SELECT statement, and inserts the results of each returned row into the todo_items table. Not only is this blazing fast, but it is much nicer to the database.

Once everything has been inserted, we call todo_items.reset, to force the todo_items association on the list to be unloaded, and then we return the list.

Your own situation may require more or less logic than this. You may even be completely fine doing everything via ActiveRecord. But if you find your application beginning to flounder in places where you are doing lots of database queries, consider rethinking those areas to consolidate some of that work.

Don’t be afraid of harnessing SQL.

I’ll probably begin publishing these kinds of “best practices” articles to The Rails Way, instead of to this blog. If you want to follow along, be sure and subscribe to that feed, too.

Posted in Essays and Rants

Comments

Have something to add? Click here to leave a comment.

07 Nov 2006

1. Victor Grey said...

What is the meaning/usage of that “Populating items” statement before the SQL?

2. Jamis said...

Victor, the second parameter to insert (as well as the other methods on the connection, like execute, select, update, and delete) is a string that will be used to prefix the query in the log. In other words, when that query shows up in the logs, it will say “Populating items: [sql here]”.

3. rodney said...

Hi Jamis, I always felt a bit dirty doing it like that but you’ve made it look made ok. Pfew thankfully i’m not throwing away 10 years of writing (pl/)sql for a living :-) Great post.

4. Joe Ruby said...

I guess if there’s absolutely no other way, but I don’t muck with SQL much at all any more since it totally bypasses any before/after/validate/etc. methods I may have in place.

5. Jamis said...

Joe, obviously, I’m not recommending a total wholesale move towards raw SQL. :) Use ActiveRecord as much as possible. It makes life easy for you, and that’s the entire point. But there may come a time, at some point in your application’s lifecycle, and for some part of your application, where ActiveRecord is doing things too inefficiently. When that happens, consider reworking things in straight SQL.

6. Joe Ruby said...

I’ll bet there’s a way to do this more efficiently in AR. ;)

7. david Altenburg said...

Good post. I think it’s important to point out, though, that as soon as you start doing this, you need to be very careful with the strings you use within your SQL statement – specifically, none of them should come from an untrusted source (like the user) without being scrubbed, first.

08 Nov 2006

8. Sean Cribbs said...

I think you’re spot on. Sometimes SQL is the only reasonable answer. I most use it on specialized finders that need to join multiple tables but would be a royal pain in Ruby. The best thing about it is, if your SQL is well-formed, you don’t have to worry too much about optimization because most DBMS have built-in query optimizers. Plus, it’s often more clear and concise than long, drawn out conglomerations of .select, .collect and their brethren.

9. Joe Ruby said...

I most use it on specialized finders that need to join multiple tables but would be a royal pain in Ruby.

Try Rails sometime—:include makes totally short work of joins.

10. Jamis said...

For the record, I just spent this morning working with a friend, trying resolve some real performance problems with :include (together with :order and :limit). Yah, eager loading is nice, but it’s not a panacea, and anyone who thinks it is will shoot themselves in the foot (or the face) sooner or later. Yes, use what rails gives you for as long as you can. 90% of the time (or more), it is sufficient for what you need. For the rest, though, don’t be afraid to drop to the bare metal and make things performant.

For those who weren’t aware, Rails isn’t about making 100% of your tasks easier. It’s actually only closer to 80%. The really common things are really easy. The less common things, aren’t as easy. And the rare things, well, those you have to do manually more often than not.

It’s a feature of rails. An intentional design decision. If you’re fortunate enough to have your apps all fall in that 80% category, good for you. That’s wonderful. But don’t expect that.

11. Sai Emrys said...

FYI: (http://saizai.livejournal.com/741866.html for longer version of this)

I’ve got a table that has 6 many-to-many associations. My paginator loads them and gets these times (for 0 through 6 tables in the :include):

SQL query: 0.001, 0.004, 0.012, 0.12, 0.17, 0.8, 3.8 pagination proper: 0.091, 0.171, 0.168, 0.48, 1.2, 4, 5, 26.2

While I’ve nerfed it for now the prereq to unnerfing it will probably be to denormalize those tables some. Fortunately I don’t yet need to resort to pure SQL inserts.

And of course many thanks to Jamis for totally helping me figure that out. :-)

12. Azmeen said...

Using native SQL is of course always the fastest way to manipulate data. However, it comes with certain “intricacies” which abstraction layers try to remove or at the very least reduce.

I agree wholeheartedly though that if you’re very sure of what you want to do and very sure with the database structure and relationships, direct SQL statements is the way to go.

Thanks for the pointers.

09 Nov 2006

13. Iain Buckingham said...

Is there any way you could programatically generate the expression, rather than duplicating field names in the SQL string? That way changes to database schema will be automatically picked up.

14. Graham said...

I don’t think Joe would agree that this is a good idea.

15. Jamis said...

Iain, you could, but the thing is, if the schema changes, how will you know which columns are relevant here? If you change the name of, for instance, the ‘content’ column, or remove it, you’ll probably have to change more than just this one query to get your app to work with it.

11 Nov 2006

16. Jay said...

I recently worked on a system where performance of a few key queries also required that I drop into SQL statements. To help make the code a bit easier to work with we wrote a simple SQL generating Ruby library. I found it to be better than here docs with dynamic pieces. It may or may not be helpful to you when you are doing something similar.

http://sqldsl.rubyforge.org