Don't be afraid of harnessing SQL
Even after ten years of working with SQL, I still find myself tickled by how powerful it is, in spite of its warts.
In Basecamp, users can create to-do list “templates”. Each template is essentially just a name, an optional description, and a bunch of items. Once defined, users can create new to-do lists based on one of these templates.
We used to do this entirely via the ActiveRecord helper methods. First, we’d create a new list, and then creating the items for the list one at a time, for each item in the template. It looked something like this:
1 2 3 4 5 6 7 8 9 10 11 |
class TodoListTemplate < ActiveRecord::Base has_many :todo_item_templates def instantiate list = TodoList.create(:name => name, :description => description) todo_item_templates.each do |item| list.todo_items.create :content => item.content end list end end |
This worked, but was very inefficient. It results in a lot of SQL statements being sent down the pipe, mostly because we’ve got some before_create
hooks and observers set up that perform work for each new to-do item that is created. As our traffic grew, we started running into deadlock issues. All those hooks and observers, so convenient at the time, were now wreaking havoc on the database.
The problem was easily solved. First of all, a little thought helped me see that those hooks and observers were either not needed in this case, or could be done slightly differently. Secondly, instead of copying each item template to an item, one at a time, we could do it all in SQL, as a single statement. Here’s more or less how we rewrote it:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
def instantiate list = TodoList.create(:name => name, :description => description) TodoItem.connection.insert <<-SQL, "Populating items" INSERT INTO todo_items (todo_list_id, content, position, created_at) SELECT #{list.id}, content, position, UTC_TIMESTAMP() FROM todo_item_templates WHERE todo_list_template_id = #{id} SQL list.todo_items.reset list end |
Basically, the INSERT
takes the associated SELECT
statement, and inserts the results of each returned row into the todo_items
table. Not only is this blazing fast, but it is much nicer to the database.
Once everything has been inserted, we call todo_items.reset
, to force the todo_items
association on the list to be unloaded, and then we return the list.
Your own situation may require more or less logic than this. You may even be completely fine doing everything via ActiveRecord. But if you find your application beginning to flounder in places where you are doing lots of database queries, consider rethinking those areas to consolidate some of that work.
Don’t be afraid of harnessing SQL.
Reader Comments
What is the meaning/usage of that “Populating items” statement before the SQL?
7 Nov 2006
Victor, the second parameter to
insert
(as well as the other methods on the connection, likeexecute
,select
,update
, anddelete
) is a string that will be used to prefix the query in the log. In other words, when that query shows up in the logs, it will say “Populating items: [sql here]”.7 Nov 2006
Hi Jamis, I always felt a bit dirty doing it like that but you’ve made it look made ok. Pfew thankfully i’m not throwing away 10 years of writing (pl/)sql for a living :-) Great post.
7 Nov 2006
I guess if there’s absolutely no other way, but I don’t muck with SQL much at all any more since it totally bypasses any before/after/validate/etc. methods I may have in place.
7 Nov 2006
Joe, obviously, I’m not recommending a total wholesale move towards raw SQL. :) Use ActiveRecord as much as possible. It makes life easy for you, and that’s the entire point. But there may come a time, at some point in your application’s lifecycle, and for some part of your application, where ActiveRecord is doing things too inefficiently. When that happens, consider reworking things in straight SQL.
7 Nov 2006
I’ll bet there’s a way to do this more efficiently in AR. ;)
7 Nov 2006
Good post. I think it’s important to point out, though, that as soon as you start doing this, you need to be very careful with the strings you use within your SQL statement – specifically, none of them should come from an untrusted source (like the user) without being scrubbed, first.
7 Nov 2006
I think you’re spot on. Sometimes SQL is the only reasonable answer. I most use it on specialized finders that need to join multiple tables but would be a royal pain in Ruby. The best thing about it is, if your SQL is well-formed, you don’t have to worry too much about optimization because most DBMS have built-in query optimizers. Plus, it’s often more clear and concise than long, drawn out conglomerations of .select, .collect and their brethren.
8 Nov 2006
Try Rails sometime—:include makes totally short work of joins.
8 Nov 2006
For the record, I just spent this morning working with a friend, trying resolve some real performance problems with :include (together with :order and :limit). Yah, eager loading is nice, but it’s not a panacea, and anyone who thinks it is will shoot themselves in the foot (or the face) sooner or later. Yes, use what rails gives you for as long as you can. 90% of the time (or more), it is sufficient for what you need. For the rest, though, don’t be afraid to drop to the bare metal and make things performant.
For those who weren’t aware, Rails isn’t about making 100% of your tasks easier. It’s actually only closer to 80%. The really common things are really easy. The less common things, aren’t as easy. And the rare things, well, those you have to do manually more often than not.
It’s a feature of rails. An intentional design decision. If you’re fortunate enough to have your apps all fall in that 80% category, good for you. That’s wonderful. But don’t expect that.
8 Nov 2006
FYI: (http://saizai.livejournal.com/741866.html for longer version of this)
I’ve got a table that has 6 many-to-many associations. My paginator loads them and gets these times (for 0 through 6 tables in the :include):
SQL query: 0.001, 0.004, 0.012, 0.12, 0.17, 0.8, 3.8 pagination proper: 0.091, 0.171, 0.168, 0.48, 1.2, 4, 5, 26.2
While I’ve nerfed it for now the prereq to unnerfing it will probably be to denormalize those tables some. Fortunately I don’t yet need to resort to pure SQL inserts.
And of course many thanks to Jamis for totally helping me figure that out. :-)
8 Nov 2006
Using native SQL is of course always the fastest way to manipulate data. However, it comes with certain “intricacies” which abstraction layers try to remove or at the very least reduce.
I agree wholeheartedly though that if you’re very sure of what you want to do and very sure with the database structure and relationships, direct SQL statements is the way to go.
Thanks for the pointers.
8 Nov 2006
Is there any way you could programatically generate the expression, rather than duplicating field names in the SQL string? That way changes to database schema will be automatically picked up.
9 Nov 2006
I don’t think Joe would agree that this is a good idea.
9 Nov 2006
Iain, you could, but the thing is, if the schema changes, how will you know which columns are relevant here? If you change the name of, for instance, the ‘content’ column, or remove it, you’ll probably have to change more than just this one query to get your app to work with it.
9 Nov 2006
I recently worked on a system where performance of a few key queries also required that I drop into SQL statements. To help make the code a bit easier to work with we wrote a simple SQL generating Ruby library. I found it to be better than here docs with dynamic pieces. It may or may not be helpful to you when you are doing something similar.
http://sqldsl.rubyforge.org
11 Nov 2006