Indexing for DB performance
Isn’t Rails great? It makes interacting with your database so easy, and removes almost every vestige of SQL from the development process. You can build and mutate your entire database schema (thanks to ActiveRecord::Migration and ActiveRecord::Schema), go crazy shoving data into your database (with
ActiveRecord::Base.create and friends) and query your data in a very friendly Ruby DSL (
Wonderful! But I think most of us have experienced the puzzlement and frustration of wondering why our application, which ran so beautifully during testing and for the first few days or weeks after launch, is suddenly running slower and slower, and why our database is being so incredibly overworked. What happened?
Chances are, you forgot to add indexes to your tables. Rails won’t (and, honestly, can’t) do it for you. In fact, Rails doesn’t even try to tell you where those indexes might be needed. And without those indexes, the only recourse the database has when fulfilling your query is to do a “full table scan”, basically looking at each row in the table, one at a time, to find all matching records. That’s not too bad when there are only a few tens (or even thousands, on a fast machine) of rows, but when you starting getting tens of thousands, hundreds of thousands, or even millions of rows, just imagine how hard your database has to work to satisfy those queries!
So you may be wondering, “alright, I need indexes…how do I know what indexes to create?”
Here are a few general tips. My experience is primarily with MySQL, so that’s where my advice is directed, but I believe most of these tips apply regardless of your DBMS:
- If you have a foreign key on a table (or, phrased another way, you have a
has_and_belongs_to_manyassociation on a model), then you almost certainly need to add an index for it, because any time you access those associations, Rails is generating SQL under the covers that queries based on those foreign keys.
- If you find yourself frequently doing queries on a non-foreign-key column (like
permalink), you’ll definitely want an index on that column.
- If you frequently sort on a column or combination of columns, make sure the index that is being used for the query includes those sort columns, too (if at all possible). Indexes store the data in sorted order, so if your index includes the sort column, the database can return the sorted data at almost no extra cost.
- Many databases (like MySQL, or Postgres prior to 8.1) will only use a single index per table, per query, so make sure you have indexes defined for the column combinations that you will query on frequently. A common mistake is to define an index on “user_name” and an index on “account_id”, and then expect the database to use both indexes to satisfy a query that references both columns. (Some databases will use both indexes, though; be sure and understand how your DBMS uses indexes.)
- Don’t go crazy defining indexes. It is tempting to just add an index on every column that could conceivably be queried on, just to preemptively destroy any possible DB performance problems that may arise. This is bad. Too many indexes can be just as bad as too few, since the DB has to try and determine which of the myriad indexes to use to satisfy a particular query. Also, indexes consume disk space, and they have to be kept in sync every time an insert, delete, or update statement is executed. Lots of indexes means lots of overhead, so try to strike a good balance. Start with only the indexes you absolutely need, and try to use only those. As problem queries surface, see if they can be rewritten to use existing indexes, and only if they can’t should you go ahead and add indexes to fix them.
ANALYZE(Postgres) (or whatever means your DB provides) are your best friends. Get to know them. Learn how to read their output. They will tell you what indexes (if any) a query will use, and how the database expects to be able to fulfil the query. It is a good idea to play with these commands during testing, to try and locate problem spots before they become problems. Note, though, that the number of rows in a table can affect how the database chooses indexes, so just because your query looks fine with only a handful of test rows in the database, don’t expect it to perform well when there are thousands of rows. In a perfect world, you could test your app with a large corpus of real data. In an imperfect world, you just have to make do.
In short, know your database. As convenient as ActiveRecord makes things, never assume you can get along with zero knowledge of SQL and how your database will work. Find a good book about your DBMS of choice. Read up on it. Take the time to educate yourself—it will pay off handsomely in the long run.