The maze book for programmers!
mazesforprogrammers.com

Algorithms, circle mazes, hex grids, masking, weaving, braiding, 3D and 4D grids, spheres, and more!

DRM-Free Ebook

The Buckblog

assorted ramblings by Jamis Buck

Under the hood: ActiveRecord::Base.find, Part 2

20 November 2006 — 5-minute read

When working with Rails, it is tempting to conceptualize ActiveRecord#find into some genie that magically reaches into your database and pulls back the records matching the criteria you specified. The reality is a bit more complex, and sadly lacking in genies.

To satisfy any request, #find works in two stages. First, it has to convert your criteria into SQL. Then, it has to take the results returned from the database and instantiate them as ActiveRecord objects.

The process of converting criteria into SQL is surprisingly complex, even for the simple cases. Let’s walk through the code to get an idea of what happens for the most trivial of cases, User.find(1).

(I’m working off of edge rails here, which at the time of this writing was at revision 5585. Code snippets given in this article will inevitably diverge from the latest edge. Caveat lector!)

1
2
3
4
5
6
7
8
9
10
11
def find(*args)
  options = extract_options_from_args!(args)
  validate_find_options(options)
  set_readonly_option!(options)

  case args.first
    when :first then find_initial(options)
    when :all   then find_every(options)
    else             find_from_ids(args, options)
  end
end

For the case we’re investigating, User.find(1), args will simply be [1]. The extract_options_from_args simply checks the last element of the array, and if it is a Hash, pops it off and returns it. For our example, we’ll just get an empty hash back.

Then, we validate the options, making sure no bogus keys were passed in, and finally, we call set_readonly_option!, which does some scope-checking to make sure the :readonly option is set correctly.

With all that out of the way, we check the first argument, which is neither :first nor :all, so we call find_from_ids.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def find_from_ids(ids, options)
  expects_array = ids.first.kind_of?(Array)
  return ids.first if expects_array && ids.first.empty?

  ids = ids.flatten.compact.uniq

  case ids.size
    when 0
      raise RecordNotFound, "Couldn't find #{name} without an ID"
    when 1
      result = find_one(ids.first, options)
      expects_array ? [ result ] : result
    else
      find_some(ids, options)
  end
end

The mumbo-jumbo at the top of that method just checks to see if an array of ids was passed to the method. If it was, we have to make sure and return an array, regardless of the number of rows returned. We, however, did not pass an array of ids, so expects_array will be false.

Then, we flatten the list of ids (to expand nested arrays), compact it (to remove nil elements), and remove duplicate elements. For us, the resulting ids array will contain a single element (the number 1), so we take the second when clause, which calls find_one.

1
2
3
4
5
6
7
8
9
10
def find_one(id, options)
  conditions = " AND (#{sanitize_sql(options[:conditions])})" if options[:conditions]
  options.update :conditions => "#{table_name}.#{primary_key} = #{quote_value(id,columns_hash[primary_key])}#{conditions}"

  if result = find_every(options).first
    result
  else
    raise RecordNotFound, "Couldn't find #{name} with ID=#{id}#{conditions}"
  end
end

Now we start seeing some actual work being done to generate SQL. In our case, though, the options hash has no :conditions keys, so all that happens (via the second line) is that we add a condition which searches based on the id we passed in (1, in our case).

Then, we further the cause of lengthy stack-traces by calling find_every, and return the first record it returns. (This is safe, because finding by a unique primary key will always return only a single record, at most.)

1
2
3
4
5
6
7
8
9
def find_every(options)
  records = scoped?(:find, :include) || options[:include] ?
    find_with_associations(options) : 
    find_by_sql(construct_finder_sql(options))

  records.each { |record| record.readonly! } if options[:readonly]

  records
end

With the find_every method, we’re getting closer to the database. It checks to see whether an :include criterion has been specified, and branches accordingly. If you specify one or more associations to load eagerly, you’ll take the considerably more complex branch with find_with_associations. Fortunately for us, we get to go the find_by_sql route. First, though, we jump into construct_finder_sql to build the actual SQL for our simple query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def construct_finder_sql(options)
  scope = scope(:find)
  sql  = "SELECT #{(scope && scope[:select]) || options[:select] || '*'} "
  sql << "FROM #{(scope && scope[:from]) || options[:from] || table_name} "

  add_joins!(sql, options, scope)
  add_conditions!(sql, options[:conditions], scope)

  sql << " GROUP BY #{options[:group]} " if options[:group]

  add_order!(sql, options[:order], scope)
  add_limit!(sql, options, scope)
  add_lock!(sql, options, scope)

  sql
end

In our simple case, there is no active scope, and only a single element in the options hash: :conditions. The only method here that we’ll call with any effect is add_conditions!, which appends the necessary conditions to the sql string:

1
2
3
4
5
6
7
8
9
def add_conditions!(sql, conditions, scope = :auto)
  scope = scope(:find) if :auto == scope
  segments = []
  segments << sanitize_sql(scope[:conditions]) if scope && scope[:conditions]
  segments << sanitize_sql(conditions) unless conditions.nil?
  segments << type_condition unless descends_from_active_record?        
  segments.compact!
  sql << "WHERE (#{segments.join(") AND (")}) " unless segments.empty?
end

Essentially, this will build an array of “segments” that need to be and’ed together to form the complete where clause of the SQL statement. It pulls conditions from the scope (if it exists), the conditions parameter (which in our case is just a simple comparison on the id), and the type condition (for classes that aren’t descended directly from ActiveRecord::Base).

With that done, we pop back up the stack with our complete SQL statement, and invoke find_by_sql.

1
2
3
def find_by_sql(sql)
  connection.select_all(sanitize_sql(sql), "#{name} Load").collect! { |record| instantiate(record) }
end

Compared to the others, this one is ridiculously simple. Execute the sql, and then call instantiate for every record in the list. (That’s the second phase of #find, as you’ll recall: building ActiveRecord objects from the result of a query.) The resulting array is returned back up the stack, where it will eventually come out the top as a single ActiveRecord instance (or an exception, if no matching record was found).

There you have it: the whirlwind tour. We’ll be examining more complex scenarios in subsequent posts, eventually delving into the eager loading code and scoping. For now, though, you should have a basic understanding of some of the complexity that ActiveRecord#find deals with on your behalf for every query you submit. It’s not magic, just a lot of code! And nary a genie in sight.

Reader Comments

Thanks for the detailed article Jamis.

Is there ever likely to be a implementation of find_by_sql that lazily instantiates ActiveRecord objects from the result set as they are accessed in the resulting array, allowing you to iterate through large result sets but only ever have one ActiveRecord object in memory at a time? Or does this limitation run right through to the database driver layer?

Cheers.

Sean, the problem is that the way ActiveRecord accesses the database, it always returns the entire result set (as an array of hashes). You could mitigate some of the weight of that by duplicating what find_by_sql does: build the SQL and then pass it to the select_all method of the connection, and then call ActiveRecord::Base.instantiate directly for each record that you need.

That could make an interesting plugin, for sure, and might even be something that could find its way into core, eventually.

This is just an idea, but could it be reasonable to return an Array proxy which converts Hashes into ActiveRecords on first access? Something like:

class ARProxy def initialize(a_model_class, some_data) @model_class = a_model_class @data = some_data end def instantiate(index, obj=nil) obj ||= @data[index] return obj if obj.is_a?(ActiveRecord::Base) obj = @model_class.instantiate(obj) @data[index] = obj return obj end def [](index) _instantiate(index) end def each @data.each_with_index do |obj, index| yield(instantiate(index, obj)) end # other Array methods which need to be reimplemented end

ste, one of the things we want to avoid with this is storing a reference to every object in the result. The problem is when you try and use ActiveRecord on datasets with tens or hundreds of thousands of rows, you rapidly run out of memory because of all the AR instances. Other than that, though, I think your idea is interesting, and I would encourage you to create a plugin to test it “in the wild”.

Jamis, can you clean up his ste’s Ruby? Making me all cross eyed :)

Thanks for a wonderful explanation Jamis. I especially appreciate the little explanations you throw in (ids.flatten.compact.uniq).

After reading this excellent article, I was wondering, would it be worthwhile to have the special case of finding a given id by just short-circuiting the whole general case process and sending the sql for id=nnn straight to the backend?

I think it would be so much better than having to use the general case machinery which is great for :all and maybe :first but a tad bit overkill for the situation which we know will always generate the same sql (select for X where id = Y)

Not sure what the performance gains would be, but just thought i’d ask.

Thanks again for a wonderful and accessible article. Your time and effort is greatly appreciated! (and yes, I am going to make a paypal contribution, because I think this kind of effort needs to be supported. I would urge other readers to do the same so we keep getting the next installments as this is surely a time consuming process and an excellent documentation effort which needs to be supported beyond mere words of support)

Thank-you, Amr! I’m glad you’re liking the articles.

I would encourage you to try the short-circuit you suggested and run some benchmarks to see how much that helps. You’re right, that find-by-id scenario is pretty common.