Under the hood: ActiveRecord::Base.find, Part 2
When working with Rails, it is tempting to conceptualize ActiveRecord#find
into some genie that magically reaches into your database and pulls back the records matching the criteria you specified. The reality is a bit more complex, and sadly lacking in genies.
To satisfy any request, #find
works in two stages. First, it has to convert your criteria into SQL. Then, it has to take the results returned from the database and instantiate them as ActiveRecord objects.
The process of converting criteria into SQL is surprisingly complex, even for the simple cases. Let’s walk through the code to get an idea of what happens for the most trivial of cases, User.find(1)
.
(I’m working off of edge rails here, which at the time of this writing was at revision 5585. Code snippets given in this article will inevitably diverge from the latest edge. Caveat lector!)
1 2 3 4 5 6 7 8 9 10 11 |
def find(*args) options = extract_options_from_args!(args) validate_find_options(options) set_readonly_option!(options) case args.first when :first then find_initial(options) when :all then find_every(options) else find_from_ids(args, options) end end |
For the case we’re investigating, User.find(1)
, args
will simply be [1]
. The extract_options_from_args
simply checks the last element of the array, and if it is a Hash, pops it off and returns it. For our example, we’ll just get an empty hash back.
Then, we validate the options, making sure no bogus keys were passed in, and finally, we call set_readonly_option!
, which does some scope-checking to make sure the :readonly
option is set correctly.
With all that out of the way, we check the first argument, which is neither :first
nor :all
, so we call find_from_ids
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
def find_from_ids(ids, options) expects_array = ids.first.kind_of?(Array) return ids.first if expects_array && ids.first.empty? ids = ids.flatten.compact.uniq case ids.size when 0 raise RecordNotFound, "Couldn't find #{name} without an ID" when 1 result = find_one(ids.first, options) expects_array ? [ result ] : result else find_some(ids, options) end end |
The mumbo-jumbo at the top of that method just checks to see if an array of ids was passed to the method. If it was, we have to make sure and return an array, regardless of the number of rows returned. We, however, did not pass an array of ids, so expects_array
will be false
.
Then, we flatten the list of ids (to expand nested arrays), compact it (to remove nil
elements), and remove duplicate elements. For us, the resulting ids
array will contain a single element (the number 1), so we take the second when
clause, which calls find_one
.
1 2 3 4 5 6 7 8 9 10 |
def find_one(id, options) conditions = " AND (#{sanitize_sql(options[:conditions])})" if options[:conditions] options.update :conditions => "#{table_name}.#{primary_key} = #{quote_value(id,columns_hash[primary_key])}#{conditions}" if result = find_every(options).first result else raise RecordNotFound, "Couldn't find #{name} with ID=#{id}#{conditions}" end end |
Now we start seeing some actual work being done to generate SQL. In our case, though, the options
hash has no :conditions
keys, so all that happens (via the second line) is that we add a condition which searches based on the id we passed in (1, in our case).
Then, we further the cause of lengthy stack-traces by calling find_every
, and return the first record it returns. (This is safe, because finding by a unique primary key will always return only a single record, at most.)
1 2 3 4 5 6 7 8 9 |
def find_every(options) records = scoped?(:find, :include) || options[:include] ? find_with_associations(options) : find_by_sql(construct_finder_sql(options)) records.each { |record| record.readonly! } if options[:readonly] records end |
With the find_every
method, we’re getting closer to the database. It checks to see whether an :include
criterion has been specified, and branches accordingly. If you specify one or more associations to load eagerly, you’ll take the considerably more complex branch with find_with_associations
. Fortunately for us, we get to go the find_by_sql
route. First, though, we jump into construct_finder_sql
to build the actual SQL for our simple query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
def construct_finder_sql(options) scope = scope(:find) sql = "SELECT #{(scope && scope[:select]) || options[:select] || '*'} " sql << "FROM #{(scope && scope[:from]) || options[:from] || table_name} " add_joins!(sql, options, scope) add_conditions!(sql, options[:conditions], scope) sql << " GROUP BY #{options[:group]} " if options[:group] add_order!(sql, options[:order], scope) add_limit!(sql, options, scope) add_lock!(sql, options, scope) sql end |
In our simple case, there is no active scope, and only a single element in the options
hash: :conditions
. The only method here that we’ll call with any effect is add_conditions!
, which appends the necessary conditions to the sql
string:
1 2 3 4 5 6 7 8 9 |
def add_conditions!(sql, conditions, scope = :auto) scope = scope(:find) if :auto == scope segments = [] segments << sanitize_sql(scope[:conditions]) if scope && scope[:conditions] segments << sanitize_sql(conditions) unless conditions.nil? segments << type_condition unless descends_from_active_record? segments.compact! sql << "WHERE (#{segments.join(") AND (")}) " unless segments.empty? end |
Essentially, this will build an array of “segments” that need to be and’ed together to form the complete where clause of the SQL statement. It pulls conditions from the scope (if it exists), the conditions
parameter (which in our case is just a simple comparison on the id), and the type condition (for classes that aren’t descended directly from ActiveRecord::Base
).
With that done, we pop back up the stack with our complete SQL statement, and invoke find_by_sql
.
1 2 3 |
def find_by_sql(sql) connection.select_all(sanitize_sql(sql), "#{name} Load").collect! { |record| instantiate(record) } end |
Compared to the others, this one is ridiculously simple. Execute the sql, and then call instantiate
for every record in the list. (That’s the second phase of #find
, as you’ll recall: building ActiveRecord objects from the result of a query.) The resulting array is returned back up the stack, where it will eventually come out the top as a single ActiveRecord instance (or an exception, if no matching record was found).
There you have it: the whirlwind tour. We’ll be examining more complex scenarios in subsequent posts, eventually delving into the eager loading code and scoping. For now, though, you should have a basic understanding of some of the complexity that ActiveRecord#find
deals with on your behalf for every query you submit. It’s not magic, just a lot of code! And nary a genie in sight.
Reader Comments
Thanks for the detailed article Jamis.
Is there ever likely to be a implementation of find_by_sql that lazily instantiates ActiveRecord objects from the result set as they are accessed in the resulting array, allowing you to iterate through large result sets but only ever have one ActiveRecord object in memory at a time? Or does this limitation run right through to the database driver layer?
Cheers.
20 Nov 2006
Sean, the problem is that the way ActiveRecord accesses the database, it always returns the entire result set (as an array of hashes). You could mitigate some of the weight of that by duplicating what find_by_sql does: build the SQL and then pass it to the
select_all
method of the connection, and then call ActiveRecord::Base.instantiate directly for each record that you need.That could make an interesting plugin, for sure, and might even be something that could find its way into core, eventually.
20 Nov 2006
This is just an idea, but could it be reasonable to return an Array proxy which converts Hashes into ActiveRecords on first access? Something like:
class ARProxy def initialize(a_model_class, some_data) @model_class = a_model_class @data = some_data end def instantiate(index, obj=nil) obj ||= @data[index] return obj if obj.is_a?(ActiveRecord::Base) obj = @model_class.instantiate(obj) @data[index] = obj return obj end def [](index) _instantiate(index) end def each @data.each_with_index do |obj, index| yield(instantiate(index, obj)) end # other Array methods which need to be reimplemented end
20 Nov 2006
ste, one of the things we want to avoid with this is storing a reference to every object in the result. The problem is when you try and use ActiveRecord on datasets with tens or hundreds of thousands of rows, you rapidly run out of memory because of all the AR instances. Other than that, though, I think your idea is interesting, and I would encourage you to create a plugin to test it “in the wild”.
20 Nov 2006
Jamis, can you clean up his ste’s Ruby? Making me all cross eyed :)
20 Nov 2006
Thanks for a wonderful explanation Jamis. I especially appreciate the little explanations you throw in (ids.flatten.compact.uniq).
After reading this excellent article, I was wondering, would it be worthwhile to have the special case of finding a given id by just short-circuiting the whole general case process and sending the sql for id=nnn straight to the backend?
I think it would be so much better than having to use the general case machinery which is great for :all and maybe :first but a tad bit overkill for the situation which we know will always generate the same sql (select for X where id = Y)
Not sure what the performance gains would be, but just thought i’d ask.
Thanks again for a wonderful and accessible article. Your time and effort is greatly appreciated! (and yes, I am going to make a paypal contribution, because I think this kind of effort needs to be supported. I would urge other readers to do the same so we keep getting the next installments as this is surely a time consuming process and an excellent documentation effort which needs to be supported beyond mere words of support)
23 Nov 2006
Thank-you, Amr! I’m glad you’re liking the articles.
I would encourage you to try the short-circuit you suggested and run some benchmarks to see how much that helps. You’re right, that find-by-id scenario is pretty common.
23 Nov 2006