Under the hood: route recognition in Rails
Monday’s article presented the implementation of Rails’ routing DSL. (If you haven’t read it yet, you ought to—this article assumes you’re familiar with at least as much of the routing code as that prior article explained.)
The DSL implementation of routes only scratches the surface. In this second installment, we’re going to delve even deeper. We’re going to lay bare the mysteries of route recognition.
Route recognition is one of the very first tasks that a Rails application executes upon receiving a request. What it does is (conceptually) very simple: given a URI path, determine what controller and action should process the request, as well as what additional parameters should be passed in. In practice, however, there’s a lot of complexity hidden there.
The journey begins in railties/lib/dispatcher.rb, in Dispatcher.dispatch. First, the request and response objects are created, the application is “prepared” (with actions that vary depending on whether you are running in production mode or not), and then routing is asked to recognize the current path.

    request, response = ActionController::CgiRequest.new(cgi, session_options),
                        ActionController::CgiResponse.new(cgi)
    prepare_application
    controller = ActionController::Routing::Routes.recognize(request)

With that innocent command, we leap into the routing code. Feel free to follow along, beginning on line 1243 of routing.rb:

    def recognize(request)
      params = recognize_path(request.path, extract_request_environment(request))
      request.path_parameters = params.with_indifferent_access
      "#{params[:controller].camelize}Controller".constantize
    end

The first line of the recognize method extracts an “environment” hash from the request, and then invokes recognize_path with the path from the request and the environment hash. This environment hash currently consists of only the request method, but if you are writing a routing extension that needs other information from the request (like the host name, or whether HTTPS is enabled) you can extend the RouteSet#extract_request_environment method to pull the additional data out. You’ll see (later) where that information is used in the recognition process.
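To make that extension point a little more concrete, here is a minimal sketch that uses stand-in classes rather than the real Rails ones. SketchRequest, SketchRouteSet, and HostAwareRouteSet are hypothetical names invented for illustration, and the :host key is an assumption; only the idea of a hash holding the request method comes from the description above.

```ruby
# Stand-in for a request object; the real Rails request class is far richer.
SketchRequest = Struct.new(:request_method, :host, :path)

# Stand-in for RouteSet, reduced to the one method under discussion.
class SketchRouteSet
  # Mirrors the behavior described above: only the request method is extracted.
  def extract_request_environment(request)
    { :method => request.request_method }
  end
end

# Hypothetical extension that also exposes the host name to the routes.
class HostAwareRouteSet < SketchRouteSet
  def extract_request_environment(request)
    super.merge(:host => request.host)
  end
end

request = SketchRequest.new(:get, "example.com", "/foo/bar")
p SketchRouteSet.new.extract_request_environment(request)
p HostAwareRouteSet.new.extract_request_environment(request)
```

Calling super and merging keeps the default keys intact, so routes that only care about the request method would keep working unchanged.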
The RouteSet#recognize_path method simply iterates over all defined routes, asking each whether or not it can recognize the given path. As soon as one responds in the affirmative, the loop stops and the result is returned. If no route matches the given parameters, a RoutingError is raised.

    def recognize_path(path, environment={})
      path = CGI.unescape(path)
      routes.each do |route|
        result = route.recognize(path, environment) and return result
      end
      raise RoutingError, "no route found to match #{path.inspect} with #{environment.inspect}"
    end

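The “first match wins” behavior is easy to try outside of Rails. The following is only a sketch: LiteralRoute and SketchRoutingError are hypothetical stand-ins, and this recognize_path takes the route list as an explicit argument instead of reading it from a RouteSet.

```ruby
# Hypothetical stand-in for a route: it recognizes exactly one literal path.
class LiteralRoute
  def initialize(path, params)
    @path = path
    @params = params
  end

  # Returns the params hash on a match, nil otherwise.
  def recognize(path, environment = {})
    @params if path == @path
  end
end

class SketchRoutingError < StandardError; end

# The first route to return a non-nil result wins; later routes are never asked.
def recognize_path(routes, path, environment = {})
  routes.each do |route|
    result = route.recognize(path, environment)
    return result if result
  end
  raise SketchRoutingError, "no route found to match #{path.inspect} with #{environment.inspect}"
end

ROUTES = [
  LiteralRoute.new("/", { :controller => "foo", :action => "index" }),
  LiteralRoute.new("/about", { :controller => "pages", :action => "about" }),
  LiteralRoute.new("/about", { :controller => "never", :action => "reached" })
]

p recognize_path(ROUTES, "/about", { :method => :get })
```

Because the loop returns on the first hit, the order in which routes are defined decides which of two overlapping routes handles a path.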
Here, then, is where things begin to get interesting. Go ahead and jump to Route#recognize, on line 464:

    def recognize(path, environment={})
      write_recognition
      recognize path, environment
    end

“But wait!” you say. “There’s nothing there but a recursive call to Route#recognize!”
“Ah,” I reply, “but note the call to write_recognition...”
Thus we introduce one of the reasons the routing code is hard to grasp. It rewrites itself on demand, for optimization reasons. Basically, the first time a route is asked to recognize a path, it will take all of its component segments, and all of their requirements, and dynamically generate a new recognize method based on them. Subsequent calls to that route’s recognize method will use the dynamically generated version. This allows route recognition to be quite fast, even with many routes defined.
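If the “method that replaces itself” idea feels slippery, a toy version may help. This is a sketch of the general technique only: SelfOptimizingGreeter, greet, and write_greet are hypothetical names, not anything from routing.rb.

```ruby
# Hypothetical class showing the "replace yourself on first call" trick.
class SelfOptimizingGreeter
  attr_reader :generation_count

  def initialize(name)
    @name = name
    @generation_count = 0
  end

  # Placeholder implementation: generate the real method, then call it.
  def greet(punctuation)
    write_greet
    greet punctuation # not recursion: this now dispatches to the generated method
  end

  # Builds the method source as a string and installs it on this one instance.
  def write_greet
    @generation_count += 1
    greeting = "Hello, #{@name}"
    source = "def greet(punctuation)\n  #{greeting.inspect} + punctuation\nend"
    instance_eval source, "generated code (#{__FILE__}:#{__LINE__})"
    source
  end
end

greeter = SelfOptimizingGreeter.new("Rails")
puts greeter.greet("!")
puts greeter.greet("?")
puts greeter.generation_count
```

A def evaluated through instance_eval lands on the object’s singleton class, which is looked up before the class’s own methods. That is why the second call never reaches the placeholder, and why the generation counter stays at one.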
That’s not much comfort, however, to the stalwart spelunker who wishes to understand how it all works.
Let’s try to demystify this by looking first at what a few dynamically generated recognize methods look like. From there, we can better understand the steps which the routing code takes to actually build that code.
Specifically, let’s consider the following three routes:

    ActionController::Routing::Routes.draw do |map|
      map.connect "/", :controller => "foo", :action => "index"

      map.connect "/foo/:action", :controller => "foo"

      map.connect "/foo/:view/:permalink",
        :controller => "foo", :action => "show",
        :view => /plain|fancy/,
        :permalink => /[-a-z0-9]+/,
        :conditions => { :method => :get }
    end

If you could see the code that gets generated for that first route, you’d see that its new recognize method would look more or less like this:

    def recognize(path, env={})
      if (match = /\A\/?\Z/.match(path))
        params = parameter_shell.dup
        params
      end
    end

In other words, match the path against the given regex (testing only to see if the string is empty, or a forward slash) and if it succeeds, return the route’s parameter_shell. (The parameter shell is the list of all non-regex requirements for a given route; in this case, it will be :controller => "foo", :action => "index", because those are the options that were given in the route’s definition.)
That’s the simplest case. Moving to the next route, we can see how dynamic segments like :action get handled:

    def recognize(path, env={})
      if (match = /\A\/foo(?:\/?\Z|\/([^\/;.,?]+)\/?)\Z/.match(path))
        params = parameter_shell.dup
        params[:action] = match[1] || "index"
        params
      end
    end

Again, the first thing that happens is the path is matched against a regex. The regex simply makes sure the path begins with “/foo”, and is followed by an optional group that contains anything except path delimiters. (In this case, the group is optional, because the :action key is always defaulted to “index”. Other keys, as you’ll see, are not necessarily optional.)
If the regex matches, we dup the parameter shell, and then set the :action parameter to either the first match, or “index”. Then, the parameters are returned.
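You can poke at that regex directly in irb. In this sketch the regex is copied from the generated method above, while recognize_foo_action and the literal hash standing in for parameter_shell are assumptions made for illustration.

```ruby
# Regex copied from the generated method above.
FOO_ACTION = /\A\/foo(?:\/?\Z|\/([^\/;.,?]+)\/?)\Z/

# Hypothetical helper that mimics the generated body for this one route.
def recognize_foo_action(path)
  if (match = FOO_ACTION.match(path))
    params = { :controller => "foo" } # stand-in for parameter_shell.dup
    params[:action] = match[1] || "index"
    params
  end
end

["/foo", "/foo/", "/foo/edit", "/foo/edit/", "/foo/edit/1", "/bar"].each do |path|
  puts "#{path.inspect} => #{recognize_foo_action(path).inspect}"
end
```

The first two paths fall back to the “index” default, the next two capture “edit”, and the last two produce nil, which is what tells recognize_path to move on to the next route.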
Pretty straightforward! Let’s move on to the third and final example, which looks like it might be a lot more complex. We’ve got two keys in the path (:view and :permalink), both of which have regexes that restrict the set of values they can match. We also require that the route only match if the request method is GET. Behold:

    def recognize(path, env={})
      if (match = /\A\/foo\/(plain|fancy)\/([-a-z0-9]+)\/?\Z/.match(path)) && conditions[:method] === env[:method]
        params = parameter_shell.dup
        params[:view] = match[1] if match[1]
        params[:permalink] = match[2] if match[2]
        params
      end
    end

It just doesn’t get much simpler than that, folks. We match the path against the regex, and we compare the request method that the route requires (in the conditions hash) against the request method that was actually used (in the env hash). If all is good, we populate the params with the :view and :permalink values that were extracted from the path, and return it.
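Here is the same kind of standalone experiment for the third route. Again, the regex is taken from the generated method above; recognize_foo_show, CONDITIONS, and SHELL are hypothetical stand-ins for the route’s recognize method, its conditions hash, and its parameter shell.

```ruby
# Regex copied from the generated method above.
FOO_SHOW = /\A\/foo\/(plain|fancy)\/([-a-z0-9]+)\/?\Z/

# Stand-ins for the route's conditions hash and parameter shell.
CONDITIONS = { :method => :get }
SHELL = { :controller => "foo", :action => "show" }

def recognize_foo_show(path, env = {})
  if (match = FOO_SHOW.match(path)) && CONDITIONS[:method] === env[:method]
    params = SHELL.dup
    params[:view] = match[1] if match[1]
    params[:permalink] = match[2] if match[2]
    params
  end
end

p recognize_foo_show("/foo/plain/hello-world", { :method => :get })
p recognize_foo_show("/foo/plain/hello-world", { :method => :post })
p recognize_foo_show("/foo/shiny/hello-world", { :method => :get })
```

Only the first call returns a params hash. The second fails the request method comparison, and the third fails the :view requirement baked into the regex.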
Boom! (As Steve Jobs would say.)
So, now we have some idea of the code that we want to generate. The rest of this article will show how it is actually built.
First, take a look at the Route#write_recognition method on line 370.

    def write_recognition
      body = "params = parameter_shell.dup\n#{recognition_extraction * "\n"}\nparams"
      body = "if #{recognition_conditions.join(" && ")}\n#{body}\nend"

      method_decl = "def recognize(path, env={})\n#{body}\nend"

      instance_eval method_decl, "generated code (#{__FILE__}:#{__LINE__})"
      method_decl
    end

All it does is build up a string that contains the method definition, and then sends it to instance_eval to actually install the new method. It also returns the string, so you can debug your routes easily by doing something like:

    ActionController::Routing::Routes.routes.each do |route|
      puts route
      puts route.write_recognition
      puts
    end

Go ahead and try that—it’s quite educational!
The write_recognition method builds the method in three parts:
- the “body” (what gets executed when the regex matches) via recognition_extraction.
- the “conditions” (the regex and any other special conditions) via recognition_conditions.
- the “method declaration” (the method name and parameters)
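Those three parts can be assembled by hand to see how they fit together. In this sketch the extraction and conditions arrays are hard-coded stand-ins for what recognition_extraction and recognition_conditions would plausibly return for the "/foo/:action" route, and the throwaway route object with its parameter_shell method is an assumption made so the result can be run.

```ruby
# Hard-coded stand-ins for the two generated pieces of the "/foo/:action" route.
extraction = ['params[:action] = match[1] || "index"']
conditions = ['(match = /\A\/foo(?:\/?\Z|\/([^\/;.,?]+)\/?)\Z/.match(path))']

# 1. the body, 2. the conditions wrapped around it, 3. the method declaration
body = "params = parameter_shell.dup\n#{extraction * "\n"}\nparams"
body = "if #{conditions.join(" && ")}\n#{body}\nend"
method_decl = "def recognize(path, env={})\n#{body}\nend"

puts method_decl

# Install the generated method on a throwaway object to check that it runs.
route = Object.new
def route.parameter_shell
  { :controller => "foo" }
end
route.instance_eval method_decl
p route.recognize("/foo/list")
```

Printing method_decl before evaluating it is the same debugging trick the article suggests: the generated source is plain Ruby that you can read.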
Let’s look at how the body gets built first. Go ahead and jump to line 401, Route#recognition_extraction.

    def recognition_extraction
      next_capture = 1
      extraction = segments.collect do |segment|
        x = segment.match_extraction next_capture
        next_capture += Regexp.new(segment.regexp_chunk).number_of_captures
        x
      end
      extraction.compact
    end

What this does is loop over all the segments that compose the route. Each segment is asked for a string containing Ruby code that will extract the necessary information for that segment. These snippets of code are then collected into an array, and any nil entries are eliminated (via Array#compact).
I hate to do this to you, gentle reader, but let’s skip down one more level in the call stack and look at one of the match_extraction implementations. The default Segment#match_extraction method just returns nil, since by default a segment encapsulates no parameter data. However, segments like DynamicSegment and ControllerSegment contain information that needs to be extracted. Let’s just look at DynamicSegment#match_extraction (on line 716):

    def match_extraction(next_capture)
      hangon = (default ? "|| #{default.inspect}" : "if match[#{next_capture}]")

      # All non code-related keys (such as :id, :slug) have to be unescaped as other CGI params
      "params[:#{key}] = match[#{next_capture}] #{hangon}"
    end

Here, “hangon” is just a cute variable name for a snippet of code that trails the match assignment (like a default value, or a conditional capture). Note also the next_capture parameter; this is used to keep track of which capture (or captures) to extract from the match object.
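To see both flavors of “hangon” side by side, here is a standalone sketch of the same string-building step. The extraction_line helper is hypothetical; in the real code, key and default are attributes of the segment rather than arguments.

```ruby
# Hypothetical standalone version of the string-building step.
def extraction_line(key, default, next_capture)
  hangon = default ? "|| #{default.inspect}" : "if match[#{next_capture}]"
  "params[:#{key}] = match[#{next_capture}] #{hangon}"
end

puts extraction_line(:action, "index", 1)
puts extraction_line(:permalink, nil, 2)
```

The two lines printed are the same shapes that appear in the generated recognize methods shown earlier: a default joined with ||, or a trailing if guard.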
Though I won’t go into them here, the match_extraction methods for both ControllerSegment and PathSegment are similar.
One last thing to point out in recognition_extraction: the call to Regexp#number_of_captures. This method is defined near the top of the routing.rb file, and it simply returns the number of capture groups within the regular expression. This is used to determine which capture indexes to allocate to each segment (in match_extraction), since a segment cannot pull data from capture groups it did not define.
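The bookkeeping is easier to follow with numbers in front of you. The sketch below counts groups with a standalone count_captures helper, which is one possible way to do it and is not presented as the routing.rb definition. The segment list is hypothetical, and the :when segment is invented to show a chunk that contributes more than one capture.

```ruby
# Counts capture groups by matching against the empty string: the leading
# "|" adds an empty alternative that always matches, and the resulting
# MatchData has one slot per capture group.
def count_captures(chunk)
  Regexp.new("|#{chunk}").match("").captures.length
end

# Hypothetical segment list: [key or nil, regexp chunk].
SEGMENTS = [
  [nil,        "\\/foo"],
  [:view,      "\\/(plain|fancy)"],
  [:when,      "\\/((\\d{4})-(\\d{2}))"], # one outer group plus two nested ones
  [:permalink, "\\/([-a-z0-9]+)"]
]

next_capture = 1
SEGMENTS.each do |key, chunk|
  puts "#{key.inspect} starts at capture #{next_capture}" if key
  next_capture += count_captures(chunk)
end
```

Because the :when chunk defines three groups, :permalink has to read from capture 5 rather than capture 3. That offset is exactly what the running next_capture total provides.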
Alright, following this so far? We’re almost done. Let’s next look at how the regex itself is constructed, and how conditions like the request method comparison are built.

    def recognition_conditions
      result = ["(match = #{Regexp.new(recognition_pattern).inspect}.match(path))"]
      result << "conditions[:method] === env[:method]" if conditions[:method]
      result
    end

    def recognition_pattern(wrap = true)
      pattern = ''
      segments.reverse_each do |segment|
        pattern = segment.build_pattern pattern
      end
      wrap ? ("\\A" + pattern + "\\Z") : pattern
    end

What this does is first construct a regular expression to compare against the path. This is done by aggregating the patterns of each segment into a single regular expression (via the recognition_pattern method). The request method comparison is then appended (if relevant for this route). For those of you wanting to extend routing with your own custom conditions (like routes based on hostname and such), this is where you would add those conditions, based on the conditions hash.
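The right-to-left aggregation can be sketched with a toy segment class. SketchSegment is hypothetical and far simpler than the real segment classes: it just prepends a fixed regexp chunk to the pattern built so far. The chunks are chosen to reproduce the regex of the third example route.

```ruby
# Hypothetical segment that prepends its own regexp chunk to whatever
# pattern the segments to its right have already produced.
class SketchSegment
  def initialize(chunk)
    @chunk = chunk
  end

  def build_pattern(pattern)
    @chunk + pattern
  end
end

segments = [
  SketchSegment.new("\\/foo"),
  SketchSegment.new("\\/(plain|fancy)"),
  SketchSegment.new("\\/([-a-z0-9]+)"),
  SketchSegment.new("\\/?")
]

# Walk right to left so each segment sees everything that follows it.
pattern = ""
segments.reverse_each { |segment| pattern = segment.build_pattern(pattern) }
regexp = Regexp.new("\\A" + pattern + "\\Z")

# Conditions are collected as strings of Ruby code, to be joined with "&&".
conditions = { :method => :get }
result = ["(match = #{regexp.inspect}.match(path))"]
result << "conditions[:method] === env[:method]" if conditions[:method]

puts result.join(" && ")
p regexp.match("/foo/fancy/hello-world").captures
```

A plausible reason for handing each segment the pattern that follows it is that an optional segment can then wrap the remainder of the path in a group, as the second example route’s regex suggests. A custom condition would be one more string pushed onto result.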
So! We’ve now generated the code for the route. Hiking back up the call stack, we find ourselves back in write_recognition, which evaluates the string and installs the new method into the route. Hiking up another level, we wind up back in the original Route#recognize method, where we make what appears to be a recursive call to Route#recognize. However, this will actually invoke the new method definition, recently installed by write_recognition, which will execute the newly generated code.
And there you have it, ladies and gents, the route recognition code. It’s really not so much of a much, is it? Once you wrap your mind around run-time generation of code, it all flows together pretty well. There are some edge cases and such that I didn’t cover, but you’re encouraged to explore those on your own. “An exercise for the reader,” and all that. Especially, try investigating what a route looks like that has optional values (:permalink => nil), or which uses path segments. See what the recognition code for such routes consists of.
By this point, you should have some grasp of about two-thirds of the routing code. The remaining third, route generation, will be covered in the next article, but be warned: it’ll be the hairiest of the three!
Reader Comments
4 Oct 2006
Jamis, these articles are beyond amazing. I never find the time to go digging myself, but I do find the time to read these articles. Long may you run.
I hope you realize you have a real talent for this kind of exposition. In my experience it’s very rare to find someone who can write at just the right level of detail to explain code succinctly and clearly. People often get caught up in the beauty of their abstractions, or the intricate details of the implementation.
I’m off to throw some money in the paypal pot now, and I hope everyone else who reads this does, ‘cause I’d selfishly like to see these articles continue for a long time :)
Cheers.
5 Oct 2006
Grant, many thanks for the kind words. I’m very happy that people are enjoying these articles!
5 Oct 2006
Jamis, this series has rocked. I recently started trimming my list of blogs that I read because I was glossing over good articles such as these because the list of crap was so long and I felt I had to read it all. I’m very glad I did. Keep up the good work you’re doing an excellent service to the community.
5 Oct 2006
This is really cool (and amazing as to how much complexity is there in just one piece of Rails!) you core people are truly amazing!
I sure hope you would please continue with these, and if you ever decide to (self?) publish these, I will be sure to purchase a copy of your Under the Hood book.
Many Thanks indeed!
12 Oct 2006
jamis I’m pretty new to rails and have been writing a way of dynamically generating objects at run-time and this blog was the last piece I was missing ( doing source generation ). Thanks for making it easy for someone like me to follow. Keep the under the hood stuff coming. Rails is one of those things that at first glance looks anti-climatic because its so easy, but when you look at the artful way the underlying code is written it really makes you sit up and take notice.
-d
1 Nov 2006