Buckblog: Under the hood: route recognition in Rails

4 October 2006 — 8-minute read

Monday’s article presented the implementation of Rails’ routing DSL. (If you haven’t read it yet, you ought to—this article assumes you’re familiar with at least as much of the routing code as that prior article explained.)

Like any good code, the implementation of routing will change over time, as bugs are fixed, features are added, and new needs are discovered. This article describes the implementation as of revision 5169.

The DSL implementation of routes only scratches the surface. In this second installment, we’re going to delve even deeper. We’re going to lay bare the mysteries of route recognition.

Route recognition is one of the very first tasks that a Rails application executes upon receiving a request. What it does is (conceptually) very simple: given a URI path, determine what controller and action should process the request, as well as what additional parameters should be passed in. In practice, however, there’s a lot of complexity hidden there.

The journey begins in railties/lib/dispatcher.rb, in Dispatcher.dispatch. First, the request and response objects are created, the application is “prepared” (with actions that vary depending on whether you are running in production mode or not), and then routing is asked to recognize the current path.

request, response = ActionController::CgiRequest.new(cgi, session_options),
  ActionController::CgiResponse.new(cgi)
prepare_application
controller = ActionController::Routing::Routes.recognize(request)

With that innocent command, we leap into the routing code. Feel free to follow along, beginning on line 1243 of routing.rb:

def recognize(request)
  params = recognize_path(request.path, extract_request_environment(request))
  request.path_parameters = params.with_indifferent_access
  "#{params[:controller].camelize}Controller".constantize
end

The first line of the recognize method extracts an “environment” hash from the request, and then invokes recognize_path with the request’s path and that hash. This environment hash currently consists of only the request method, but if you are writing a routing extension that needs other information from the request (like the host name, or whether HTTPS is enabled) you can extend the RouteSet#extract_request_environment method to pull the additional data out. You’ll see (later) where that information is used in the recognition process.

The RouteSet#recognize_path method simply iterates over all defined routes, asking each whether or not it can recognize the given path. As soon as one responds in the affirmative, the loop stops and the result is returned. If no route matches the given parameters, a RoutingError is raised.

def recognize_path(path, environment={})
  path = CGI.unescape(path)
  routes.each do |route|
    result = route.recognize(path, environment) and return result
  end
  raise RoutingError, "no route found to match #{path.inspect} with #{environment.inspect}"
end
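To see the first-match-wins behavior in isolation, here is a minimal sketch in plain Ruby. ToyRoute, ToyRoutingError, toy_recognize_path and TOY_ROUTES are all made-up names for illustration (they are not part of Rails), and the sketch skips the CGI.unescape step:

```ruby
# Hypothetical stand-in for a route: a regex plus the params it yields.
ToyRoute = Struct.new(:pattern, :params) do
  # Returns a copy of the params on a match, nil otherwise.
  def recognize(path, environment = {})
    params.dup if pattern.match(path)
  end
end

# Hypothetical error class, standing in for RoutingError.
class ToyRoutingError < StandardError; end

# Same shape as recognize_path: ask each route in order, stop at the first hit.
def toy_recognize_path(routes, path, environment = {})
  routes.each do |route|
    result = route.recognize(path, environment)
    return result if result
  end
  raise ToyRoutingError, "no route found to match #{path.inspect} with #{environment.inspect}"
end

TOY_ROUTES = [
  ToyRoute.new(/\A\/foo\/?\z/, { :controller => "foo", :action => "index" }),
  ToyRoute.new(/\A\/foo/,      { :controller => "foo", :action => "catch_all" })
]
```

Both toy routes would match "/foo", but the first one is asked first, so its params are the ones returned. Order of definition matters.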

Here, then, is where things begin to get interesting. Go ahead and jump to Route#recognize, on line 464:

def recognize(path, environment={})
  write_recognition
  recognize path, environment
end

“But wait!” you say. “There’s nothing there but a recursive call to Route#recognize!”

“Ah,” I reply, “but note the call to write_recognition...”

Thus we introduce one of the reasons the routing code is hard to grasp. It rewrites itself on demand, for optimization reasons. Basically, the first time a route is asked to recognize a path, it will take all of its component segments, and all of their requirements, and dynamically generate a new recognize method based on them. Subsequent calls to that route’s recognize method will use the dynamically generated version. This allows route recognition to be quite fast, even with many routes defined.

That’s not much comfort, however, to the stalwart spelunker who wishes to understand how it all works.
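If the self-rewriting trick feels slippery, it may help to see it stripped down to almost nothing. What follows is only a sketch of the idea, not the routing code: SelfWritingRoute and its generated_count counter are hypothetical, and the generated method just returns true or false instead of a params hash.

```ruby
# Hypothetical class showing a method that replaces itself on first use.
class SelfWritingRoute
  attr_reader :generated_count

  def initialize(prefix)
    @prefix = prefix
    @generated_count = 0
  end

  # Only runs on the first call: generate the real method, then call it.
  def recognize(path)
    write_recognition
    recognize(path) # looks recursive, but now dispatches to the generated method
  end

  # Builds the method source as a string and installs it on this instance.
  def write_recognition
    @generated_count += 1
    pattern = Regexp.new("\\A" + Regexp.escape(@prefix) + "/?\\z")
    source = "def recognize(path)\n  #{pattern.inspect}.match(path) ? true : false\nend"
    instance_eval(source, "generated code")
    source
  end
end
```

Because instance_eval with a def installs the method on that one object, it shadows the class-level recognize from then on. Calling recognize three times on the same SelfWritingRoute leaves generated_count at 1.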

Let’s try to demystify this by looking first at what a few dynamically generated recognize methods look like. From there, we can better understand the steps which the routing code takes to actually build that code.

Specifically, let’s consider the following three routes:

ActionController::Routing::Routes.draw do |map|
  map.connect "/", :controller => "foo", :action => "index"

  map.connect "/foo/:action", :controller => "foo"

  map.connect "/foo/:view/:permalink", :controller => "foo",
    :action => "show", :view => /plain|fancy/,
    :permalink => /[-a-z0-9]+/,
    :conditions => { :method => :get }
end

If you could see the code that gets generated for that first route, you’d see that its new recognize method would look more or less like this:

def recognize(path, env={})
  if (match = /\A\/?\Z/.match(path))
    params = parameter_shell.dup
    params
  end
end

In other words, match the path against the given regex (testing only to see if the string is empty, or a forward slash) and if it succeeds, return the route’s parameter_shell. (The parameter shell is the list of all non-regex requirements for a given route; in this case, it will be :controller => "foo", :action => "index", because those are the options that were given in the route’s definition.)

That’s the simplest case. Moving to the next route, we can see how dynamic segments like :action get handled:

def recognize(path, env={})
  if (match = /\A\/foo(?:\/?\Z|\/([^\/;.,?]+)\/?)\Z/.match(path))
    params = parameter_shell.dup
    params[:action] = match[1] || "index"
    params
  end
end

Again, the first thing that happens is the path is matched against a regex. The regex simply makes sure the path begins with “/foo”, and is followed by an optional group that contains anything except path delimiters. (In this case, the group is optional, because the :action key is always defaulted to “index”. Other keys, as you’ll see, are not necessarily optional.)

If the regex matches, we dup the parameter shell, and then set the :action parameter to either the first match, or “index”. Then, the parameters are returned.
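You can poke at that regex outside of Rails, too. The sketch below copies the pattern from the generated method above. The constant and method names are made up, and the parameter shell is assumed to hold just :controller => "foo" for this route:

```ruby
# The regex from the generated method for "/foo/:action".
ACTION_ROUTE = /\A\/foo(?:\/?\Z|\/([^\/;.,?]+)\/?)\Z/

# Assumed parameter shell for this route (the non-regex requirements).
ACTION_SHELL = { :controller => "foo" }

# Hypothetical standalone version of the generated recognize method.
def recognize_action_route(path)
  if (match = ACTION_ROUTE.match(path))
    params = ACTION_SHELL.dup
    params[:action] = match[1] || "index" # no capture means the default applies
    params
  end
end
```

Here "/foo" and "/foo/" both come back with :action => "index", "/foo/show" comes back with :action => "show", and "/foo/show/extra" returns nil, because the capture group refuses to swallow a path delimiter.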

Pretty straightforward! Let’s move on to the third and final example, which looks like it might be a lot more complex. We’ve got two keys in the path (:view and :permalink), both of which have regexes that restrict the set of values they can match. We also require that the route only match if the request method is GET. Behold:

def recognize(path, env={})
  if (match = /\A\/foo\/(plain|fancy)\/([-a-z0-9]+)\/?\Z/.match(path)) && conditions[:method] === env[:method]
    params = parameter_shell.dup
    params[:view] = match[1] if match[1]
    params[:permalink] = match[2] if match[2]
    params
  end
end

It just doesn’t get much simpler than that, folks. We match the path against the regex, and we compare the request method that the route requires (in the conditions hash) against the request method that was actually used (in the env hash). If all is good, we populate the params with the :view and :permalink values that were extracted from the path, and return it.

Boom! (As Steve Jobs would say.)
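The same kind of experiment works for the third route. Again, this is just the generated method transplanted into plain Ruby. The names are hypothetical, and the parameter shell is assumed to be the :controller and :action given in the route definition:

```ruby
# The regex from the generated method for "/foo/:view/:permalink".
PERMALINK_ROUTE = /\A\/foo\/(plain|fancy)\/([-a-z0-9]+)\/?\Z/

# Assumed parameter shell and conditions for this route.
PERMALINK_SHELL = { :controller => "foo", :action => "show" }
PERMALINK_CONDITIONS = { :method => :get }

# Hypothetical standalone version of the generated recognize method.
def recognize_permalink_route(path, env = {})
  if (match = PERMALINK_ROUTE.match(path)) && PERMALINK_CONDITIONS[:method] === env[:method]
    params = PERMALINK_SHELL.dup
    params[:view] = match[1] if match[1]
    params[:permalink] = match[2] if match[2]
    params
  end
end
```

A GET for "/foo/plain/hello-world" yields :view => "plain" and :permalink => "hello-world". The same path with :method => :post returns nil, as does "/foo/ugly/hello-world", since "ugly" is not one of the allowed views. For two symbols, === is plain equality, so the method check is as cheap as it looks.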

So, now we have some idea of the code that we want to generate. The rest of this article will show how it is actually built.

First, take a look at the Route#write_recognition method on line 370.

def write_recognition
  body = "params = parameter_shell.dup\n#{recognition_extraction * "\n"}\nparams"
  body = "if #{recognition_conditions.join(" && ")}\n#{body}\nend"

  method_decl = "def recognize(path, env={})\n#{body}\nend"

  instance_eval method_decl, "generated code (#{__FILE__}:#{__LINE__})"
  method_decl
end

All it does is build up a string that contains the method definition, and then sends it to instance_eval to actually install the new method. It also returns the string, so you can debug your routes easily by doing something like:

ActionController::Routing::Routes.routes.each do |route|
  puts route
  puts route.write_recognition
  puts
end

Go ahead and try that—it’s quite educational!

The write_recognition method builds the method in three parts:

the “body” (what gets executed when the regex matches) via recognition_extraction.
the “conditions” (the regex and any other special conditions) via recognition_conditions.
the “method declaration” (the method name and parameters)

Let’s look at how the body gets built first. Go ahead and jump to line 401, Route#recognition_extraction.

def recognition_extraction
  next_capture = 1
  extraction = segments.collect do |segment|
    x = segment.match_extraction next_capture
    next_capture += Regexp.new(segment.regexp_chunk).number_of_captures
    x
  end
  extraction.compact
end

What this does is loop over all the segments that compose the route. Each segment is asked for a string containing Ruby code that will extract the necessary information for that segment. These snippets of code are then collected into an array, and nil entries eliminated (via Array#compact).

I hate to do this to you, gentle reader, but let’s skip down one more level in the call stack and look at one of the match_extraction implementations. The default Segment#match_extraction method just returns nil—by default a segment encapsulates no parameter data. However, segments like DynamicSegment and ControllerSegment contain information that needs to be extracted. Let’s just look at DynamicSegment#match_extraction (on line 716):

def match_extraction(next_capture)
  hangon = (default ? "|| #{default.inspect}" : "if match[#{next_capture}]")
  
  # All non code-related keys (such as :id, :slug) have to be unescaped as other CGI params
  "params[:#{key}] = match[#{next_capture}] #{hangon}"
end

Here, “hangon” is just a cute variable name for a snippet of code that trails the match assignment (like a default value, or a conditional capture). Note also the next_capture parameter; this is used to keep track of which capture (or captures) to extract from the match data.

Though I won’t go into them here, the match_extraction methods for both ControllerSegment and PathSegment are similar.
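To make the two flavors of “hangon” concrete, here is the same string-building logic pulled out into a free-standing method. The name toy_match_extraction is made up, and key and default are passed in as arguments here rather than read from a segment:

```ruby
# Hypothetical free-standing version of DynamicSegment#match_extraction.
# Returns a line of Ruby source, not a value.
def toy_match_extraction(key, next_capture, default = nil)
  hangon = (default ? "|| #{default.inspect}" : "if match[#{next_capture}]")
  "params[:#{key}] = match[#{next_capture}] #{hangon}"
end

puts toy_match_extraction(:action, 1, "index")
# prints: params[:action] = match[1] || "index"
puts toy_match_extraction(:permalink, 2)
# prints: params[:permalink] = match[2] if match[2]
```

Those two lines should look familiar: they are the same assignments that showed up in the generated recognize methods earlier.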

One last thing to point out in recognition_extraction: the call to Regexp#number_of_captures. This method is defined near the top of the routing.rb file, and it simply returns the number of capture groups within the regular expression. This is used to determine which capture indexes to allocate to each segment (in match_extraction), since a segment cannot pull data from capture groups it did not define.
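Here is a rough sketch of that bookkeeping in plain Ruby. Both helpers are hypothetical: count_captures is just one way to count groups (match the empty string against an alternation and see how many captures come back), and I am not claiming it mirrors the definition in routing.rb line for line.

```ruby
# Hypothetical helper: counts the capture groups in a regex.
# "|source" always matches the empty string, and the resulting MatchData
# still has one (nil) entry per capture group.
def count_captures(regexp)
  Regexp.new("|#{regexp.source}").match("").captures.length
end

# Hypothetical helper: given each segment's regex chunk (as a string),
# returns the first capture index that segment would be handed.
def allocate_captures(chunks)
  next_capture = 1
  chunks.map do |chunk|
    first = next_capture
    next_capture += count_captures(Regexp.new(chunk))
    first
  end
end

p allocate_captures(["\\/foo\\/", "(plain|fancy)", "\\/", "([-a-z0-9]+)"])
# prints: [1, 1, 2, 2]
p allocate_captures(["((a)|(b))", "(c)"])
# prints: [1, 4]
```

The static chunks get handed an index too, but they define no groups, so the counter does not advance. In the second call the first chunk owns three groups, so the next segment starts at capture 4.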

Alright, following this so far? We’re almost done. Let’s next look at how the regex itself is constructed, and how conditions like the request method comparison are built.

def recognition_conditions
  result = ["(match = #{Regexp.new(recognition_pattern).inspect}.match(path))"]
  result << "conditions[:method] === env[:method]" if conditions[:method]
  result
end

def recognition_pattern(wrap = true)
  pattern = ''
  segments.reverse_each do |segment|
    pattern = segment.build_pattern pattern
  end
  wrap ? ("\\A" + pattern + "\\Z") : pattern
end

This first constructs a regular expression to compare against the path, by aggregating the patterns of each segment into a single regular expression (via the recognition_pattern method). It then appends the request method comparison (if relevant for this route). For those of you wanting to extend routing with your own custom conditions (like routes based on hostname and such), this is where you would add those conditions, based on the conditions hash.
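Why walk the segments in reverse? My reading is that each segment gets handed the pattern for everything after it, so a segment can wrap the rest of the path if it needs to. The sketch below imitates that with three made-up segment types. They are much cruder than the real segment classes and only meant to show the folding:

```ruby
# Hypothetical segment types; the real classes in routing.rb are richer.
ToyStaticSegment = Struct.new(:value) do
  # A literal chunk of the path, followed by whatever comes after it.
  def build_pattern(rest)
    Regexp.escape(value) + rest
  end
end

ToyDynamicSegment = Struct.new(:chunk) do
  # A required capture group, followed by whatever comes after it.
  def build_pattern(rest)
    "(#{chunk})" + rest
  end
end

ToyOptionalSegment = Struct.new(:chunk) do
  # Wraps its own capture AND everything after it in one optional group,
  # which only works because "rest" has already been built.
  def build_pattern(rest)
    "(?:/(#{chunk})#{rest})?"
  end
end

# Same shape as recognition_pattern: fold the segments from right to left.
def toy_recognition_pattern(segments, wrap = true)
  pattern = ""
  segments.reverse_each { |segment| pattern = segment.build_pattern(pattern) }
  wrap ? ("\\A" + pattern + "\\Z") : pattern
end

REQUIRED_SEGMENTS = [
  ToyStaticSegment.new("/foo/"),
  ToyDynamicSegment.new("plain|fancy"),
  ToyStaticSegment.new("/"),
  ToyDynamicSegment.new("[-a-z0-9]+")
]
OPTIONAL_SEGMENTS = [
  ToyStaticSegment.new("/foo"),
  ToyOptionalSegment.new("[^/]+")
]
```

Calling toy_recognition_pattern(REQUIRED_SEGMENTS) gives the string \A/foo/(plain|fancy)/([-a-z0-9]+)\Z, and the optional variant gives \A/foo(?:/([^/]+))?\Z, which matches both "/foo" and "/foo/bar".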

So! We’ve now generated the code for the route. Hiking back up the call stack, we find ourselves back in write_recognition, which evaluates the string and installs the new method into the route. Hiking up another level, we wind up back in the original Route#recognize method, where we make what appears to be a recursive call to Route#recognize. However, this will actually invoke the new method definition, recently installed by write_recognition, which will execute the newly generated code.

And there you have it, ladies and gents, the route recognition code. It’s really not so much of a much, is it? Once you wrap your mind around run-time generation of code, it all flows together pretty well. There are some edge cases and such that I didn’t cover, but you’re encouraged to explore those on your own. “An exercise for the reader,” and all that. Especially, try investigating what a route looks like that has optional values (:permalink => nil), or which uses path segments. See what the recognition code for such routes consists of.

By this point, you should have some grasp of about two-thirds of the routing code. The remaining third, route generation, will be covered in the next article, but be warned: it’ll be the hairiest of the three!

Reader Comments

What's with the excessive use of #match? Why not pull the matchdata out of the global only when you need it to avoid the overhead of constructing it when you don't.

Eric Hodel
4 Oct 2006

Thanks for the suggestion, Eric. It'd be interesting to see a comparison demonstrating the difference in performance. Route recognition is pretty snappy currently--we'd love for it to be snappier still. If you have some time, I'd really appreciate it if you could look into that. Route recognition is rarely the bottleneck--route generation (which will be the next topic) is where things really tend to get clogged up. Routes are recognized only once per request, but you may need to generate lots of URL's per page. Thus, we haven't spent a lot of time optimizing the recognition process, yet.

Jamis
4 Oct 2006

Jamis, these articles are beyond amazing. I never find the time to go digging myself, but I do find the time to read these articles. Long may you run.

I hope you realize you have a real talent for this kind of exposition. In my experience it’s very rare to find someone who can write at just the right level of detail to explain code succinctly and clearly. People often get caught up in the beauty of their abstractions, or the intricate details of the implementation.

I’m off to throw some money in the paypal pot now, and I hope everyone else who reads this does, ‘cause I’d selfishly like to see these articles continue for a long time :)

Cheers.

Grant McInnes
5 Oct 2006

Grant, many thanks for the kind words. I’m very happy that people are enjoying these articles!

Jamis
5 Oct 2006

Jamis, this series has rocked. I recently started trimming my list of blogs that I read because I was glossing over good articles such as these because the list of crap was so long and I felt I had to read it all. I’m very glad I did. Keep up the good work you’re doing an excellent service to the community.

Lee Jensen
5 Oct 2006

This is really cool (and amazing as to how much complexity is there in just one piece of Rails!) you core people are truly amazing!

I sure hope you would please continue with these, and if you ever decide to (self?) publish these, I will be sure to purchase a copy of your Under the Hood book.

Many Thanks indeed!

Amr Malik
12 Oct 2006

jamis I’m pretty new to rails and have been writing a way of dynamically generating objects at run-time and this blog was the last piece I was missing ( doing source generation ). Thanks for making it easy for someone like me to follow. Keep the under the hood stuff coming. Rails is one of those things that at first glance looks anti-climatic because its so easy, but when you look at the artful way the underlying code is written it really makes you sit up and take notice.

-d

dan
1 Nov 2006

The Buckblog

assorted ramblings by Jamis Buck
