The Buckblog

assorted ramblings by Jamis Buck


4 January 2007 — 2-minute read

Yeah, I’m on a real testing kick these days. Today’s TADFALAICKIU is a little trick you can play with assert_select.

Out of the box, assert_select works only with HTML documents. If you try it on an XML document (like, say, an RSS or Atom feed), you’ll more than likely see warnings about the document being malformed. It’s easy enough to work around, though.

The trick is to define your own HTML::Document instance before calling assert_select. When you define your own instance, you can pass some optional parameters that make HTML::Document play nicely with XML:


def xml_document
  @xml_document ||= HTML::Document.new(@response.body, false, true)
end

The second parameter says whether you want to parse the document in strict mode or not. “False” is the default; if you set it to “true”, you’ll get an exception (instead of a warning) whenever the parser hits what looks like a malformed document.
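To picture what the strict flag controls, here is a toy sketch in plain Ruby. It is not HTML::Document's code: `check_nesting` and `MalformedDocument` are hypothetical names, and the tokens are bare tags without attributes. The only idea it borrows from the post is that the same problem either produces a warning or raises, depending on one boolean.

```ruby
# Toy illustration only; HTML::Document's real parser is far more involved.
# check_nesting and MalformedDocument are hypothetical names.
class MalformedDocument < StandardError; end

# tokens: bare tags such as "<item>", "</item>" or "<br/>" (no attributes).
def check_nesting(tokens, strict: false)
  stack = []
  problems = []
  tokens.each do |token|
    if token.start_with?("</")
      name = token[2..-2]
      if stack.last == name
        stack.pop
      else
        problems << "unexpected closing tag #{token}"
      end
    elsif token.end_with?("/>")
      next # self-closing tag: nothing to track
    else
      stack.push(token[1..-2])
    end
  end
  problems.concat(stack.map { |name| "unclosed tag <#{name}>" })

  problems.each do |message|
    # Strict mode fails fast; lenient mode warns and keeps going.
    raise MalformedDocument, message if strict
    warn "ignoring: #{message}"
  end
  problems.empty?
end
```

Called with `strict: false` (the default here, matching the post's description of HTML::Document), a mismatched closing tag only prints warnings and the call returns false; with `strict: true` the first problem raises `MalformedDocument`.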

The third parameter says whether the document is XML. It defaults to “false”, meaning that only HTML documents are parsed by default. Here, we set it to “true”.

Once you’ve defined your custom document, you just need to define your assert_select wrapper:


def assert_xml_select(*args, &block)
  @html_document = xml_document
  assert_select(*args, &block)
end

assert_select will then reuse the existing @html_document value instead of parsing the response as HTML. Nothing to it! Just put these two methods in your test/test_helper.rb file, and you’re all set.
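Why does assigning to @html_document work? assert_select looks the document up through a memoized html_document helper, so a value stored there ahead of time wins over the default HTML parse. The plain-Ruby simulation below illustrates that hook under stated assumptions: FakeDocument, FakeTestCase and the simplified assert_select are hypothetical stand-ins, not Rails classes, and the memoization is assumed to follow the `@html_document ||= ...` pattern.

```ruby
# Hypothetical stand-ins: none of these classes come from Rails.
FakeDocument = Struct.new(:body, :strict, :xml)

class FakeTestCase
  attr_reader :response_body

  def initialize(response_body)
    @response_body = response_body
  end

  # Assumed to mirror the memoized helper assert_select relies on:
  # parse in HTML mode only if nothing has been cached yet.
  def html_document
    @html_document ||= FakeDocument.new(response_body, false, false)
  end

  # Simplified assert_select: yields and returns the document it would query.
  def assert_select(*_args)
    yield html_document if block_given?
    html_document
  end

  # The tip from the post, adapted to the fake classes.
  def xml_document
    @xml_document ||= FakeDocument.new(response_body, false, true)
  end

  def assert_xml_select(*args, &block)
    @html_document = xml_document # pre-seed the cache with the XML parse
    assert_select(*args, &block)  # pass the block through for nested checks
  end
end
```

One side effect visible in this simulation: once assert_xml_select has run, plain assert_select calls later in the same test also see the XML document, because the cached value is never reset.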

(Caveat: the above won’t work if you’re trying to use assert_select on XML returned via RJS, since when assert_select parses the RJS response to extract the document, it builds its own HTML document.)

Update: Jerry Vos and Jeff Talbot pointed out that the version of assert_xml_select I originally had was not passing the block through, so nested assert_xml_select calls were not being invoked. I’ve updated the code above to pass the block through. Thanks for catching that, guys!

Reader Comments

Hmm... I’ve had no warnings using assert_select with XML in 1.2rc1. I’m not using resources or to_xml; just Builder templates.

Thoughts on overriding the html_document method rather than assigning to an instance variable?

def xml_document
  @xml_document ||= HTML::Document.new(@response.body, false, true)
end
alias :html_document :xml_document

John, you may be fine. It depends on the tags that are in your documents, since HTML::Document does some special casing with tags when in HTML mode. If you aren’t feeling any pain, feel free to ignore this tip.

Dave, you don’t want to do that, since assert_select would then break on HTML documents. HTML (and even XHTML) is not a true subset of XML, so you can’t really use the same rules on both.

Thanks for the tip, Jamis! I too have been using assert_select on XML documents, VoiceXML in my case, without problems. I will definitely keep this tip in mind, though; it seems like a more robust approach.

Hi Jamis—this new series is great, please keep it up. :)

assert_select will work with some XML documents, but fails the moment you use a tag that is self-closing in HTML, e.g. link and meta. So, as a general rule, I recommend not using it with XML.

There was an assert_select_feed in the original contribution, for dealing specifically with RSS and Atom, both of which break when parsed as HTML.

Maybe we should add assert_select_xml to core … the code is simple to recreate, but people are likely to run into failures without realizing the difference between XML and HTML parsing. Adding it would save them the time and energy.

Jamis said: [XHTML] is not a true subset of XML

Interesting. Can you say why, or point me at an explanation?

David, sure. In XML, you can always use the <foo /> syntax for closing empty tags. However, if you try and do that in XHTML with a tag that requires an explicit closing tag, you’ll find your page doesn’t render properly. For example, the “script” and “iframe” tags.

XHTML is based on XML. Whether an element allows content, requires content, or must be empty is not specified by XML, but by the schema associated with that document type. In this case, the HTML spec tells you what an element can and cannot contain.

In HTML the link element is always self-closing, no need for the slash, so the parser looks at one tag and creates an element from it.

In XHTML, which is XML, you must close the link element, and it must be empty: no content is allowed. Since the HTML parser ignores the slash, it will parse most XHTML documents correctly.

But in RSS, the link element contains the URL of a post or a blog. An XML parser would treat that correctly, putting any content between the opening and closing tags into the element.

But an HTML parser would create an element from the link tag itself, and then parse the following content as part of the parent element. So you can’t parse an RSS feed with an HTML parser and expect a correct document structure.
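The failure mode described in this comment can be reproduced with a toy tree builder. `build_tree`, `Node` and the short `HTML_VOID_TAGS` list are hypothetical and far simpler than HTML::Document (no attributes, no entities, no error recovery); the sketch only shows how treating `link` as an empty element moves an RSS item's URL out of the link and into the parent.

```ruby
# Toy tree builder, NOT HTML::Document. It only shows how treating <link>
# as an empty (void) element changes the structure of an RSS item.
Node = Struct.new(:name, :children)

HTML_VOID_TAGS = %w[link meta br img].freeze # partial list, for illustration

def build_tree(source, xml:)
  root = Node.new("#document", [])
  stack = [root]
  source.scan(/<[^>]+>|[^<]+/) do |token|
    if token.start_with?("</")
      name = token[2..-2].strip
      # HTML mode ignores end tags for void elements.
      next if !xml && HTML_VOID_TAGS.include?(name)
      stack.pop if stack.last.name == name
    elsif token.start_with?("<")
      name = token[1..-2].delete_suffix("/").strip
      node = Node.new(name, [])
      stack.last.children << node
      self_closing = token.end_with?("/>")
      void = !xml && HTML_VOID_TAGS.include?(name)
      # Only elements that can hold content become the new parent.
      stack.push(node) unless self_closing || void
    else
      text = token.strip
      stack.last.children << text unless text.empty?
    end
  end
  root
end

rss = "<item><title>Post</title><link>http://example.com/post</link></item>"
xml_item  = build_tree(rss, xml: true).children.first
html_item = build_tree(rss, xml: false).children.first
p xml_item.children[1].children  # prints ["http://example.com/post"]
p html_item.children[1].children # prints []
p html_item.children[2]          # prints "http://example.com/post"
```

With `xml: true` the link node holds the URL; with `xml: false` the link node is empty and the URL becomes a bare text child of item, so an assertion on the text of `item link` would fail.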

Oh, but I’m just talking about overriding it in that specific TestCase where you’re only interested in asserting against XML.
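Dave's point about scoping can be sketched in plain Ruby: an alias defined in one test class changes only that class. BaseTest, FeedTest and PageTest are hypothetical stand-ins for Rails test cases, and hashes stand in for parsed documents.

```ruby
# Hypothetical stand-ins for Rails test cases; hashes play the parsed documents.
class BaseTest
  def initialize(body)
    @body = body
  end

  # Default behaviour: parse the response in HTML mode.
  def html_document
    @html_document ||= { body: @body, xml: false }
  end
end

# A test case that only ever asserts against feeds.
class FeedTest < BaseTest
  def xml_document
    @xml_document ||= { body: @body, xml: true }
  end

  # Dave's suggestion: within this class, html_document means xml_document.
  alias html_document xml_document
end

# An ordinary page test keeps the inherited HTML behaviour.
class PageTest < BaseTest; end

puts FeedTest.new("<rss/>").html_document[:xml] # prints true
puts PageTest.new("<p/>").html_document[:xml]   # prints false
```

Jamis's objection still applies to any class that mixes HTML and XML assertions; the alias is only safe where every document under test is XML.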