The maze book for programmers!
mazesforprogrammers.com

Algorithms, circle mazes, hex grids, masking, weaving, braiding, 3D and 4D grids, spheres, and more!

DRM-Free Ebook

The Buckblog

assorted ramblings by Jamis Buck

Designing a DSL

13 November 2006 — 4-minute read

So, as I’ve mentioned a few times recently, I’ve been working on a logic engine for Ruby, using a Ruby-ish DSL. It’s working quite nicely now, and is able to do the examples that Mauricio so kindly provided, including the Tower of Hanoi solver, and the symbolic differentiator.

The exercise has been fun, as well as very educational. For the curious, you can check it out at http://svn.jamisbuck.org/projects/logic_engine. For now, I’d like to talk a bit about how I implemented the DSL, since that’s probably of more general interest.

First, I “sketched it out”. Not on paper, but in a blank text document. This let me define how I wanted the DSL to look, and also proved as a good thought experiment, letting me try things out in a very low-risk environment.

My first pass, as I blogged about here, used methods on Symbol and String:

1
2
3
4
5
6
:X.sibling_of(:Y).if :Z.parent_of(:X).and(:Z.parent_of(:Y)).and(:X.noteq(:Y))
...
"matz".father_of "Ruby"
...
result = :X.sibling_of("Sally").solutions
result.each { |solution| p solution }

However, this required the the Symbol and String namespaces to be “polluted” by method_missing hooks and so forth. It also was hard to make that work with multiple knowledgebases in the same program, since the method_missing hook for both String and Symbol would need to know which database the predicate was referencing.

So, that first pass defined a few boundaries for me. I knew that I wanted multiple databases, and that terms needed to be defined within the context of a specific database. My second attempt looked something like this:

1
2
3
4
5
6
7
db = Database.new do
  X.sibling_of(Y).if Z.parent_of(X).and(...).and(...)
  matz.father_of(ruby)
  ...
end

result = db.query { X.sibling_of(sally) }

Ugh, though. That one required const_missing to be captured, and that can only be captured at the class level, not the object instance level. How would I be able to access the current DB instance inside the const_missing hook? That seemed like it could be tricky, so I tried to work around the need for it. Instead, I considered using conventions like if a method is invoked that starts with an underscore, it is a variable>

1
2
3
4
db = Database.new do
  _X.sibling_of(_Y).if _Z.parent_of(_X).and(...).and(...)
  ...
end

But that, frankly, made my eyes bleed. I also considered using punctuation, like X! and Y! to indicate variables, but that was far too noisy. In the end, I really wanted Ruby constants to indicate variables in the assertions.

Fortunately, it wasn’t so bad to implement. It only required a minor bit of “pollution” in the Module-level const_missing hook, and I tried to minimize the damage there by chaining my handler to the existing one. Then, I made sure to set the current DB instance in a thread-local variable, which I could access from the const_missing hook. (This assumes that a single thread will never be processing more than one database at a time, which is a fair assumption, I think.) You can see my implementation of the const_missing hook here.

So, hurdle hurdled. The next problem was that using and to create a conjunction was rather ugly, and verbose. Fortunately, Ruby lets you override certain operators, and the ampersand is one of them:


X.sibling_of(Y).if Z.parent_of(X) & Z.parent_of(Y) & X.noteq(Y)

By the time I got this far, I had actually begun to implement the DSL, so I could see how technically feasible (or infeasible) some of these constructs really were. The first thing I discovered was that there was no way for me to know, within the assertion block, which terms were being added to the database:

1
2
3
4
5
6
7
db = Database.new do
  X.is_parent_of(Y).if X.is_father_of(Y)
  X.is_parent_of(Y).if X.is_mother_of(Y)
  jamis.is_father_of(nathaniel)
end

result = db.query { jamis.is_parent_of(nathaniel) }

Give the above, I’d have to use some more flags to know whether a statement was created at the “top-level”, so to speak, or whether it was created as a child of another statement…and then I’d have to know whether it was safe to add the statement to the database, since when you query, you don’t want that term added.

The simplest thing to do was to require that something be explitly called to add a new term to the database. Using the unary plus operator was the least verbose way I could think of:

1
2
3
db = Database.new do
  + X.sibling_of(Y).if Z.parent_of(X) & ...
end

The problem, though, is that this causes a compile error! It won’t work without parentheses around the arguments to if, and that’s ugly. So! Think think think.

I solved the problem by using blocks. Instead of passing the implied terms as parameters to if, I define them in a block that gets attached to if:

1
2
3
db = Database.new do
  + X.sibling_of(Y).if { Z.parent_of(X) & ... }
end

I actually think I like that better! It’s clearer what is going on, and opens up some interesting possibilities of multiple statements occuring inside that if block. It also looks more like a control structure, which is a nice side effect.

Under the hood, I’m using the “clean room” DSL technique. The block passed to Database.new is actually evaluated in the context of a “blank” object, a “clean room”, that just proxies the data back to the database. This way, method missing calls all just happen on the clean room, and I don’t have to worry about the methods on my Database class shadowing the names of atoms and predicates and such.

It’s turned out pretty nicely. The matching and backtracking implementation is certainly not optimized yet, and a LOT of objects get created right now, but it works! Feel free to check it out at check it out at http://svn.jamisbuck.org/projects/logic_engine. I’d love to hear what you think of it, thus far.

Reader Comments

I may be missing something but can’t you use an anonymous class as as a trampoline/namespace for module_evalling your database definition. Have const_missing create class variables in the trampoline.

Also, once you’ve moved to passing a block to if, you’ve eliminated the problem of eager evaluation, which means you can get rid of the unary +. Have if create a new anonymous subclass of the current namespace and module_eval the condition block in that. Class variable scoping should be your friend at that point.

Piers, it could be that there is a way to get rid of the unary +, but I still don’t see it. The block for the if is not for deferral (in fact, I call it immediately, so there’s no deferral going on at all).

Regarding const_missing, I actually tried that. I put const_missing in the cleanroom class that was evaling the definition, but it never got called. I can only assume that’s because I’m using a block instead of a text. I didn’t experiment too much, just enough to hit on overriding the Module’s const_missing.

Feel free to play with my code if you want. I’m curious to see some of these ideas in action.

You currently calling the block immediately, but you don’t have to do it. There is nothing to stop your ‘if’ implementation from looking something like:

def if(&block) ... Class.new(current_environment) &block ... end

I shall go away and play…

Piers, sorry, my previous reply was kind of distracted and I omitted what I meant to say. There may very well be a way to make do without the explicit unary plus, but it would require a lot of complexity under the covers. I kind of like the unary plus, though—it makes it clear that the statement is being added to the database, and it acts like a nice bullet point when you have multi-line statements.

That said, I certainly don’t want to discourage you from playing with the code. I’d love to see anything you come up with!

So, I experimented with:

top_level = class.new
begin
  p "About to eval" 
  top_level {L}
  p "Evaluation successful" 
rescue NameError e
  eval %{top_level::#{e.name.to_s} = Class.new}
  retry
end

However, when I ran it, I went into an infinite loop, which kept throwing ‘already initialized constant L’ warnings. Changing the eval in the rescue to:

eval %{#{e.name.to_s} = Class.new }

What seems to be happening is that constant lookup is independent of any Binding objects, but is always based the block’s lexical scope. Which smells like a ruby bug to me.

Thanks, Piers, that confirms what I saw as well. I should note that I’m still running ruby 1.8.4 locally. Were you testing on 1.8.5?

Jamis, i’ve been checking your blog for this project and i’m really enjoying. Its far one of the most interesting DSL i’ve seen built in Ruby. It shows us how powerful ruby is and how far we’re from reaching ruby’s bondaries ;)

Keep improving it!

I’m running 1.8.4 here.

Rodrigo: It’s weird, there’s a strong Ruby culture of meta programming, and there’s some nicely abstracted ways of monkeying with it. Meanwhile Perl 5, which is where I come to Ruby from, has some truly horrible ways of getting at the sort of things Ruby gives you, but once you’ve wrapped your head around it, you can actually do more with Perl than you can with Ruby. For instance, I’ve implemented the extract method refactoring in pure Perl, but I’m still damned if I can work out how to do it in Ruby.

Actually, Smalltalk’s introspection stuff is probably the right place for Ruby to steal from (All I want for Christmas is a version of Binding I can introspect on and a way of getting either continuations (or failing that) bindings back from caller).

Have you thought about OWL integration? Either at the input end of things (enabling importation of description logics built using something like Protege) or the output end (enabling the construction of description logics using a nice ruby-like syntax, and then executing logical queries against them using a third-party reasoner like Racer).