Archive for July, 2007

Compiling queries without EVAL

July 27, 2007

Gary King recently blogged a question about whether a use of EVAL he had recently made in some Common Lisp code was legit. This caught my eye because I suspect this question came up while Gary was working on a project that I once worked on. Anyway, since he asked, the short answer is, “No. The old rule that if you’re using EVAL you’re doing something wrong is still true.” A slightly longer answer follows.

The problem Gary set out to solve is that he’s got a a bunch of chunks of text coming out of some sort of database and he wants to query them with logical expressions like this one:

 (and "space" "mission")
 (and (or "moon" "lunar") "landing"))

These expressions are obviously not Lisp but rather a query language where the logical operators AND and OR (and presumably other ones like NOT) have their usual logical meaning and literal strings should evaluate to true if the text we’re matching against testing contains the string and false if not. Thus the expression above would match text containing the words “space” and “mission” or either of “moon” or “lunar” along with “landing”. Gary considered, and rejected, writing “a simple recursive evaluator to deal with ands and ors”. Instead he wrote some code to munge around expressions like the one above into a form that he could EVAL. In other words, instead of writing his own interpreter he just munged his code into a form that Lisp could interpret for him. Which is okay. Except for the rule that if you’re using EVAL you’re doing something wrong.

What he should have done is written a compiler. Luckily Common Lisp comes with a Lisp compiler built in so all we have to do to write our own compiler is translate our source language into Lisp. Here’s how I’d do it.

First define a function that takes a query expression and returns two values, an equivalent expression with all string literals replaced with GENSYM’d variable names, and an alist of the strings and the corresponding GENSYM’d names. In this phase we also detect if the same literal string appears more than once so we only generate a single binding for each unique string.

(defun translate (tree &optional bindings)
  (typecase tree
     (multiple-value-bind (newcar newbindings)
         (translate (car tree) bindings)
       (multiple-value-bind (newcdr newbindings2)
           (translate (cdr tree) newbindings)
         (values (cons newcar newcdr) newbindings2))))
    (symbol (values tree bindings))
     (let ((binding (assoc tree bindings :test #'equal)))
         (binding (values (cdr binding) bindings))
          (let ((sym (gensym)))
            (values sym (acons tree sym bindings)))))))))

With this function we can translate an expression like:

(or (and "a" "b") (and (or "c" "d") (or "a" "b")))

Into this expression:

(OR (AND #:G1 #:G2) (AND (OR #:G3 #:G4) (OR #:G1 #:G2)))

and this list of bindings:

(("d" . #:G4) ("c" . #:G3) ("b" . #:G2) ("a" . #:G1))

Now we just need to use those two values to build up a bit of Lisp. Gary’s solution was to build up an expression that he could EVAL. But it’s better to generate a LAMBDA expression because then we can compile it. Here’s the function:1

(defun make-lambda-expression (expr)
  (multiple-value-bind (tree bindings) (translate expr)
    (let ((string (gensym)))
      `(lambda (,string) 
         (let (,@(loop for (word . sym) in bindings collect
                      `(,sym (find-word-in-string ,word ,string))))

If we pass the same expression to this function we get the following lambda expression back.

(LAMBDA (#:G9)
  (LET ((#:G8 (FIND-WORD-IN-STRING "d" #:G9))
        (#:G7 (FIND-WORD-IN-STRING "c" #:G9))
        (#:G6 (FIND-WORD-IN-STRING "b" #:G9))
        (#:G5 (FIND-WORD-IN-STRING "a" #:G9)))
    (OR (AND #:G5 #:G6) (AND (OR #:G7 #:G8) (OR #:G5 #:G6)))))

We could use FUNCALL to evaluate this expression which would at least get us out of using EVAL.2 But the real advantage of this approach is that we can compile this expression. Since Gary said he wanted to find all the strings in his database that match a given expression, he’s probably going to be evaluating his query once per string in his database. In that case it’s probably worth it to take a small up front hit in order to speed up the execution of the query since we’re going to be executing it many times. Luckily compiling a lambda expression is about as trivial as EVALing any other expression:

(defun compile-expression (expr)
  (compile nil (make-lambda-expression expr)))

This function, fed a query expression, returns a compiled function that takes a single string argument and returns true if the query expression matches and false if not. On Lisps with native compilers the returned function will be compiled down to machine code just the same as if we had written it by hand in our source code. We can FUNCALL this function such as in this code that collects all the strings returned by a cursor function that match the query expression:

(defun query-strings (query database)
  (loop with predicate = (compile-expression query)
     with cursor = (string-cursor database)
     for string = (next-string cursor)
     while string
     when (funcall predicate string) collect string))

We can also use the query function with all of Lisp’s higher-order-functions, such as REMOVE-IF-NOT:

(remove-if-not (compile-expression query) *all-strings*)

Now, one could argue that there’s a whole heck of a lot of difference between using EVAL and wrapping something in a lambda expression and compiling it — in both cases you can generate, and then evaluate, arbitrary Lisp code. But there is an important difference, namely that EVAL just evaluates — it takes some data and interprets it as Lisp and gives you an answer straight away whereas compiling a lambda expression gives you a function, something that can interact with the rest of your code, as an argument to higher-order functions, and so on. Or, if you don’t buy that, at least you avoid breaking the no EVAL rule.

Update: It hit me as I was brushing my teeth that while nicely avoiding EVAL, my first solution had a big problem — because all the FIND-WORD-IN-STRING calls are done in the LET before the boolean expression is evaluated we lose all the advantage of AND and OR’s short-circuiting behavior. That is, the generated code searches for all the strings and then combines the results of all those searches. Much better (and simpler) would be to implement TRANSLATE and MAKE-LAMBDA-EXPRESSION this way:

(defun translate (tree &optional (stringvar (gensym)))
  (typecase tree
     (cons (translate (car tree) stringvar) (translate (cdr tree) stringvar)))
    (symbol tree)
    (string `(find-word-in-string ,tree ,stringvar))))

(defun make-lambda-expression (expr)
  (let ((string (gensym)))
    `(lambda (,string) ,(translate expr string))))

This has the slight disadvantage that if the same literal string appears in the pattern more than once, we will potentially search for it more than once. But that’s probably much less of an issue than the problem this code fixes. Obviously if we cared to, we could generate code that caches searches once they are done to get the best of both worlds. But I’ll leave that as an exercise for the reader.

1. This function generates calls to a helper function FIND-WORD-IN-STRING. Gary’s version was somewhat more complex but for the purposes of illustration this definition should do:

(defun find-word-in-string (word string)
  (search word string :test #'char-equal))

2. Gary’s solution was haired up a bit because he didn’t simply generate a LET form to EVAL but instead generated just the boolean expression and then wrapped the call to EVAL in a PROGV to establish dynamic bindings at run-time. Which, while sort of clever, was another sign from the gods that he had gone down a wrong path somewhere.

Coders at Work questions list

July 18, 2007

As I mentioned the other day I’ve started working on a set of general questions for my Coders at Work interviews. They are now up on the Coders at Work web site. Some of them are, no doubt, dumb and there may still be some redundancies in the list but it’s a start. As always I’m interested in what other folks think. Are there questions you’d like me to ask some or all of the programmers on my list?1 Email them to me or leave a comment on the Coders at Work comment page.

1. Technically, that’s not really my list; that’s the combined wisdom of the 150+ people who have sorted the complete list of names. Since I’ve seen a number of comments about this list, I should point out that while it’s useful (and fascinating) for me to see how other people rank the programmers on my list, in the end I’ll be making the final decision of who to approach for interviews. So don’t get too upset if you look at that list and see someone way too high or too low for your tastes. If there’s someone you’d really like to see interviewed the best thing to do is to let me know why you think they’d be so interesting to hear from.

Latest, greatest Coders at Work name sorter

July 17, 2007

It’s been a while since I’ve posted anything about Coders at Work. Last week I put up, but did not broadly announce, a new, improved web page for sorting the nearly 300 suggested names of people to interview I’ve received. I mostly created it for my own benefit but anyone who wants to play with it can feel free to make their own sorting. (I’m mostly announcing it now because I think it’s sort of a nifty UI — much more fun to play with that the previous one — though I am also interested to see how other folks would sort the list of names I’ve got. Unfortunately, like the previous sorter, this page probably doesn’t work in IE and is only known to work for sure in Firefox.) One feature of this page that the other short list selector didn’t have, is the ability to effectively down vote names — you can put folks who you are particularly uninterested in hearing from at the bottom of the list and that tells me something.

If you think you’ve got a sorting that would make for a great book, click the Save button then “Permalink” on the next page to get a URL for your specific sorting and send it to me. I’m also still interested in new suggestions of folks to interview though as I’m getting pretty close to starting to contact folks and since I’m ultimately only going to interview on the order of sixteen people, new names have to be pretty obviously better than over 250 of the names I’ve already got to break into serious contention. As always, I’m also interested in your comments about why you think someone would (or wouldn’t) make a good interview subject.

I’ve also started serious work on a set of general questions. So far I’ve got 191 questions in 22 categories. Which is a lot — obviously I’ll have to pare that down unless I’m going to interview people for 10 or 20 hours. And that’s not even counting questions more directly tailored to the individual subjects. But it’s still better to have too many than too few so if you’ve got a question you’d like me to ask either everyone I interview or specific people, feel free to email or leave it on the Coders at Work comment page.

Update: I didn’t mention before but you can see the combined results of everyone who has submitted a sort via this new page here.

In theory, practice is no different from theory, in practice …

July 12, 2007

Yesterday I was taking my 10-month old daughter to our parent/infant swim class at the Berkeley YMCA. She happened to be wearing a blue shirt. The woman riding down with us in in the elevator from the parking garage asked how old she was and said, “Oh, what a cute little boy.”

“Girl,” I said.

“Oh, sorry!”

“No worries. It’s the shirt. And the short hair.”

“I know, we’re all so color coded.”

As we were getting out of the elevator she said, “You know, I should be the last person in the world to do that … I teach feminist theory.”

Categorizing potential Coders at Work interview subjects

July 7, 2007

Since I put up the short list submission page1 for Coders at Work a week and a half ago, I’ve received 162 short lists of sixteen names. You can see which programmers appear on the most short lists on this page. Thank you to everyone who took the time to submit a list.

For the next step in my selection process I’ve put together a page on which I’ve lumped the potential interview subjects into various categories such as “Old school Unix hackers”, “New school Unix hackers”, “Language designers”, and so forth. My theory is that it’ll be more interesting to read interviews with different kinds of programmers than a bunch of interviews with people who’ve all done the same basic kind of work. It’s also a useful way to identify potential interviewees — I can look at each category and ask, is there someone not on this list who’d be a better representative of the category?

As usual, there are ways you can help me out, if you’re so inclined. I enumerate them on the categorization page but basically they are to send me email or leave a comment nominating someone as the best representative of a category, helping me categorize the folks I’ve already got, and suggesting interesting categories that I don’t have yet.

1. I’ve still not been able to get this page working in IE, mostly due to the lack of any easy way to run IE myself. If any Javascript guru wants to let me know what changes I need to make to my code to make it work in IE I’ll be forever grateful.