How should Ribs definitions look?


I am currently sitting with Ribs, trying to decide how the definitions for associations should look like. I’ve also decided that I don’t want to use the current scheme for properties either. I also have some interesting constraints. First, I do not want to evaluate the block using instance_eval, so introducing nice method_missing hooks in the block context is out (so I can’t do anything like “has n” as DataMapper). Also, all definitions need to be able to take additional parameters in a hash. So something like “rib.has * :Posts” won’t work either.

I have come up with something that I’m reasonably happy with. It’s quite terse but definitely readable. It doesn’t pollute any namespaces. And hopefully it will only be used when the model need to differ from the conventions. I’m thinking about taking conventions a step further with Ribs and automatically try to figure out associations based on naming and foreign keys – but I haven’t decided about that yet.

All of these examples are supposed to be executed inside the Ribs! block, with r being the Rib parameter. I’ve decided to allow the table name as a parameter to the Ribs! method call instead of being something you set inside.

First of all, setting which property/properties are a primary key, I’m thinking about one or more of these alternatives – where track_id is the property name:

r.track_id :primary_key
r.track_id :primary_key => true
r.track_id = :primary_key
r.track_id.primary_key!

I’m skittish about the track_id= alternative, but the others I think can coexist quite well.

The next thing is to handle the mapping from one property name into a different column name (if no column name is specified, the property name will be used, of course):

r.title :column => :track_title
r.title.column = :track_title

Nothing very extraordinary here. I would probably like to have both alternatives here.

The next one is more interesting. I haven’t yet decided how to handle specifying primitive types. This code is supposed to handle the case that in ActiveRecord are covered with has_one and belongs_to. I don’t really see why they have to be different, really, so I’m thinking about doing something like this:

r.blog :is_owner
r.blog :is => Blog

I’m skittish about :is_owner, but it also feels slightly icky to have to specify the type of all single associations. This is definitely an area I would like to have some good ideas about.

Finally we come to the one-to-many and many-to-many associations. What makes this different is that I can have both a Set and a List (and also Map) since Hibernate handles these for me:

r.comments :list_of => Comment
r.comments :set_of => Comment
r.comments :is_set
r.comments :is_list

Once again I’m not sure how to handle it in such a way that you don’t have to write the name of the other entity in all definitions.

Anyway, this is what I currently have. I would appreciate innovative ideas here – especially if they are innovative within the constraints I’ve set for myself.



Short term adoption and Ruby DSLs


I feel like I’ve been picking a bit on DataMapper in my recent posts, so I first want to say that I like most of what I see in DataMapper, and is generally just nitpicking. I’m also what I’ve noticed there to give examples instead of making up something. And actually, one of the reasons is that it was some time since I read much Ruby code, and DataMapper happens to be one of the things I’m looking a lot at right now.

Which brings me to something Sam Smoot said in the discussion about adding operator methods to Symbol. (In the blog post Simplified finders in DataMapper). What he said was this:

That might satisfy some, but it ranks pretty low on the “beauty” gauge. And that really does matter for adoption…

Now, the reason I’m bringing it up here is not to pick on Sam or DataMapper (Really!). Rather, it’s because an attitude that I’ve noticed all over the Ruby world lately. In most cases it’s not explicitly said like this, but I like Sam’s quote because it verbalizes exactly what it’s about. You really want your library to look nice. Beauty is really important in a Ruby framework. It really is important, and lots of focus is put on it. And one of the reasons for this is to drive adoption.

Since it’s been verbalized like this, I’ve realized that this is one of the things that really disturbs me with the Ruby communities fascination with making everything, absolutely everything into a DSL.

Now, I really do like thinking about DSLs and working with them. Being a language implementor brings that out in me. But a DSL really has it’s place. Why shouldn’t you make everything into a DSL? Why shouldn’t everything look nice? First of all, the initial development might just not be worth it. A DSL-like approach generally involves more effort, so maybe a regular API works fine? A DSL introduces cost, both in development and in maintenance. Everything you do needs to be balanced. My coworker Marcus Ahnve gave a very good example of this today. He wanted to use ctags with an RSpec file, but since ctags doesn’t understand “describe” and “it”, there was no way for it to extract any useful information from it. Now, this is a trade of the framework designer has to make, and in the case of RSpec I think that the DSL is totally justified, but in other cases it’s not.

So take the issue at hand. Adding a few methods to Symbol, so that finding information in a database will look a bit nicer. In real terms this involves polluting the Symbol namespace with methods that have a very narrow scope. The usage (that looks more or less like this: Exhibition.all(:run_time.lt => 3)) uses something that reads easily from an English/SQL perspective, but not necessarily from a Ruby perspective. The main question I have when seeing this is what the eventual cost down the line will be for having it. Is this approach expandable? Is there any possibility of clashes with other libraries? Will someone else easily maintain it?

Focusing on beauty to generate adoption seems like the wrong choice for me. You add things that might not be so good long term, to get a larger initial code base. I would prefer to make the right choices for the right reasons first, and then let adoption be driven by that.

In summary, be responsible when designing APIs. I know those are very boring words. But it’s a reality, and your users will thank you for it.

PS: Sam just wrote a new comment, saying that the Symbol operators will be optional in the next DataMapper release. Good choice.



Language generation


This post is the first in a series of posts about PAIPr. Read here for more info about the concept.

Today I would like to start with taking a look at Chapter 2. You can find the code in lib/ch02 in the repository.

Chapter 2 introduces Common Lisp through the creation of a series of ways of doing generation of language sentences in English, based on simple grammars. It’s an interesting chapter to start with since the code is simple and makes it easy to compare the Ruby and Common Lisp versions.

The first piece to take a look at is the file common.rb, which contains two methods we’ll need later on:

require 'pp'

def one_of(set)
  [set.random_elt]
end

class Array
  def random_elt
    self[rand(self.length)]
  end
end

As you can see I’ve also required pp, to make it easier to print structures later on.

Both one_of and Array.random_elt are extremely simple methods, but it’s still nice to have the abstraction there. I’m retaining the naming from the book for these two methods.

The first real example defines a grammar by directly using methods. (From simple.rb):

require 'common'

def sentence; noun_phrase + verb_phrase; end
def noun_phrase; article + noun; end
def verb_phrase; verb + noun_phrase; end
def article; one_of %w(the a); end
def noun; one_of %w(man ball woman table); end
def verb; one_of %w(hit took saw liked); end

As you can see, all the methods just define their structure by combining the result of more basic methods. A noun phrase is an article, then a noun. An article is either ‘the’ or ‘a’, and a noun can be ‘man’, ‘ball’, ‘woman’ or ‘table’. If you run sentence a few times you will see that you sometimes get back quite sensible sentences, like [“a”, “ball”, “hit”, “the”, “table”]. But you will also get less interesting things, such as [“a”, “ball”, “hit”, “a”, “ball”]. At this stage the space for variation is quite limited, but you can still see a simplified structure of the English language in these methods.

To create an example that involves some more interesting structures, we can introduce adjectives and prepositions. Since these can be repeated zero times, or many times, we’ll use a production called PP* and Adj* (pp_star and adj_star in the code). This is from simple2.rb:

require 'simple'

def adj_star
  return [] if rand(2) == 0
  adj + adj_star
end

def pp_star
  return [] if rand(2) == 0
  pp + pp_star
end

def noun_phrase; article + adj_star + noun + pp_star; end
def pp; prep + noun_phrase; end
def adj; one_of %w(big little blue green adiabatic); end
def prep; one_of %w(to in by with on); end

Nothing really changes here, except that in both the optional productions we randomly return an empty array 50% of the time. They then call themselves recursively. The noun phrase production also changes a bit, and adj and prep gives us the two new terminals needed. If you try this one, you might get some more interesting results, such as: [“a”, “table”, “took”, “a”, “big”, “adiabatic”, “man”]. It’s still nonsensical of course. And it seems that this approach with randomness generates quite large output in some cases. To make it really nice there should probably be a diminishing bias in the adjectives and prepositions based on the length of the already generated string.

Another problem with this approach is that it’s kinda unwieldy. Using methods for the grammar is probably not the right choice long term. More specifically, we are tied to this implementation by having the grammar be represented as methods.

A viable alternative is to represent everything as a grammar definition – using a rule based solution. The first part of rule_based.rb looks like this:

require 'common'

# A grammar for a trivial subset of English
$simple_grammar = {
  :sentence => [[:noun_phrase, :verb_phrase]],
  :noun_phrase => [[:Article, :Noun]],
  :verb_phrase => [[:Verb, :noun_phrase]],
  :Article => %w(the a),
  :Noun => %w(man ball woman table),
  :Verb => %w(hit took saw liked)}

# The grammar used by generate. Initially this is $simple_grammar, but
# we can switch to other grammars
$grammar = $simple_grammar

Note that I’m using double arrays for the productions that aren’t terminal. There is a reason for this that will be more pronounced in the later grammars based on this. But right now it’s easy to see that a production is either a list of words, or a list of list of productions. Production names beginning with a capital is a terminal – this is a convention in most grammars. I didn’t use capital letters for the terminals when using methods because Ruby methods named like that causes additional trouble when calling them.

Now that we have the actual grammar we also need a helper method. PAIP defines rule-lhs, rule-rhs and rewrites, but the only one we actually need here is rewrites. (From rule_based.rb):

def rewrites(category)
  $grammar[category]
end

And actually, we could do away with it too, but it reads better than an index access.

The final thing we need is the method that actually creates a sentence from the grammar. It looks like this:

def generate(phrase)
  case phrase
  when Array
    phrase.inject([]) { |sum, elt|  sum + generate(elt) }
  when Symbol
    generate(rewrites(phrase).random_elt)
  else
    [phrase]
  end
end

If what we’re asked to generate is an array, we generate everything inside of that array, and combine them. If it’s a symbol we know it’s a production, so we get all the possible rewrites and take a random element from it. Currently every production have one rewrite, so the random_elt isn’t strictly necessary – but as you’ll see later it’s quite nice. And finally, if phrase is not an Array or Symbol, we just return the phrase as the generated element.

I especially like the use of inject as a more general version of (mappend #’generate phrase). Of course, for readability it would have been possible to implement mappend too:

def mappend(sym, list)
  list.inject([]) do |sum, elt|
    sum + self.send(sym, elt)
  end
end

But I choose to use inject directly instead, since it’s more idiomatic. Note that this version of mappend doesn’t work exactly the same as Common Lisp mappend, since it doesn’t allow a lambda.

Getting back to the generate method. If you were to run generate(:sentence), you would get the same kind of output as with the method based version – with the difference that changing the rules is much simpler now.

So for example, you can use this code from bigger_grammar.rb, which creates a larger grammar definition and then sets the default grammar to use it:

require 'rule_based'

$bigger_grammar = {
  :sentence => [[:noun_phrase, :verb_phrase]],
  :noun_phrase => [[:Article, :'Adj*', :Noun, :'PP*'], [:Name],
                   [:Pronoun]],
  :verb_phrase => [[:Verb, :noun_phrase, :'PP*']],
  :'PP*' => [[], [:PP, :'PP*']],
  :'Adj*' => [[], [:Adj, :'Adj*']],
  :PP => [[:Prep, :noun_phrase]],
  :Prep => %w(to in by with on),
  :Adj => %w(big little blue green adiabatic),
  :Article => %w(the a),
  :Name => %w(Pat Kim Lee Terry Robin),
  :Noun => %w(man ball woman table),
  :Verb => %w(hit took saw liked),
  :Pronoun => %w(he she it these those that)}

$grammar = $bigger_grammar

This grammar includes some more elements that make the output a bit better. For example, we have names here, and also pronouns. One of the reasons this grammar is easier to use is because we can define different versions of the productions. So for example, a noun phrase can be the same as we defined earlier, but it can also be a single name, or a single pronoun. We use this to handle the recursive PP* and Adj* productions. You can also see why the productions are defined with an array inside an array. This is to allow choices in this grammar.

A typical sentence from this grammar (calling generate(:sentence)) gives [“Terry”, “saw”, “that”], or [“Lee”, “took”, “the”, “blue”, “big”, “woman”].

So it’s easier to change these rules. Also believe that it’s easier to read, and understand the rules here. But one of the more important changes with the data driven approach is that you can use the same rules for different purposes. Say that we want to generate a sentence tree, which includes the name of the production used for that part of the tree. That’s as simple as defining a new generate method (in generate_tree.rb):

require 'bigger_grammar'

def generate_tree(phrase)
  case phrase
  when Array
    phrase.map { |elt| generate_tree(elt) }
  when Symbol
    [phrase] + generate_tree(rewrites(phrase).random_elt)
  else
    [phrase]
  end
end

This code follows the same pattern as generate, with a few small changes. You can see that instead of appending the results from the Array together, we instead just map every element. This is because we need more sub arrays to create a three. In the same manner when we get a symbol we prepend that to the array generated. And actually, at this point it’s kinda interesting to take a look at the Lisp version of this code:

(defun generate-tree (phrase)
  (cond ((listp phrase)
         (mapcar #'generate-tree phrase))
        ((rewrites phrase)
         (cons phrase
               (generate-tree (random-elt (rewrites phrase)))))
        t (list phrase)))

As you can see, the structure is mostly the same. I made a few different choices in representation, which means I’m checking if the phrase is a symbol instead of seeing if the rewrites for a symbol is non-nil. The call to mapcar is equivalent to the Ruby map call.

What does it generate then? Calling it with “pp generate_tree(:sentence)” I get something like this:

[:sentence,
 [:noun_phrase, [:Name, "Lee"]],
 [:verb_phrase,
  [:Verb, "saw"],
  [:noun_phrase,
   [:Article, "the"],
   [:"Adj*",
    [:Adj, "green"],
    [:"Adj*"]],
   [:Noun, "table"],
   [:"PP*"]],
  [:"PP*"]]]

which maps neatly back to our grammar. We can also generate all possible sentences for a grammar without recursion, using the same data driven approach.

The code for that can be found in generate_all.rb:

require 'rule_based'

def generate_all(phrase)
  case phrase
  when []
    [[]]
  when Array
    combine_all(generate_all(phrase[0]),
                generate_all(phrase[1..-1]))
  when Symbol
    rewrites(phrase).inject([]) { |sum, elt|  sum + generate_all(elt) }
  else
    [[phrase]]
  end
end

def combine_all(xlist, ylist)
  ylist.inject([]) do |sum, y|
    sum + xlist.map { |x| x+y }
  end
end

If you run generate(:sentence) you will get back a list of all 256 possible sentences from this simple grammar. In this case the algorithm is a bit more complicated. It’s also using the common Lisp idiom of working on the first element of a list and then recur on the rest of it. This makes it possible to combine everything together. I assume that it should be possible to devise something suitably clever based on the new Array#permutations or possible Enumerable#group_by or zip.

It’s interesting how well the usage of mappend and mapcar maps to uses of inject and map in this code.

Note that I’ve been using globals for the grammars in this implementation. An alternative that is probably better is to pass along an optional parameter to the methods. If no grammar is supplied, just use the default constant instead.

Anyway, the code for this chapter is in the repository. Play around with it and see if you can find anything interesting. This code is definitely an introduction to Lisp, more than a serious AI program – although it does show the kind of approaches that have been used for primitive code generation.

The next chapter will talk about the General Problem Solver. Until then.



Simplified finders in DataMapper


It seems I’m spending lots of time reading about DataMapper currently, while planning the first refactoring of Ribs. While reading I keep seeing Sam’s examples of simplified finders compared to the ActiveRecord versions. A typical example of the kind of thing I’m talking about is something like this with ActiveRecord:

Exibition.find(:all, :conditions => ["run_time > ? AND run_time < ?", 2, 5])

Which with DataMapper would be:

Exibition.all(:run_time.gt => 2, :run_time.lt => 5)

Oh, and yeah, it the typo is directly from the DataMapper documentation. So what’s wrong here? Well, in most cases it probably does exactly what you want it too. And that’s the problem in itself. If you use these simplified finders with more than one argument, you will get subpar SQL queries in some cases. Simply, if you don’t control the order of the clauses separated by AND, you might get queries that perform substantially worse than they should. Of course, that doesn’t happen often, but it’s important to keep it in mind.

And incidentally, I really do hate the methods added to Symbol.



Upcoming talks


It’s been some quiet months actually. But now that’s over. The next three weeks I’ll be showing up in Oslo, Santa Clara and Aarhus:

  • 17-18 Sep, JavaZone in Oslo, Norway. Me and Charles will present a JRuby Smorgosbord. I hope that we will be able to show the audience lots of compelling reasons to take a look at JRuby.
  • 24-26 Sep, JVM Language Summit in Santa Clara, California. This will probably be the geekiest conference I’ll ever attend. Around a 100 language implementation geeks, all come together to talk about language implementations for the JVM. Wow. I’ll be doing a talk about Jatha, (a Common Lisp for Java), from the perspective of the kind of language that represents most of the languages on the 200+ list.
  • 28 Sep-3 Oct, JAOO, Aarhus, Denmark. This will be my first visit to JAOO. Looking forward to it – people say it’s supposed to be great, and I’ve been quite impressed with the QCon conferences I’ve been to. I’ll be doing two talks at JAOO, one about JRuby and one about Oracle Mix.

Hope to see you there.



Paradigms of Artificial Intelligence Programming (in Ruby)


Since I never can get enough projects, I’ve decided to start on something I’ve thought about doing for a long time. There are several reasons for doing it, but foremost among them is that I want to get back to playing with AI, and I want to have a project with many small pieces that I can do when I have some time over. If it ends up being educational or useful for others at the same time, well, I’m not complaining!

So what is it?

Well, first I’d like to introduce a book. It’s called Paradigms of Artificial Intelligence Programming or PAIP for short. It’s written by Peter Norvig, who’s also written several other books about AI. Currently he’s the Director of Research at Google. PAIP is probably both my favorite book about AI, and my favorite book about Common Lisp. It’s a really great book. Really. If you have any interest in one of those two subjects you should get hold of it. PAIP doesn’t cover cutting edge AI, but rather takes the historic view and looks at several examples from different eras, going from the first programs to some later, quite advanced things.

I’ve read it numerous times, went through the code and tweaked it and so on. It’s lots of fun. But that was a few years ago. So basically, what I want to do is to go through the book again. But this time I’m going to write all the programs in Ruby – converting them from Common Lisp and then maybe tweak them a bit to make them more idiomatic. And I’m planning to post it here. Or rather, I’m going to post the actual source code to http://github.com/olabini/paipr. I’m going to blog about all the code I write. You don’t necessarily have to have the book, since I’m going to surround the code with some descriptions and explanations.

Once again then, why should anyone care? Well, I don’t know. No one might care. But for me personally it’s going to be an interesting experience converting idiomatic Common Lisp into idiomatic Ruby. It’s going to be fun to revisit the old approaches to AI. And it might serve as a good, code heavy introduction to the subject for anyone interested in it.

I do have the permission from Peter Norvig to do this. The Ruby code I write is covered by the MIT license, while any Lisp code posted as part of this exercise is covered by the license here: http://norvig.com/license.html.

Also note that I sometimes won’t write the most obvious Ruby – it will be good to have a few links to the original Lisp code.



Evil hook methods?


I have come to realize that there are a few hook methods I really don’t like in Ruby. Or actually, it’s not the hook methods I have a problem with – it’s the way much code is written using them. The specific hooks that seems to cause the most trouble for me is included, extended, append_features and extend_features. Let me first reiterate – I don’t dislike the methods per se. The power they give the language is incredible and should not be underestimated. What I dislike is the way it makes things un-obvious when reading code that depends on them.

Let’s take a silly example:

module Ruby;end

module Slippers
  def self.included(other)
    other.send :include, Ruby
  end
end

class Judy
  include Slippers
end

p Judy.included_modules

Since all this code is in the same place, you can see what will happen when someone include Slippers. And really, in this case the side effect isn’t entirely dire. But imagine that this was part of a slightly larger code base. Like for example Rails. And the modules were spread over the code base. And the included hook did a few more things with your class. No way of knowing what – except reading the code – and the Ruby idiom is that include will add some methods and constants to your class and that is it. Anything else is going outside what the core message of that statement is.

One of the most common things you see with the included hook is something like this:

module Slippers
  module ClassMethods
  end

  def self.included(other)
    other.send :extend, ClassMethods
  end
end

class Judy
  include Slippers
end

This will add some class methods to the class that includes this module. DataMapper does this in the public API, for example. It’s very neat, you only have to include one thing and you get stuff on both your instances and your classes. Except that’s not what include does. Not really. So say you’re debugging someone’s code and happen upon an include statement. You generally don’t check what it’s doing until you’ve exhausted most other options.

So what’s wrong with having a public API like this?

module Slippers
  module ClassMethods;end
end

class Judy
  include Slippers
  extend Slippers::ClassMethods
end

where you explicitly include the Slipper module and then extend the class methods. This is more obvious code, it doesn’t hide anything behind your expectations, and it also might give me the possibility to choose. What if I want most of the DataMapper instance methods, but really don’t want to have finders on my class? Maybe I want to have a repository pattern? In that case I’ll have to explicitly remove all class methods introduced by that include, because there is no way of choosing if I want to have the class methods or not.

So that’s another benefit of dividing the extending out from the included hook. And finally, what about all those other things that people do in those hooks? Well, you don’t really need it. Make it part of the public API too! Instead of this:

module Slippers
  def self.included(other)
    do_funky_madness_on other
  end
end

class Judy
  include Slippers
end

make it explicit, like this:

module Slippers;end

class Judy
  include Slippers
  Slippers.do_funky_madness_on self
end

This is really just good design. It makes the functionality explicit, it makes it possible for the user to choose what he wants without doing monkey patching. And it makes the code easier to read. Yeah, I know, this will mean more lines of code. Booo hooo! I know that Ruby people are generally obsessed with making their libraries as easy to use as possible, but it feels like it sometimes goes totally absurd and people stop thinking about readability, maintainability and all those other things. And really, Ruby is such a good language that a few more lines of code like this still won’t make a huge impact on the total lines of code.

I’m not saying I haven’t done this, of course. But hopefully I’ll get better at it. And I’m not saying not to use these methods at all – I’m just saying that you should use them with caution and taste.



WordPress posting malfunctioning on Orange


Interesting. While on vacation in Paris I have discovered that Orange seems to mess with some of the POST requests going to my WordPress. That includes posting new posts through the user interface, posting comments, and also posting using MarsEdit. Very strange. If you have Orange and get timeouts for some posts for your WordPress, now you know you’re not alone.



Polyglot Programming thesis


A few months ago I was interviewed for a master thesis on Polyglot Programming by Hans-Christian Fjeldberg. This is now finished and available online. You can download it here: http://theuntitledblog.com/wp-content/uploads/2008/08/polyglot_programming-a_business_perspective.pdf



Ribs available from Gems


Ribs is now available from Gems. You can install it using this command:

jruby -S gem install ribs

Much pleasure, and do give your comments and ideas to me.