Ola Bini: Programming Language Synchronicity

May 9th, 2009

Message chains and quoting in Ioke

One of the more advanced features in Ioke is the ability to work with first class messages. At the end of the day, you are manipulating the AST directly by doing this, which means that you can do pretty much anything you want. The manipulation of message chains is the main way of working with macros in Ioke, so understanding what you can do with them is pretty important.

The documentation surrounding these pieces is spread all over the place, so I thought I’d take a look at messages and the way you construct and modify them.

Messages

The first step in working with message chains is to understand the Message. Message is the core data structure in Ioke, and it has some native properties that define the full structure of the Ioke AST. Four pieces of the structure are central to messages, and a few more are less interesting. So let us look at the core structure. It is actually extremely simple. These are the things that make up a Message:

Name – all messages have a name. From the perspective of Ioke, this is a symbol. It will never be nil, but it can be empty.
Arguments – a list of messages, zero or more.
Prev – a pointer to the previous message in the chain, or nil if there is no previous.
Next – a pointer to the next message in the chain, or nil if there is no next message.

A message can also wrap a value. In that case the message will always return that value, and no real evaluation will happen. This can be used to insert any kind of value into a message chain that will later be evaluated. This is called wrapping.

A message chain is just a collection of messages, linked through their Prev and Next pointers.

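The arguments to a message are represented as a list of messages. This makes sense if you think about it for a few seconds: an argument can be an arbitrary expression, so each argument is itself the head of a message chain.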
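To make the structure concrete, here is a minimal sketch in Ruby rather than Ioke. The class and accessor names simply mirror the four properties listed above; this is an illustrative model under that assumption, not how Ioke implements Message.

```ruby
# Hypothetical Ruby model of the structure described above.
# It mirrors the four core properties; it is not Ioke's implementation.
class Message
  attr_accessor :name, :arguments, :prev, :next

  def initialize(name, arguments = [])
    @name = name.to_sym     # never nil, but may be the empty symbol
    @arguments = arguments  # list of messages, zero or more
    @prev = nil             # previous message in the chain, or nil
    @next = nil             # next message in the chain, or nil
  end
end

# Build the chain for: foo bar(x) baz
foo = Message.new(:foo)
bar = Message.new(:bar, [Message.new(:x)])
baz = Message.new(:baz)

foo.next = bar
bar.prev = foo
bar.next = baz
baz.prev = bar

# Walk the chain from the head by following the next pointers.
names = []
node = foo
while node
  names << node.name
  node = node.next
end
puts names.inspect
```

Note that the pointers are wired by hand here, which is exactly the kind of bookkeeping that is easy to get wrong.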

OK, now you know what the Ioke AST looks like. It isn’t harder than that. Now, if you actually want to start working with messages, there are several messages that Message can receive that allow you to work with them. The simpler ones (that I won’t explain further) are “name”, “name=”, “arguments”, “arguments=”, “next”, “next=”, “prev”, “prev=”.

There are a few more interesting ones that merit some explanation. First, “last”. This message will just return the last message in the message chain. It is the equivalent of following the next pointer until you come to the end.

It’s important to keep in mind that Message is a mutable structure, which means you need to be careful not to modify things in ways that cause unexpected changes elsewhere. For example, if someone sends in a message, you generally shouldn’t modify it without copying it first. If you only want to copy a message without recursively copying the next pointer, you can just mimic it. Otherwise, use the method “deepCopy”, which will copy both the next pointer and the arguments recursively.
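The behaviour of “last” and “deepCopy” can be sketched in Ruby as follows. This is a simplified model under stated assumptions: the `Message` class, `last` and `deep_copy` are hypothetical Ruby stand-ins, and the prev pointer is left out to keep the example short.

```ruby
# Hypothetical Ruby stand-in for a message; prev pointers are omitted for brevity.
class Message
  attr_accessor :name, :arguments, :next

  def initialize(name, arguments = [], nxt = nil)
    @name = name
    @arguments = arguments
    @next = nxt
  end

  # Follow the next pointers until the end of the chain.
  def last
    node = self
    node = node.next while node.next
    node
  end

  # Copy this message, its arguments, and everything after it in the chain.
  def deep_copy
    Message.new(name,
                arguments.map(&:deep_copy),
                self.next && self.next.deep_copy)
  end
end

original = Message.new(:one, [], Message.new(:two, [Message.new(:arg)]))
copy = original.deep_copy

# Mutating the copy leaves the original chain untouched.
copy.last.name = :changed
copy.next.arguments << Message.new(:extra)

puts original.last.name              # still :two
puts original.next.arguments.length  # still 1
```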

Now, if you want to add new arguments to a message, you can use “appendArgument”. This method is aliased as “<<”. It also returns the receiver, so you can add several arguments by chaining calls to appendArgument/<<. If you want to add a message at the beginning of the argument list, you instead use >>.

One of the more annoying things is that once you set the next pointer, you generally need to make sure to set the previous pointer of the next value too, unless you are setting it to nil. The same thing is true when setting the prev pointer. So, in the cases when you want to link two messages, you shouldn’t set these pointers by hand, but instead use the “->” method. This allows you to link two Ioke messages. For example “msg1 -> msg2” will set the next pointer on msg1 and the prev pointer on msg2. If you do “msg1 -> nil” it will set the next pointer to nil.
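A rough Ruby sketch of these operations might look like the following. Ruby cannot define a `->` operator, so a hypothetical `link` method stands in for it. What `link` returns is an assumption made for this example; the text above only specifies the pointer updates.

```ruby
# Hypothetical Ruby model; `link` stands in for Ioke's "->" method.
class Message
  attr_accessor :name, :arguments, :prev, :next

  def initialize(name)
    @name = name
    @arguments = []
    @prev = nil
    @next = nil
  end

  # Append an argument and return the receiver, so calls can be chained.
  def <<(arg)
    @arguments << arg
    self
  end

  # Add an argument at the beginning of the argument list.
  def >>(arg)
    @arguments.unshift(arg)
    self
  end

  # Set our next pointer and, unless other is nil, other's prev pointer.
  # Returning `other` is an assumption for this sketch.
  def link(other)
    @next = other
    other.prev = self unless other.nil?
    other
  end
end

call = Message.new(:foo) << Message.new(:a) << Message.new(:b)
call = call >> Message.new(:first)
puts call.arguments.map(&:name).inspect

msg1 = Message.new(:msg1)
msg2 = Message.new(:msg2)
msg1.link(msg2)   # both pointers are now consistent
msg1.link(nil)    # only clears msg1's next pointer
```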

And that’s basically it. If you need to actually evaluate the messages, you can either use “sendTo” or “evaluateOn”. The main difference here is that sendTo will actually not evaluate the message chain. It will only evaluate the message that is the receiver of the call. The evaluateOn method will follow the message chain and evaluate it fully, based on the context arguments given to it.
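The difference between the two can be sketched in Ruby. This is a simplified model: the method names `send_to` and `evaluate_on` are hypothetical, arguments are treated as plain Ruby values, and the separate ground argument that “evaluateOn” takes is left out.

```ruby
# Hypothetical Ruby model of evaluating one message versus a whole chain.
class Message
  attr_accessor :name, :arguments, :next

  def initialize(name, arguments = [], nxt = nil)
    @name = name
    @arguments = arguments
    @next = nxt
  end

  # Evaluate only this message against the receiver; ignore the rest of the chain.
  def send_to(receiver)
    receiver.public_send(name, *arguments)
  end

  # Evaluate this message, then feed the result to the next message as its receiver.
  def evaluate_on(receiver)
    result = send_to(receiver)
    self.next ? self.next.evaluate_on(result) : result
  end
end

# Chain equivalent to: upcase reverse
chain = Message.new(:upcase, [], Message.new(:reverse))

puts chain.send_to("abc")      # prints ABC, only the first message ran
puts chain.evaluate_on("abc")  # prints CBA, the whole chain ran
```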

Oh, one last thing. To create new messages from scratch, there are a few different ways. First of all, you can wrap a value like this: “Message wrap(42)”. That will return a new message that wraps the number 42.

You can create a message chain from a piece of text by doing ‘Message fromText(“one two three”)’. This will return a message chain with three messages, linked together.

Finally, you can create a new message chain by using the from-method. You use it like this: “Message from(one two(three) four)”. What is returned is the message chain that is the argument. If you think about it for a few seconds, you can probably guess how to implement this using an Ioke macro.
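As a rough illustration of wrapping and of building a chain from text, here is a Ruby sketch. Everything in it is an assumption made for the example: `wrap`, `from_text` and `send_to` are hypothetical Ruby names, giving the wrapped message an empty name is an arbitrary choice, and the toy parser only understands space-separated bare names, unlike a real Ioke parser.

```ruby
# Hypothetical Ruby model of wrapped messages and chains built from text.
class Message
  attr_accessor :name, :prev, :next

  def initialize(name, wrapped: false, value: nil)
    @name = name.to_sym
    @wrapped = wrapped
    @value = value
    @prev = nil
    @next = nil
  end

  # A message that carries a ready-made value (empty name chosen arbitrarily).
  def self.wrap(value)
    new("", wrapped: true, value: value)
  end

  # Toy parser: handles space-separated bare names only.
  def self.from_text(text)
    messages = text.split.map { |word| new(word) }
    messages.each_cons(2) do |a, b|
      a.next = b
      b.prev = a
    end
    messages.first
  end

  # A wrapped message returns its value without any real evaluation.
  def send_to(receiver)
    return @value if @wrapped
    receiver.public_send(name)
  end
end

answer = Message.wrap(42)
puts answer.send_to("any receiver")  # prints 42

head = Message.from_text("one two three")
puts head.next.next.name             # prints three
```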

Quoting

Now that we understand messages and message chains, let us take a look at how to create new chains in a flexible way.

First of all, the methods above are all very useful and nice, but they tend to be a bit verbose. Coming from a Lisp background I felt inclined to put the quoting characters to good use for this. So, first of all, the single quote (') does the same thing as “Message from”. The back quote (`) does the same thing as “Message wrap”. So, to wrap the number 42, you can just do `42. In this case you don’t need parentheses, since the back quote is an operator. To create a new message chain, use the single quote: '(foo bar(x) baz).

We almost have everything we need, except that we need some convenient ways of actually putting things into these message chains without having to put them together by hand.

Say for example we have a variable “blah” that contains an unknown message. We want to create a message “one” that is followed by the message in the variable “blah”. And then finally we want to add two messages “bax” and “baz” after it. We could do it like this: x = 'one. x -> blah. x last -> '(bax baz). All in all, that is not too bad, but we can do better. This is done using the splice-quote operator, which is just two single quotes after each other. Using that it would look like this: ''(one `blah bax baz). In this case, the back quote inside of the splice-quote call will be evaluated in the current context, and the result will be spliced into the message chain being created. Now, only use the back quote if you are sure the spliced message can safely be modified. If you want to copy blah before inserting it, use the single quote again, instead of the back quote: ''(one 'blah bax baz)
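The difference between splicing in place and splicing a copy can be sketched in Ruby. The `splice` helper and its `copy:` flag are hypothetical names made up for this illustration: symbols stand for literal messages, Message objects stand for spliced values, and `copy: true` plays the role of the inner single quote.

```ruby
# Hypothetical Ruby model of splice-quoting; not how Ioke implements it.
class Message
  attr_accessor :name, :prev, :next

  def initialize(name)
    @name = name
    @prev = nil
    @next = nil
  end

  def last
    node = self
    node = node.next while node.next
    node
  end

  def link(other)
    @next = other
    other.prev = self if other
    other
  end

  def deep_copy
    copy = Message.new(name)
    copy.link(self.next.deep_copy) if self.next
    copy
  end

  def names
    out = []
    node = self
    while node
      out << node.name
      node = node.next
    end
    out
  end
end

# Symbols become fresh messages; Message values are spliced in as whole chains.
def splice(*parts, copy: false)
  head = nil
  tail = nil
  parts.each do |part|
    chain = if part.is_a?(Message)
              copy ? part.deep_copy : part
            else
              Message.new(part)
            end
    head.nil? ? head = chain : tail.link(chain)
    tail = chain.last
  end
  head
end

blah = Message.new(:blah)
in_place = splice(:one, blah, :bax, :baz)   # like ''(one `blah bax baz)
puts in_place.names.inspect
puts blah.next.name                         # blah itself was modified

other = Message.new(:blah)
copied = splice(:one, other, :bax, :baz, copy: true)  # like ''(one 'blah bax baz)
puts other.next.inspect                     # nil, other was left alone
```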

All in all, this is really all you need, and you can take a look at the core libraries and see how they are used. A typical example is the comprehensions library, and also the destructuring macros. In general, creating these message chains on the fly is the most useful inside of syntax macros.

I am planning to add a new feature to Ioke that allows you to do tree rewriting for manipulating chains in different ways. This will be a feature built on top of the primitives described here, and these features will continue to be the main way of working with message chains for a long time.

By Ola Bini | In: ioke | tags: ioke, macros, message chains, programming language design, syntax.

January 8th, 2009

Macro types in Ioke – or: what is a dmacro?

With the release of Ioke 0, things regarding types of code were pretty simple. At that point Ioke had DefaultMethod, LexicalBlock and DefaultMacro. (That’s not counting the raw message chains of course). But since then I’ve seen fit to add several new types of macros to Ioke. All of these have their reason for existing, and I thought I would try to explain those reasons a bit here.

But first I need to explain what DefaultMacro is. Generally speaking, when you send the message “macro” in Ioke, you will get back an instance of DefaultMacro. A DefaultMacro is executed at runtime, just like regular methods, and in the same namespace. So a macro has a receiver, just as a method does. In fact, the main difference between macros and methods is that you can’t define arguments for a macro. And when a message activates a macro, the arguments sent to that message will not be evaluated. Instead, the macro gets access to a cell called “call”. This cell is a mimic of the kind Call.

What can you do with a Call then? Well, you can get access to the unevaluated arguments. The easiest way to do this is by doing “call arguments”. That returns a list of messages. A Call also contains the message sent to activate it. This can be accessed with “call message”. Call contains a reference to the ground in which the message was sent. This is accessed with “call ground”, and is necessary to be able to evaluate arguments correctly. Finally, there are some convenience methods that allow the macro to evaluate arguments. Doing “call argAt(2)” will evaluate the third argument and return it. This is a short form for the equivalent “call arguments[2] evaluateOn(call ground, call ground)”.
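The idea of a Call holding unevaluated arguments can be modeled in Ruby with lambdas. This is only an analogy under stated assumptions: the `Call` struct and `arg_at` are hypothetical Ruby names, each argument is a lambda that takes the ground, and the ground is a plain Hash of variable bindings.

```ruby
# Hypothetical Ruby analogy: unevaluated arguments are lambdas over a "ground".
Call = Struct.new(:message, :arguments, :ground) do
  # Evaluate only the argument at the given index, using the caller's ground.
  def arg_at(index)
    arguments[index].call(ground)
  end
end

evaluated = []
args = [
  ->(ground) { evaluated << 0; ground[:x] + 1 },
  ->(ground) { evaluated << 1; ground[:x] + 2 },
  ->(ground) { evaluated << 2; ground[:x] * 10 }
]

call = Call.new(:mymacro, args, { x: 4 })
third = call.arg_at(2)

puts third              # prints 40
puts evaluated.inspect  # prints [2], the other arguments never ran
```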

This is all well and good. Macros allow you to do most things you would want to do, really. But they are quite rough to work with in their raw form. There is also some plumbing that is a bit inconvenient. One common thing that you might want to do is to transform the argument messages without evaluating them, return those messages and have them be inserted instead of the current macro. You can do this directly, but it is as mentioned above a bit inconvenient. So I added DefaultSyntax. You define a DefaultSyntax with a message called “syntax”. The first time a syntax is activated, it will run, take the result of itself and replace itself with that result, and then execute that result. The next time that piece of code is found, the syntax will not execute; instead the result of the first invocation will be there. This is the feature that lies behind for comprehensions.

To make this a bit more concrete, let’s create a very simplified version of it. This version is fixed to take three arguments: an argument name, an enumerable to iterate over, and an expression for how to map the output value. Basically, a different way of calling “map”. A case like this is good, because we have all the information necessary to transform it, instead of evaluating it directly.

An example use case could look like this:

myfor(x, 1..5, x*2) ; returns [2,4,6,8,10]

Here myfor will return the code to double the elements in the range, and then execute that.

The syntax definition to make this possible looks like this:

myfor = syntax(
  "takes a name, an enumerable, and a transforming expression
and returns the result of transforming each entry in the
expression, with the current value of the enumerable
bound to the name given as the first argument",

  argName = call arguments[0]
  enumerable = call arguments[1]
  argCode = call arguments[2]
  ''(`enumerable map(`argName, `argCode))
)

As you can see, I’ve provided a documentation text. This is available at runtime.

Syntactic macros also have access to “call”, just like regular macros. Here we use it to assign three variables. These variables get the messages themselves, not the results of evaluating them. Finally, a metaquote is used. A metaquote takes its content and returns the message chain inside of it, except that anywhere a ` is encountered, the message at that point will be evaluated and spliced into the message chain at that point. The result will be to transform “myfor(x, 1..5, x*2)” into “1..5 map(x, x*2)”.
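The expand-once behaviour of a syntax can be sketched in Ruby. This is a toy model under stated assumptions: code is represented as nested arrays, `CallSite` and `MYFOR_EXPANDER` are hypothetical names, and executing the expanded code is left out so that only the replacement step is shown.

```ruby
# Hypothetical model of a call site that replaces itself with its expansion.
class CallSite
  attr_reader :expansions

  def initialize(node, expander)
    @node = node          # e.g. [:myfor, :x, "1..5", "x*2"]
    @expander = expander
    @expanded = false
    @expansions = 0
  end

  # The first call runs the expander and stores the result in place of the
  # original node; later calls find the expanded code already there.
  def code
    unless @expanded
      @node = @expander.call(*@node.drop(1))
      @expanded = true
      @expansions += 1
    end
    @node
  end
end

# Rewrites myfor(name, enumerable, code) into enumerable map(name, code).
MYFOR_EXPANDER = lambda do |arg_name, enumerable, arg_code|
  [:map, enumerable, arg_name, arg_code]
end

site = CallSite.new([:myfor, :x, "1..5", "x*2"], MYFOR_EXPANDER)
puts site.code.inspect
puts site.code.inspect   # same node, the expander is not run again
puts site.expansions     # prints 1
```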

As might be visible, the handling of arguments is kinda impractical here. There are two problems with it, really. First, it’s really verbose. Second, it doesn’t check for too many or too few arguments. Doing these things would complicate the code, at the expense of readability. And regular macros have exactly the same problem. That’s why I implemented the d-family of destructuring macros. The current versions of this are dmacro, dsyntax, dlecro and dlecrox. They all work the same, except that they generate macros, syntax, lecros or lecroxes, depending on which version is used.

Let’s take the previous example and show what it would look like with dsyntax:

myfor = dsyntax(
  "takes a name, an enumerable, and a transforming expression
and returns the result of transforming each entry in the
expression, with the current value of the enumerable
bound to the name given as the first argument",

  [argName, enumerable, argCode]

  ''(`enumerable map(`argName, `argCode))
)

The only difference here is that we use dsyntax instead of syntax. The usage of “call arguments[n]” is gone, and is instead replaced with a list of names. Under the covers, dsyntax will make sure the right number of arguments are sent and an error message provided otherwise. After it has ensured the right number of arguments, it will also assign the names in the list to their corresponding argument. This process is highly flexible and you can choose to evaluate some messages and some not. You can also collect messages into a list of messages.

But the real nice thing with dsyntax is that it allows several choices of argument lists. Say we wanted to provide the option of giving either 3 or 4 arguments, where the expansion looks the same for 3 arguments, but if 4 arguments are provided, the third one will be interpreted as a condition. In other words, to be able to do this:

myfor(x, 1..5, x*2) ; returns [2,4,6,8,10]
myfor(x, 1..5, x<4, x*2) ; returns [2,4,6]

Here a condition is used in the comprehension to filter out some elements. Just as with the original, this code transforms into an obvious application of “filter” followed by “map”. The updated version of the syntax looks like this:

myfor = dsyntax(
  "takes a name, an enumerable, and a transforming expression
and returns the result of transforming each entry in the
expression, with the current value of the enumerable
bound to the name given as the first argument",

  [argName, enumerable, argCode]

  ''(`enumerable map(`argName, `argCode)),

  [argName, enumerable, condition, argCode]

  ''(`enumerable filter(`argName, `condition) map(`argName, `argCode))
)

The only thing added is a new destructuring pattern that matches the new case and in that situation returns code that includes a call to filter.
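The idea of choosing an expansion by argument count, and rejecting anything else, can be sketched in Ruby. This is an illustrative model only: `PATTERNS` and `expand_myfor` are hypothetical names, the expansions are nested arrays, and the error message is made up rather than taken from Ioke.

```ruby
# Hypothetical model: one expansion per supported argument count.
PATTERNS = {
  3 => lambda { |arg_name, enumerable, arg_code|
    [:map, enumerable, arg_name, arg_code]
  },
  4 => lambda { |arg_name, enumerable, condition, arg_code|
    [:map, [:filter, enumerable, arg_name, condition], arg_name, arg_code]
  }
}.freeze

# Pick the pattern that matches the number of arguments, or fail loudly.
def expand_myfor(*args)
  pattern = PATTERNS.fetch(args.length) do
    raise ArgumentError, "myfor expects 3 or 4 arguments, got #{args.length}"
  end
  pattern.call(*args)
end

puts expand_myfor(:x, "1..5", "x*2").inspect
puts expand_myfor(:x, "1..5", "x<4", "x*2").inspect

# What the four-argument expansion computes, written directly in Ruby
# (select plays the role of filter):
puts (1..5).select { |x| x < 4 }.map { |x| x * 2 }.inspect  # prints [2, 4, 6]
```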

The destructuring macros have more features than these, but this is the skinny on why they are useful. In fact, I’ve used a combination of syntax and dmacro to remove a lot of repetition from the Enumerable core methods, for example. Things like this make it possible to provide abstractions where you only need to specify what’s necessary, and nothing more.

And remember, the destructuring I’ve shown with dsyntax can be done exactly the same for macros and lecros. Regular methods don’t need it that much, since the rules for DefaultMethod arguments are so flexible anyway. But for macros this has really made a large difference.

By Ola Bini | In: ioke | tags: ioke, macros, programming language design, syntax.

September 23rd, 2006

Three ways to add Ruby Macros

As most of my readers probably have realized at this point, I have a few obsessions. Lisp and Ruby happen to be two of the more prominent ones. And regarding Lisp, macros are what especially interest me. I have been doing much thinking lately on how you could go about adding some kind of macro facility to Ruby, and these three options are the result.

I should begin by saying that none of these options are entirely practical right now. All of them have some serious problems which I frankly haven’t been able to come up with an answer for yet. But that doesn’t stop me from blogging about my ideas, of course. Another thing to notice is that this is not about hygienic macros. This is the full-blown, power, blow-the-moon away version of macros.

MacRuby – Direct defmacro in Ruby
The first approach rests on modifying the language itself. You can add a defmacro keyword which takes a name and a code block to execute. Each time the compiler/interpreter finds a macro definition, it will remember the name. When that name is found in the code later on, each such place will be marked. Then, before execution begins, every place where the macro is called will be replaced by the output of calling the macro with the subnodes at that place. An example of a simple macro:

defmacro log logger, level, *messages
  if $DEBUG
    :call, logger, level, *messages
  else
    :nop
  end
end

log @l, :debug, "value is: #{very_expensive_operation()}"

What’s interesting in this case is that the messages will not be evaluated if the $DEBUG flag is not set. This is because the value returned from the macro will be spliced into the AST only if that flag is set. Otherwise a no-op will be inserted instead. Obviously, for this kind of code to work, the interpreter would need to change substantially. There is also a big problem with it, since it’s very hard to fit this model into the object-oriented system of Ruby. As I think about it now, it seems macros would be the only non-OOP feature in Ruby, if added in this way. Another big problem with this model is that it is really not that intuitive what the resulting code from the macro will be. As soon as something more advanced needs to be returned, it will be very hard getting it straight in your head. One solution to this would be to do it the standard CL way. First write the output from the macro in several different instances. Then transform this to the AST code through a tool that parses the code. Then transform this into the macro. This process would be helped by tools, of course.
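Since defmacro does not exist in Ruby, the expansion step can only be modeled. The sketch below is a plain Ruby function standing in for the macro body: the name `log_macro` is hypothetical, the debug flag is passed as a parameter instead of reading $DEBUG so the example is easy to run, and the subnodes are represented as source strings that are never evaluated.

```ruby
# Hypothetical model of the macro body: it receives unevaluated subnodes
# (here, source strings) and returns the node to splice into the AST.
def log_macro(debug, logger, level, *messages)
  if debug
    [:call, logger, level, *messages]
  else
    [:nop]
  end
end

# The expensive expression is only data here; nothing evaluates it.
subnodes = ["@l", ":debug", '"value is: #{very_expensive_operation()}"']

puts log_macro(true, *subnodes).inspect
puts log_macro(false, *subnodes).inspect  # prints [:nop]
```

With the flag off, the expensive expression never reaches the resulting tree at all, which is the whole point of doing this at expansion time.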

Back-and-Lisp-Ruby – Write macros in Lisp, translate Ruby back and forth
Another way to achieve this power in Ruby would be to separate the macro language from the main language. In effect, the macros would be a classic pre-processor. To offer the same power level as Lisp and others, the best way would be to write the macros themselves in a Lisp dialect, then transform Ruby in a well-defined way to Lisp and back again. (See the next version for more about this idea.) In this situation the same macro as before could look like this:

(defmacro log (logger level &rest messages)
  (if $DEBUG
      `(,level ,logger ,@messages)
      '()))

The main difference in this code is that the macro and the output from the macro is Lisp. We have gotten rid of the ugly :call and :nop return values, and to me this seems quite readable. Of course, I’m not sure everyone else feels the same way. And we still have the same problem with Object Orientedness. It’s missing.

RoCL – Ruby over Common Lisp
The final idea is to build a Ruby runtime within Common Lisp and transform Ruby into Common Lisp before running it. The macros could either be added as Ruby code or Lisp code. Everything will be transformed into the equivalent code in Lisp, maybe using CLOS as the object system, or building something based on Ruby’s. Of course, the semantics of many things would change, and many libraries would need to be rewritten. But in the end, there would be incredible power available. Especially if we can make it go both ways, so that Common Lisp can use Ruby libraries.

An example transformation could look like this. From this Ruby:

class String
  def revert(a, *args)
    if block_given?
      yield a
    else
      args + [a]
    end
  end
end

"abc".revert "one" do |x|
  puts x
end

This is nonsense code, if you hadn’t noticed. =) It could be transformed into this Lisp:

(with-class "String" nil
            (def revert (a block &rest args)
              (if block
                  (apply block a)
                  (+ args [a]))))
(revert "abc" "one" #'(lambda (x)
                        (puts self x)))

Conclusions
It is very hard to actually retrofit macros into Ruby after the fact. I’m still not sure it can be done and keep enough of Ruby’s semantics to make it meaningful. It seems that we need a new language. But if I had to choose among these approaches, the RoCL one seems the most interesting and also the most fun to implement. If I have a motto it would have to be something along the lines of “best of all worlds”. I want the best from Ruby, Java, Lisp, Erlang and everything I can find.

By Ola Bini | In: Uncategorized | tags: lisp, macros, ruby.