Ioke syntax


Or: How using white space for application changes the syntax of a language.

I have spent most of the weekend working with different syntax elements of Ioke. Several of them are actually based on one simple decision I made quite early, and I thought it would be interesting to take a look at some of the syntax elements I’ve worked on, from the angle of how they are based on that one decision.

What is this decision then? In the manner of Smalltalk, Self and Io, I decided that periods are not the way to apply methods. Instead, space makes sense for this. So if in Java you would write “foo().bar(1).quux(2,3)”, this would be written as “foo bar(1) quux(2, 3)” in Ioke. Everything is an expression, and you send a message to something by putting the message next to the receiver, separated by whitespace. This turns out to have some consequences I really didn’t expect, and several parts of the syntax have changed a lot because of this decision. I’ll take a look at the things that changed most recently because of it.
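To make the mapping concrete, here is a minimal sketch in Python (not Ioke's actual parser) that splits a whitespace-separated message chain into its messages and renders it in the dotted Java style. The function names `split_messages` and `to_dotted` are made up for this illustration, and the sketch assumes balanced parentheses and no string literals.

```python
def split_messages(source):
    """Split a message chain on whitespace, keeping argument lists intact."""
    messages, current, depth = [], "", 0
    for ch in source:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        # Whitespace only separates messages outside of an argument list.
        if ch.isspace() and depth == 0:
            if current:
                messages.append(current)
                current = ""
        else:
            current += ch
    if current:
        messages.append(current)
    return messages


def to_dotted(source):
    """Render the chain Java style; messages without arguments get '()'."""
    parts = split_messages(source)
    return ".".join(m if m.endswith(")") else m + "()" for m in parts)


print(split_messages("foo bar(1) quux(2, 3)"))  # ['foo', 'bar(1)', 'quux(2, 3)']
print(to_dotted("foo bar(1) quux(2, 3)"))       # foo().bar(1).quux(2, 3)
```

The only subtlety is that a space inside an argument list must not end the current message, which is why the sketch tracks parenthesis depth.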

Terminators

Most languages that lack explicit expression nesting of the kind Lisp has need some way to decide when a chain of message passing should stop. Most scripting languages today try to use newlines, and then use semicolons when newlines don’t quite work. That’s what I started out doing with Ioke too (since Io does it). But once I started thinking about it, I realized that Smalltalk got this thing right too. Since I don’t use dots for message application, I’m free to use them for termination. You still don’t need to terminate things that are obviously terminated by newlines, but when you need a terminator, the dot reads very well. I’ve always disliked the intrusiveness of semicolons; they seem to take too much visual space for me. Dots feel like the right size, and there is also a more pleasing symmetry with commas.

Comments

Once you don’t use semicolons for termination, you can use them for other things. I am quite fond of the Lisp tradition of using semicolons for comments, so I decided to not use hashes for that anymore. Lisp systems also use different numbers of semicolons to mark different kinds of documentation. The Common Lisp convention is to use four semicolons for headlines, three for left-justified comments, two for a comment on its own line that should be indented with the program text, and one for comments on the same line as program text. This works because semicolons don’t take up much visual space when stacked. A hash would never work for it.

The obvious question from any person with Unix experience will be how I handle shebangs if a hash isn’t a comment anymore. The short answer is that I will provide general reader macro syntax based on hash. Since the shebang always starts with “#!”, that would be a perfect application for a reader macro. That also opens up the possibility for other interesting reader macros, but I’ll come back to that question later.

Operator precedence

This one was totally unexpected. I had planned to add regular operator precedence, and it ended up being quite painful. I should probably have guessed the problem, but I didn’t; two grammar files later I’m hopefully a bit wiser. The problem ended up being whitespace. Since I use whitespace to separate message applications, but whitespace also matters when deciding operator precedence, the parsers I got working ended up with an exponential amount of backtracking. Two lines of regular code without operators still backtracked enough to take a minute or two to parse. Ouch.

So what’s the solution? Two passes of parsing. Or not exactly, but almost. I’m currently implementing something like Io’s operator shuffling, which is a general way to rearrange operators into a canonical form based on precedence rules. What’s fun about it is that the rules can be changed dynamically. If you want Smalltalk-style left-to-right precedence, that should be possible by just setting the precedence to 1 for all operators. You can also turn off operator shuffling completely, which means you can’t use infix operators at all.
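As a rough illustration of the idea (a Python sketch, not the Ioke or Io implementation), shuffling can be seen as a second pass over a flat token list that nests operators according to a precedence table that is plain data and can be swapped at runtime. The names `shuffle` and `render` are hypothetical, and so is the convention that a higher number binds tighter. The sketch assumes well-formed input that alternates operands and binary operators.

```python
def shuffle(tokens, precedence):
    """Nest a flat infix token list into (op, left, right) tuples."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and precedence[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # The +1 makes operators of equal precedence left-associative.
            right = parse(precedence[op] + 1)
            left = (op, left, right)
        return left

    return parse(0)


def render(node):
    """Print the tree as receiver-message chains: 'left op(right)'."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{render(left)} {op}({render(right)})"


tokens = ["1", "+", "2", "*", "3"]
print(render(shuffle(tokens, {"+": 1, "*": 2})))  # 1 +(2 *(3))
print(render(shuffle(tokens, {"+": 1, "*": 1})))  # 1 +(2) *(3)
```

With every operator at the same precedence, the second call evaluates strictly left to right, which is the Smalltalk-style behaviour described above. Because the table is an ordinary dictionary, changing the rules does not require touching the parser.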

I’m also planning a way to scope these things, so you can actually change quite a lot of the syntax without switching the parser.

At some point I’m planning to explore how it would work to use an Antlr tree parser to do the shuffling. My intuition is that it would work well, but I’ll have to find the time to do it.

Syntactic flexibility

All is not perfect, but the current scheme seems to work well. I’ve been able to get a good deal of flexibility into the syntax, with loads of operators free for anyone to use and subclass. The result will be the possibility to create internal DSLs that Ruby could only dream of. Some things get harder too, though. Regular expression syntax, for example. If you can write a statement like this: “[10,12,14] map(/2 * 2/a)”, it’s kinda obvious that there is no easy way to know whether the thing inside the mapping call is a regular expression or an expression fragment. In Ioke the decision is simple: the above is an expression fragment. I still want regular expressions to be really easy to work with, though. Interestingly, this was one of the reasons I wanted reader macros, and it turns out that using #/ will work well. So a regular expression looks just like in a Perl-like language, except that you add a hash before the first slash: #/foo/ =~ “str”. It seems that hash will end up being my syntax sin bin for those cases where I want syntax without touching the parser too much.
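A minimal sketch of how hash-based dispatch could look, written in Python and not taken from the Ioke reader: the character after the hash selects a reader function from a table, so “#!” and “#/” become two entries in the same mechanism. All names here (`READER_MACROS`, `read_hash`, `read_shebang`, `read_regexp`) are assumptions for illustration, and the regular expression reader ignores escaped slashes and trailing flags.

```python
def read_shebang(source, pos):
    """Skip a '#!' line; it produces no token."""
    end = source.find("\n", pos)
    return None, (len(source) if end == -1 else end)


def read_regexp(source, pos):
    """Read '#/.../' and return a regexp token plus the next position."""
    end = source.index("/", pos + 2)
    return ("regexp", source[pos + 2:end]), end + 1


# The table is plain data, so new reader macros can be registered later.
READER_MACROS = {"!": read_shebang, "/": read_regexp}


def read_hash(source, pos):
    """Dispatch on the character that follows the '#' at source[pos]."""
    kind = source[pos + 1]
    if kind not in READER_MACROS:
        raise SyntaxError(f"unknown reader macro #{kind}")
    return READER_MACROS[kind](source, pos)


print(read_hash('#/foo/ =~ "str"', 0))        # (('regexp', 'foo'), 6)
print(read_hash("#!/usr/bin/ioke\nfoo", 0))   # (None, 15)
```

Since a bare slash never starts a regular expression in this scheme, “/2 * 2/a” stays an ordinary expression fragment and the ambiguity disappears.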

It’s funny to see how many things in classic syntax change if you change how message passing works. I like Ioke more and more with each of these things I find, and it currently looks very pleasant to work with. Dots are such an improvement for one-liners.


7 Comments

  1. mala

    Offtopic:
    I read you potentially want continuations and I read somewhere that you have an ‘ensure’ that works like Java’s finally. I can’t imagine how those two things work together. Do you have any ideas how that might work?

    November 3rd, 2008

  2. Have you thought about named parameters/messages, also like Smalltalk? So taking your example further: “foo bar: 1” or “quux: 2 with: 3”

    I’ve always wondered why no other language takes the named parameter approach similar to Smalltalk, which makes for some very readable APIs.

    This does bring up the question of statement scope, which Smalltalk deals with using parentheses, so “(foo bar: 1) quux: 2 with: 3” would send the quux:with: message to the result of foo’s bar:.

    November 4th, 2008

  3. Mala:
    I’m undecided about continuations. It might be fun to have them, but on the other hand it will not play well for ensure-style mechanisms. I might have to have something like a “wind” and “unwind” pair, the way that’s been proposed for Scheme.

    Mark:
    Yeah, I did think about it, but in the end decided against it. All the things you note are very true: they make for very readable APIs, but the statement scopes come into question quite quickly. I find that Smalltalk code sometimes gets hard to read for this reason.

    There is one “big” language that uses named parameters: Objective C.

    Anyway, Ioke has keyword arguments by default, and you can define your own method-type that works with Smalltalk style selectors if you want. The syntax is flexible enough for that.

    November 4th, 2008

  4. Jurgen

    What you like is obviously influenced by what you have been using. I remember (coming from Java) that I really didn’t like the “def” in Ruby, but now it’s natural. So I would argue that if at all possible, one should attempt to stick to conventions which everybody has been brought up with. Dots at the end of a sentence are what we’re used to. So that fits very well.
    But I don’t know of any human language that starts phrases with a semi-colon? As a non-Lisper, they’ve always looked “ugly” to me, but again, I realize this is personal.
    Thought about using tags (like HTML) for comments?

    November 6th, 2008

  5. renoX

    This is all a matter of personal preference but
    1) I disagree with the idea of using ‘;’ instead of ‘#’: a clear differentiation between comments and code is a good idea.

    2) while I think that Smalltalk went too far and the lack of parentheses to separate the method name from its parameters reduces readability, I think that named parameters are a good idea that too few languages use.

    November 11th, 2008

  6. Interesting. I like your syntax choices, in particular the use of white space, but I must admit I’d prefer # for comments, mainly because it would be more familiar, generally speaking.

    I can’t wait to try it out!

    November 11th, 2008

  7. Nikolay Petrov

    The thing that bugs me: how exactly do you picture operators? I mean, are they going to be first-class values? Or is this just syntactic sugar for concrete methods? Do you allow arbitrary operators? I mean the whole story.

    I’m asking because I’ve tried a number of times designing a language, but the operators always hit me hard. For instance as far as I know we have the following styles of operator definitions:
    – Operators are defined using strict rules of precedence working over a fixed number of language entries (ala Java)
    – There are no operators (ala Lisp because of prefix syntax)
    – Operators are defined using strict rules of precedence, delegating to concrete methods in the objects (Python)
    – Operators do not have any precedence – left to right, delegating to methods (Smalltalk)
    – Operators have user defined precedence and are real methods/functions (Haskell)
    It’s possible to have other options too, but I can’t come up with one now.

    The problem starts if the language designer wants the users of the language to be able to come up with new operators. One option I see is that if you adopt the style where the program executes line by line in the source file, you can come up with a syntactic structure which does the thing. For instance:
    infixl 30
    In this case the file could start with a bunch of requires then a number of operator definitions. But this should also be able to be abstracted away. So requires also import operators which are defined in these modules. But this could potentially bring name problems.

    November 18th, 2008
