An Ioke spelling corrector


A while back Peter Norvig (of AI, Lisp and Google fame) published a small entry on how spelling correctors work. He included some code in Python to illustrate the concept, and this code has ended up being a very commonly used example for showing off a programming language.

It would be ridiculous to suggest that I generally like to follow tradition, but in this case I think I will. This post will take a look at the version implemented in Ioke, and use that to highlight some of the interesting aspects of Ioke. The code itself is quite simple, and doesn’t use any of the object oriented features of Ioke. Neither does it use macros, although some of the features used are based on macros, so I will get to explain a bit of that.

For those that haven’t seen my Ioke posts before, you can find out more at http://ioke.org. The features used in this example are not yet released, so to follow along you’ll have to download and build Ioke yourself. Ioke S should be out in about 1-2 weeks though, and at that point this blog post will describe released software.

First, for reference, here is a link to Norvig’s original corrector, where you can also find his corpora: http://norvig.com/spell-correct.html. Go read it! It’s a very good article.

This code is available in the Ioke repository in examples/spelling/spelling.ik. I’m adding a few line breaks here to make it fit the blog width – except for that everything is the same.

Let’s begin with the method “words”:

words = method(text,
  #/[a-z]+/ allMatches(text lower))

This method takes a text argument, then calls the method “lower” on it, which returns a new text that is the original converted to lower case. A regular expression that matches one or more alphabetical characters is used, and the method allMatches is called on it. This method returns a list of texts for all the places the pattern matches in the text.
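For readers more familiar with Norvig’s Python, a rough equivalent of “words” would look like this (my sketch, not part of the Ioke sources):

```python
import re

def words(text):
    # Lowercase the text, then collect every run of letters,
    # mirroring the Ioke call #/[a-z]+/ allMatches(text lower)
    return re.findall(r"[a-z]+", text.lower())
```

Punctuation and digits simply fall out, since they never match the pattern.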

The next method is called “train”:

train = method(features,
  features fold({} withDefault(1), model, f,
    model[f] ++
    model))

The argument “features” should be a list of texts. The method calls fold on this list (you might know fold as reduce or inject; those names would have been fine too). The first argument to fold is the start value. This should be a dict, with a default value of 1. The second argument is the name that will be used to refer to the sum, and the third argument is the name to use for the feature. Finally, the last argument is the actual code to execute. This code just takes the feature (which is a text), indexes into the dict with it and increments the number there. It then returns the dict, since that will be the model argument in the next iteration.
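The fold above does the same work as this imperative Python sketch (mine, shown only to make the dict-with-default idea concrete):

```python
from collections import defaultdict

def train(features):
    # Count each word; unseen words default to 1 so they never
    # end up with a zero count in the model
    model = defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model
```

The default of 1 is what makes completely novel words still usable as candidates later.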

The next piece of code uses the former methods:

NWORDS = train(words(FileSystem readFully("small.txt")))

alphabet = "abcdefghijklmnopqrstuvwxyz" chars

As you can see we define a variable called NWORDS, that contains the result of first reading a text file, then extracting all the words from that text, and finally using that to train on. The next assignment gets a list of all the characters (as text) by calling “chars” on a text. I could have just written [“a”, “b”, “c”, …] etc, but I’m a bit lazy.

OK, now we come to the meat of the code. For an explanation of why this code does what it does, refer to Norvig’s article:

edits1 = method(word,
  s = for(i <- 0..(word length + 1),
    [word[0...i], word[i..-1]])

  set(
    *for(ab <- s,
      ab[0] + ab[1][1..-1]), ;deletes
    *for(ab <- s[0..-2],
      ab[0] + ab[1][1..1] + ab[1][0..0] + ab[1][2..-1]), ;transposes
    *for(ab <- s, c <- alphabet,
      ab[0] + c + ab[1][1..-1]), ;replaces
    *for(ab <- s, c <- alphabet,
      ab[0] + c + ab[1]))) ;inserts

The mechanics of it are these. We create a method assigned to the name edits1. This method takes one argument called “word”. We then create a local variable called “s”. This contains the result of executing a for comprehension. There are several things going on here. The first part of the comprehension gives a generator (that’s the part with the <-). The thing on the right is what to iterate over, and the thing on the left is the name to give it on each iteration. Basically, this comprehension goes from 0 to the length of the word plus 1. (The two dots denote an inclusive Range.) The second argument to “for” is what to actually return. In this case we create a new list with two elements. The three dots create an exclusive Range. Ending a Range in -1 means that it will extract the text to the end of it.

The rest of the code in this method is four different comprehensions. The results of these comprehensions are splatted, or spread out, as arguments to the “set” method. The * is the symbol used to splat things. Basically, it means that instead of four lists, set will get all the elements of all the lists as separate arguments. Finally, set will create a set from these arguments and return that.
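For comparison, here is a Python sketch in the spirit of Norvig’s original, building the same four edit families from the list of splits:

```python
def edits1(word):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # every (head, tail) split of the word, including the empty ones
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)
```

Collecting everything into a set at the end deduplicates edits that can be produced in more than one way.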

Whew. That was a mouthful. The next method is easy in comparison. More of the same, really:

knownEdits2 = method(word,
  for:set(e1 <- edits1(word),
    e2 <- edits1(e1),
    NWORDS key?(e2),
    e2))

Here we use another variation of a comprehension, namely a set comprehension. A regular comprehension returns a list. A set comprehension returns a set instead. This comprehension will only return words that are available as keys in NWORDS.

known = method(words,
  for:set(w <- words,
    NWORDS key?(w), w))

This method uses a set comprehension to find all words in “words” that are keys in NWORDS. At this point you might wonder what a comprehension actually is. And it’s quite easy. Basically, a comprehension is a piece of nice syntax around a combination of calls to “filter”, “map” and “flatMap”. In the case of a set comprehension, the calls go to “filter”, “map:set” and “flatMap:set” instead. The whole implementation of comprehensions is available in Ioke, in the file called src/builtin/F10_comprehensions.ik. Beware though, it uses some fairly advanced macro magic.
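To make the desugaring concrete, here is roughly what a comprehension with one generator, a guard and a body turns into, sketched in Python terms (a second generator would add a flatMap step):

```python
xs = [1, 2, 3, 4, 5]

# comprehension style: keep the even numbers, return their squares
squares_of_even = [x * x for x in xs if x % 2 == 0]

# the same thing spelled out as a filter followed by a map,
# which is essentially what the Ioke macro expands into
desugared = list(map(lambda x: x * x,
                     filter(lambda x: x % 2 == 0, xs)))

assert squares_of_even == desugared == [4, 16]
```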

OK, back to spelling. Let’s look at the last method:

correct = method(word,
  candidates = known([word]) ifEmpty(
    known(edits1(word)) ifEmpty(
      knownEdits2(word) ifEmpty(
        [word])))
  candidates max(x, NWORDS[x]))

The correct method takes a word to correct and returns the best possible match. It first tries to see if the word is already known. If the result of that is empty, it tries to see if any edits of the word are known, and if that doesn’t work, if any edits of edits are known. Finally it just falls back to the original word. If more than one candidate spelling is known, the max method is used to determine which one occurred most often in the corpus.
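The cascade reads naturally in Python as well, since empty collections are falsy there. This sketch uses a toy NWORDS and a deletes-only stand-in for edits1 (both hypothetical) just to show the fallback logic:

```python
NWORDS = {"the": 100, "they": 20}  # toy model, not a real trained one

def known(ws):
    return {w for w in ws if w in NWORDS}

def deletes(word):
    # stand-in for edits1: just drop one character at each position
    return {word[:i] + word[i + 1:] for i in range(len(word))}

def correct(word):
    # each alternative is only consulted if the previous set was empty
    candidates = known([word]) or known(deletes(word)) or {word}
    return max(candidates, key=lambda w: NWORDS.get(w, 1))
```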

The ifEmpty macro is a bit interesting. What it does is pretty simple, and you could have written it yourself. But it just happens to be part of the core library. The implementation looks like this:

List ifEmpty = dmacro(
  "if this list is empty, returns the result of evaluating the argument, otherwise returns the list",

  [then]
  if(empty?,
    call argAt(0),
    self))

This is a dmacro, which basically just means that the handling of arguments is taken care of. The argument for a destructured macro can be seen in the square brackets. Using a dmacro instead of just a raw macro means that we will get a good error message if the wrong number of arguments is provided. The implementation checks if its list is empty. If it is, it evaluates and returns the first argument; otherwise it returns itself without evaluating anything.
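The important property is that the argument is not evaluated unless the list really is empty. Python has no macros, so the closest approximation is passing a thunk, something like this sketch:

```python
def if_empty(lst, then):
    # `then` is a zero-argument function, so the alternative is only
    # computed when the list is empty, mimicking the macro's
    # unevaluated argument
    return then() if not lst else lst
```

A usage example: `if_empty([], lambda: expensive_search())` only runs the search when there is nothing in the list.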

So, you have now seen a 16 line spelling corrector in Ioke (it’s 16 without the extra line breaks I added for the blog).



Ioke 0 released


I am very happy to announce the first release of Ioke!

Ioke is a dynamic language targeted at the Java Virtual Machine. It’s been designed from scratch to be a highly flexible general purpose language. It is a prototype-based programming language that is inspired by Io, Smalltalk, Lisp and Ruby.

Homepage: http://ioke.org
Download: http://ioke.org/download.html
Programming guide: http://ioke.org/guide.html

Ioke 0 is the first release of Ioke, and as such is not production ready. I would appreciate it if people tried it out and reported bugs, thoughts and ideas.

Features:
– Strong, dynamic typing
– Prototype based object orientation
– Homoiconic language
– Simple syntax
– Powerful macro facilities
– Condition system
– Developed using TDD
– Documentation system that combines documentation with specs
– Wedded to the JVM

Go ahead! Try it out!



Ioke 0 roadmap


The first release of Ioke will be called Ioke 0, and I aim to have it more or less finished in a month or so. At the longest, it might take until Christmas. So, since it’s coming soon, I thought I would just put in a list of the kind of things I’m aiming to have in that release. I’ll also quickly discuss some features I will have in the language but that are going to wait until Ioke I or Ioke II.

First, the first release of the language means that the basic core is there. The message passing works and you can create new things, methods and blocks. Numbers are in, but nothing with decimal points so far. If I need it for some of the other stuff I’m implementing, I’ll add them, otherwise integers might be the only numbers in Ioke 0. I’m OK with that. The core library will be quite small at this point too. Ioke 0 will be a usable language, but it’s definitely not batteries included in any way.

These are some specific things I want to implement before releasing it:

  • List and Dict should be in, including literal syntax for creation, aref-fing and aset-ting. Having syntax for aset means that I will have in place a simple version of setting of places, instead of just names.
  • Enumerable-like implementation for List and Dict.
  • DefaultMethod and LexicalBlock should support regular, optional, keyword and rest arguments. Currently only the rest arguments are missing, and this is mostly because I don’t have Lists yet.
  • Basic support for working with message instances, to provide crude metaprogramming.
  • The full condition system. That includes modifying the implementation to provide good restarts in the core. It also might include a crude debugger. Restarts are implemented, but the conditions will take some time.
  • cellMissing should be there. Contexts should be implemented in terms of it.
  • Basic IO functionality.
  • A reader (that reads Ioke syntax and returns the generated Message tree).
  • Access to program arguments.
  • IIk – Interactive Ioke. The REPL should definitely be in, and be tightly integrated with the main program. I’m taking the Lisp route here, not the Ruby one. IIk will be implemented in Ioke, and should drive the evolution of several of the above features.
  • Dokgen – A tool to generate documentation about existing cells in the system. Since this information is available at run time it should be exceedingly easy to create this tool. Having it will drive features too.
  • Affirm – A testing framework written in Ioke. The goal will be to rewrite the full test suite of Ioke (which is currently using JtestR) into using Affirm instead. That’s going to happen between Ioke 0 and Ioke I.
  • Documentation that covers the full language, and some usage pointers.

There are some features I’m not sure about yet. They are larger and might prove to be too large to rush out. The main one of these is the Java integration features. Right now I’m thinking about waiting with that support.

I have loads of features planned for the future. These are the ones that I’m most interested in getting in there quite soon, which means they’ll be in either I or II.

  • Java Integration
  • Full ‘become’, with the twist that become will actually not change the class of an instance, but instead change an instance into the other instance. This is something I’ve always wanted in Ruby, and ‘become’ seems to be a fitting way to do it. This will make transparent futures and things like that quite easy to implement.
  • Common Lisp like format, that can handle formatting of elements in a List in the formatting language. Not sure I’m going to use the same syntax as Common Lisp, though. Maybe I’ll just make it into an extension of the printf support?
  • Simple aspects. Namely, it should be possible to add before, after and around advice to any cell in the system. I haven’t decided if I should restrict this to only activatable cells or any cell at all.
  • Ranges.
  • Macros. I’m not sure which version I’ll end up with yet. I have two ideas that might be more or less the same, but both of them are really, really powerful.
  • Simple methods. In Ioke, a method is something that follows a very simple interface. It’s extremely easy to create something that acts like a method in some cases but does something different. Simple methods are restricted in the kind of meta programming they can do, which means they can be compiled down to quite efficient code. This is a bit further away, maybe III or IV.
  • Continuations. I would like to have them. I think I can do it without changing too much of the structure. This is not at all a certainty at the moment, but it might happen.

That’s about it for now. Once I have the core language in place I want to start working on useful libraries around it. Once 0 is out, I’m planning to start using Ioke as my main scripting language, and have that drive what libraries I need to create and so on.

Around II or III, I think it’s time to go metacircular. Not necessarily for the implementation, but to describe the semantics in it. Might be possible to do something like SLang too, and compile Ioke to Java for the needed core.

If you are interested in following the development, you can check it out at my git repository at http://github.com/olabini/ioke, or at the project pages at http://ioke.kenai.com. The Git repository is the canonical one right now, and the Kenai HG one is a clone of that. If you’re interested in discussing Ioke, there are mailing lists at the project pages. I will also have a real page for the project ready for the first release. But I promise you will notice when that release happens.



The Maintenance myth


Update: I’ve used the words “static” and “dynamic” a bit loosely with regards to languages and typing in this post. If this is something that upsets you, feel free to read “static” as “Java-like” and “dynamic” as “Ruby-like” in this post. And yes, I know that this is not entirely correct, but just as mangling the language to remove all gender bias makes it highly inconvenient to write, I find it easier to write in this language when the post is aimed at people in these camps.

Being a language geek, I tend to get into lots of discussions about the differences between languages, what’s good and what’s bad. And being a Ruby guy that hangs out in Java crowds, I end up having the static-vs-dynamic conversation way too often. And it’s interesting, the number one question everyone from the static “camp” has, the one thing that worries them the most is maintenance.

The question is basically – not having types at compile time, won’t it be really hard to maintain your system when it grows to a few millions of lines of code? Don’t you need the static type hierarchy to organize your project? Don’t you need an IDE that can use the static information to give you intellisense? All of these questions, and many more, boil down to the same basic idea: that dynamic languages aren’t as maintainable as static ones.

And what’s even more curious, in these kinds of discussions I find that people in the dynamic camp generally agree that yes, maintenance can be a problem. I’ve found myself doing the same thing, because it’s such a well established fact that maintenance suffers in a dynamic system. Or wait… Is it that well established?

I’ve asked some people about this lately, and most of the answers invariably begin with “but obviously it’s harder to maintain a dynamic system”. Things that are “obvious” like that really worry me.

Now, Java systems can be hard to maintain. We know that. There is lots of documentation and talk about hard to maintain systems with millions of lines of code. But I really can’t come up with anything I’ve read about people in dynamic languages talking about what a maintenance nightmare their projects are. I know several people who are responsible for quite large code bases written in Ruby and Python (a very large code base is 50K-100K lines of code in these languages). And they are not talking about how they wish they had static typing. Not at all. Of course, this is totally anecdotal, and maybe these guys are above your average developer. But in that case, shouldn’t we hear these rumblings from all those Java developers who switched to Ruby? I haven’t heard anyone say they wish they had static typing in Ruby. And not all of those who migrated could have been better than average.

So where does that leave us? With a big “I don’t know”. Thinking about this issue some more, I came up with two examples where I’ve heard about someone leaving a dynamic language because of issues like this. And I’m not sure how closely tied they are to maintenance problems, not really, but these were the only ones I came up with: Reddit and CDBaby. Reddit switched from Lisp to Python, and CDBaby switched from Ruby to PHP. Funny, they switched away from a dynamic language – but not to a static language. Instead they switched to another dynamic language, so the problem was probably not something static typing would have solved (at least not in the eyes of the teams responsible for these switches).

I’m not saying I know this is true, because I have no real, hard evidence one way or another, but to me the “obvious” claim that dynamic languages are harder to maintain smells a bit fishy. I’m going to work under the hypothesis that this claim is mostly myth. And if it’s not a myth, it’s still a red herring – it takes the focus away from more important concerns with regard to the difference between static and dynamic typing.

I did a quick round of shouted questions to some of my colleagues at ThoughtWorks I know and respect – and who were online on IM at the time. The general message was that it depends on the team. The people writing the code, and how they are writing it, are much more important than static or dynamic typing. If you make the assumption that the team is good and the code is treated well from day 0, static or dynamic typing doesn’t make a difference for maintainability.

Rebecca Parsons, our CTO said this:

I think right now the tooling is still better in static languages. I think the code is shorter generally speaking in dynamic languages which makes it easier to support.

I think maintenance is improved when the cognitive distance between the language and the app is reduced, which is often easier in dynamic languages.

In the end, I’m just worried that everyone seems to take the maintainability story as fact. Has there been any research done in this area? Smalltalk and Lisp have been around forever; there should be something out there about how good or bad maintenance of these systems has been. There are three possible reasons I haven’t seen it:

  • It’s out there, but I haven’t looked in the right places.
  • There are maintenance problems in all of these languages, but people using dynamic languages aren’t as keen on whining as Java developers.
  • There are no real maintenance problems with dynamic languages.

There is a distinct possibility I’ll get lots of anecdotal evidence in the comments on this post. I would definitely prefer facts, if there are any to get.



Ioke runs iterative fibonacci


Today Ioke actually runs both recursive and iterative fibonacci. That might not seem like much, but the work to get there has put in place much of the framework needed for the rest of the implementation.

It’s a nice milestone, since Ioke is now Turing complete (having both conditionals and iteration). Most of the neat features I’m planning aren’t actually implemented yet, though.

In the time honored tradition of language performance measuring, I decided to compare iterative fibonacci performance to Ruby.

Keep in mind that I haven’t done any optimizations whatsoever, and I do loads of really expensive stuff all over Ioke. Specifically, there is no such thing as locals – what looks like locals here are actually regular attributes of a Context object. All name lookups use hash tables at the moment. It’s also fully interpreted code. Nothing is being compiled at this point. I’m running all the examples on SoyLatte (Java 1.6) on an MBP. I used the JVM -server flag when running Ioke and JRuby.
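To give an idea of what that costs, here is a hypothetical sketch of a Context chain resolving names through hash tables on every access (my illustration only, not the actual Ioke internals):

```python
class Context:
    def __init__(self, parent=None):
        self.cells = {}      # hash table holding what look like locals
        self.parent = parent

    def lookup(self, name):
        ctx = self
        while ctx is not None:        # walk the parent chain
            if name in ctx.cells:
                return ctx.cells[name]
            ctx = ctx.parent
        raise NameError(name)
```

Every variable read in a loop pays for at least one hash lookup, which is far more expensive than an indexed local slot.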

The Ruby code looks like this:

require 'benchmark'

def fib_iter_ruby(n)
   i = 0
   j = 1
   cur = 1
   while cur <= n
     k = i
     i = j
     j = k + j
     cur = cur + 1
   end
   i
end

puts Benchmark.measure { fib_iter_ruby(300000) }
puts Benchmark.measure { fib_iter_ruby(300000) }

And the Ioke code looks like this. I don’t have any benchmarking libraries yet, so I measured it using time:

fib = method(n,
  i = 0
  j = 1
  cur = 1
  while(cur <= n,
    k = i
    i = j
    j = k + j
    cur++)
  i)

System ifMain(fib(300000))

And what are the results? Not surprisingly, JRuby does well on this benchmark, and would probably do even better if I ran more iterations. The JRuby (this is current trunk, btw) time for calculating fib(300000) was 7.5s. MRI (ruby 1.8.6 (2008-03-03 patchlevel 114) [i686-darwin8.10.1]) ended up at exactly 14s. So where is Ioke in all this? I’m happy to say Ioke ended up taking 9.2s. I was really pleasantly surprised by that. But I have a feeling that recursive fib might not end up with those proportions. Still, the indication is that I haven’t done anything amazingly expensive yet, at least. That’s a good sign, although I have no problem sacrificing performance for expressiveness.



Why not Io?


I have been asked a few times in different circumstances why I feel the need to create my own language instead of just working with Io. That is a very valid question, so I’m going to try to answer it here.

First of all, I like Io a lot. Ioke is close enough to Io that it will be obvious who the parent is. In my mind at least, the differences are in many ways cosmetic and in those that are not it’s because I have some fairly specific things in mind.

So what are the main differences? Well, first of all it runs on the JVM. I want it that way because of all the obvious reasons. The Java platform is just such a good place to be. All the libraries are there, a good way of writing extensions in a language that is not C/C++, a substrate that gives me threads, GC and Unicode for free. So these reasons make a big difference both for people using Ioke, and for me. I want to be able to use Ioke to interact with other languages, polyglot programming and all. And since I expect Ioke to be much more expressive than most other languages, I think it will be a very good choice to put on top of a stable layer in the JVM. Being implemented in C makes these benefits go away.

Of course I could just have ported Io to the JVM and be done with it. That’s how it all started. But then I realized that if I decided to not do a straight port, I could change some things. You have seen some discussions about the decisions I’m revisiting here. The whole naming issue, handling of numbers, etc. Other things are more core. I want to allow as much syntactic flexibility as possible. I really can’t stand the three different assignment operators. I know they make the implementation easier to handle, but having one assignment operator with pluggable semantics gives a more expressive language.

Another thing I’m adding in is literal syntax for arrays and hashes, and literal syntax for referencing and setting elements in these. Literals make such a difference in a language and I can’t really handle being without it. These additions substantially complicate the language, but I think it’s worth it for the expressive power.

A large difference in Ioke will be the way AST modification will be handled. Io gives loads of power to the user with regard to this, but I think there is more that can be done. I’m adding macros to Ioke. These will be quite powerful. As an example, the DefaultMethod facility (that gives regular arguments, optional arguments, REAL keyword arguments and rest arguments) can actually be implemented in Ioke itself, using macros. At the moment this code is in Java, but that’s only because of the bootstrapping needed. The word macro might be a bad choice here though, since it executes at the same time as a method. The main difference is that a macro generally has call-by-name/call-by-need semantics, and that it will modify its current or surrounding AST nodes in some way. Yes, you heard me right, the macro facility will allow you to modify AST siblings. In fact, a macro could change your whole script from that point on… Of course Io can do this, with some trickery. But Ioke will have facilities to do it. Why? Doesn’t that sound dangerous… Yeah. It does, but on the other hand it will make it really easy to implement very flexible DSLs.

A final note – coming from Ruby I’ve always found Io’s libraries a bit hard to work with. Not sure why – it’s probably just personal taste, but the philosophy behind the Io libraries seems not to provide the things I like in a core library. So I will probably base Ioke’s core library more on Ruby than on Io.

There you have it. These are the main reasons I decided to not use Io. And once I started to diverge from Io, I decided to take a step back and start thinking through the language from scratch. Ioke will be the result, when it’s finished. (Haha. Finished. Like a language is ever finished… =)



Language revolution


JAOO was interesting this year. A collection of very diverse subjects, and many focusing on programming languages – we had presentations about functional programming, JavaScript, Fortress and JRuby. Guy Steele and Richard Gabriel did their 50 in 50 presentation, which was amazing. I’ve also managed to get quite a lot of work done on Ioke. The result of all this is that my head has been swimming with thoughts about programming languages. I’ve also had the good fortune of spending time talking about languages with such people as Bill Venners, Lars Bak, Neal Ford, Martin Fowler, Guy Steele, Richard Gabriel, Dave Thomas, Erik Meijer, Jim des Rivieres, Josh Holmes and many others.

It is obvious that we live in interesting times for programming languages. But are they interesting enough? What are the current trends in cutting edge programming languages? I can see these:

  • Better implementation techniques. V8 is an example of this, and so is Hotspot. V8 employs new techniques to drive innovation further, while Hotspot’s engineers continuously add both old and new techniques to their tool box.
  • DSLs. The focus by some people on domain specific languages seem to be part of the larger focus on languages as an important tool.
  • Functional semantics. Erik Meijer’s keynote was the largest push in this direction, although many languages keep adding features that make it easier to work in a functional style. Clojure is one of the new languages that come from this point, and so is Scala. The focus on concurrency generally leads people to the conclusion that a more functional style is necessary. From the concurrency aspect we get the recent focus on Erlang. Fortress also seems to be mostly in this category.
  • Static typing. Scala and Haskell are probably the most representative of this approach, in trying to stretch static typing as far as possible to improve both the programmer experience, semantics and performance.

Is this really it? You can quibble about the specific categories and where the borders are. I’m not entirely satisfied with where I put Fortress, for example, but all in all it feels like this is what’s going on.

Seeing 50 in 50 reminded me about how many languages we have seen, and how different these all are. It feels like most of the innovation happened in the past. So why is the current state of programming languages so poor? Is it because other things overshadow the language itself? I really don’t believe that. I think a good enough language would enable better tools, more productivity and more successful projects. So why isn’t it happening? We seem to be stuck in a rut. Anders Hejlsberg said in his opening keynote that the last 10-15 years have been an anomaly. I really do hope so.

What is apparent from the list compiled above is that everything that currently happens is very much evolutionary in approach. Innovation is happening, but it’s mostly small innovation.

We need a language revolution. We need totally new ways at looking at programming languages. We need new innovation, unfettered by the failures and successes of times past. We need more language implementors. We need more people thinking about these things.

I don’t know what the new approaches need to be, but the way I see it the last 10 years have been quite disappointing. If programming languages really are important tools, why haven’t we seen the same kind of innovation in that field as we have in IDEs and tools? Why haven’t we seen totally new ideas crop up? Is it because language development is always evolutionary? Does it have to be? Or is everyone interested in the field already convinced that we are at the peak right now? Or that Lisp or Smalltalk was the peak?

What needs to be rethought? I’ve read Jonathan Edwards recently, and he writes a lot about revisiting basic ideas and conclusions. I don’t agree with everything he says, but in this matter he’s totally right. We need to revisit all assumptions. We need to figure out better ways of doing things. Programming languages are just too important. We shouldn’t be satisfied with the current approaches just because we don’t know anything better.

We need a revolution.



Naming of core concepts


Another part of naming that I’ve spent some time thinking about is what I should call the core concepts in the language. A typical example of this is using the word Object for the base of the structure. I’ve never liked this and I get the feeling that many languages have chosen this just because other languages have it, instead of for any good reason.

In a prototype based language it definitely doesn’t feel good. My current favorite name for the base of the hierarchy is Origin. Since everything is actually copied from this object, that word seems right.

Another naming issue that turns up quite quickly is what the things you store in an object are called. In JavaScript they are called properties. In Io they are called slots. It’s hard to articulate why I don’t like these names. Maybe this is a personal preference, but at the moment I’m considering calling them ‘cells’ in Ioke.

What about String? That does seem like an arbitrary choice too. A String is probably short for String of characters in most cases, and that’s really an implementation choice. What if you do what some JavaScript engines do, where a String is actually a collection of character buffers stitched together? In that case String feels like a slightly suspect name. For Ioke I’m currently leaning towards just calling it Text. Text is immutable by default and you need a TextBuffer to be able to modify text on the fly.

Another thing that sometimes feels very strange in prototype based languages is the name you give the operation to create new instances. In Io it’s called clone. In JavaScript you use new. Here I’m really not sure what to do. At the moment I’m considering leaving it as ‘clone’ but providing two factory methods that will allow better flow while reading and also avoid the clone keyword. Say that you have something called Person. To create a new Person the code should look like this:

p = Person with(
  givenName = "Ola"
  surName = "Bini")

This code will first call clone, and then execute the code given to with inside of the context of the newly cloned object. This pattern should be quite common when creating instances. The other common pattern I see occurring is this:

tb = TextBuffer from("some text")

Maybe there are other useful cases, but with and from will work quite well for most purposes in the core libraries. What’s nice is that these methods are extremely easy to create and they don’t need any fancy runtime support. They are entirely composable from other primitives.
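As a sketch of that composability, with could conceivably be defined along these lines. Everything here is hypothetical, since none of this is implemented yet: the macro facility and the names call, arguments and evaluateOn are invented for illustration:

with = macro(
  newObject = self clone
  ; evaluate each argument message with the new object as context,
  ; so that an assignment like givenName = "Ola" creates a cell
  ; on newObject rather than on the caller
  call arguments each(arg, arg evaluateOn(newObject))
  newObject)

The point is just that nothing about this requires special runtime support: a macro that clones the receiver and evaluates its arguments in the clone’s context is enough.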

This is currently the kind of decisions I’m trying to make. If you have something that works nice instead of ‘clone’, do let me know.



Variations on naming


I’ve been spending lots of time thinking about naming lately. What kind of naming scheme do you follow for your class names? What do your method names look like? Your variables? Do you use active words or passive ones? Should you use longer, descriptive names or shorter, more succinct ones? How can you best use symbolic freedom to allow better naming? What kind of names do you choose for your core classes? Does it matter, or should you just go for good ol’ ‘Object’?

These musings are part of my design thoughts about Ioke, and I’ll describe some of the decisions I’ve made.

I first want to start with something simple: what naming scheme do you follow for your class-like objects? I’m going to go the same way as Java here, using capitalized words. This will be a convention and not anything forced by syntax, since the class-like objects won’t actually be classes, because Ioke is a prototype based language.

So what about method names and variable names? First of all there is no distinction in Ioke. Everything is a message. There are several variations here that are quite common. I have opinions on all of them, of course.

  • C#: A typical method might be called ‘ShouldRender’. The words are capitalized and put together without any separation characters. Most symbols aren’t allowed, so method names are generally restricted to letters, numbers and underscores. This style doesn’t appeal to me at all; I find it really hard to read. That the naming convention is the same as for class names makes it hard to tell these names apart, and having the words run together without any separation characters doesn’t help either.
  • Java: The same as C#, except that the first character is lower case: ‘shouldRender’. The same restrictions apply as for C#. I find it slightly easier to read than the C# style, since there is a clear difference between class names and method names, but the train wreck style of running the words together is still inconvenient.
  • Ruby: With symbolic freedom and lower case words separated by underscores, Ruby has a quite different style. The example would look like this: ‘should_render?’. The question mark is really nice to have there, and the underscores really make the name easier to read. Of course, the difference between class names and method names is quite large, which is both a good and a bad thing.
  • Lisp: Typical Lisp naming is all lower case, separated by dashes: ‘should-render?’. Lisp also has a high degree of symbolic freedom, so you can use basically any character except white space and close parenthesis in a name. I like this style a lot. It’s eminently readable, but the main problem is the dash: allowing dashes in names in a language with infix operators means that you must put spaces around a minus sign, which would really annoy me. So even though I like this style, I don’t think it’s practical unless your language uses Lisp style notation for operators.
  • Smalltalk: This style of naming is definitely the most different. It avoids words that run together by using keyword selectors, which means you interleave the arguments with the parts of the method name. For example: ‘render: “str” on: screen1’ is actually a call to a method named ‘render:on:’. This way of naming has really nice reading benefits, but there is also a high cost. Specifically, Ioke uses spaces to chain invocations, which means that if I used Smalltalk style names I would need to surround all method invocations with parentheses, which wouldn’t look good at all. There are lots of good reasons for this naming style, but ultimately it doesn’t work for Ioke.
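
To make the chaining problem concrete, compare a space-chained call with a hypothetical keyword-selector version. Both examples are invented for illustration; there is no render method in Ioke:

; space-chained names read left to right:
window render("str", screen1) show

; Smalltalk style selectors would force parentheses around each
; call in the chain, since spaces are already taken by chaining:
(window render: "str" on: screen1) show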

Those are basically the naming styles I can think of right now. Everything else is mostly variations on them. So what will I use for Ioke? I don’t know exactly, but the two alternatives I’m considering right now are Ruby names and Java names plus symbolic freedom. I’m leaning towards the Java naming. Adding more symbols will make it easier to use good names; small things like question marks and exclamation marks really make a difference. Another reason for going with Java names is that it allows interacting with Java without doing name mangling (like JRuby does). That’s highly attractive, even though the method names will be a bit more compact with this convention.

Any opinions on this?



Ioke


I’m sitting here at JAOO, waiting for the second day to start. The first presentation will be a keynote by Lars Bak about V8. Yesterday was quite language heavy too, with both Anders Hejlsberg and Erik Meijer keynoting about languages, plus introductions to both Fortress and Scala. And after the JVM language summit last week, I feel like the world is finally starting to notice the importance of programming languages.

So it seems only fitting that I’ve decided to go public with Ioke, the source code, what it is and where it’s going.

Ioke is a strongly typed, extremely dynamic, prototype based object oriented language. It is homoiconic and has built-in support for several kinds of macros. The languages that most closely influence Ioke are Io, Smalltalk, Self, Ruby and Lisp (specifically Common Lisp).

The language is currently built on top of the JVM, but I’m also considering compiling it down to JavaScript and running it on V8.

I have several goals with the language, but the most specific one is to create a language that combines the things I like about Ruby and Lisp. It turns out that Io already has many of the features I’m looking for, but in some cases doesn’t go far enough. I also wanted a language that is very well suited to expressing internal DSLs. I want a language that doesn’t get in my way, but also gives me loads of power to accomplish what I want. To that end I’ve designed a macro system that some people will probably find insane.

The current status of the implementation is that there isn’t any. I’ve already created two partial implementations to find the right way to implement the language, and with this blog post I’m starting the implementation over from scratch. I know quite well what I want the language to look like and how it should work.

I’ve used Scala for the other two implementations but have decided not to do that for this one, for a reason Charles Nutter often talks about: having to include the Scala runtime in the runtime for Ioke seems very inconvenient. So the implementation will initially use Java, but I’m aiming for the language to become self hosting as quickly as possible. That includes creating an Antlr backend for Ioke, so it will take some time.

I’m going to post about Ioke quite regularly while I’m working on it, talking about design decisions and other things related to it. I will try to base my decisions in Ioke on what seems right, not necessarily on the words chosen for similar concepts in other languages. I’ll try to explain my reasoning behind choices like this.

And what about performance? Well, I know already that it will be atrocious. If you want to do scientific computing, Ioke probably won’t be for you. The current design of the language will make it fairly hard to do any kind of performance tuning, but I do have a plan for how to compile it down to bytecode at least. Even so, it won’t perform extremely well; my goals for Ioke simply don’t include performance. I care about performance sometimes, but sometimes I don’t, and Ioke is a tool I want to have for those cases where raw expressive power is what matters most.

You can follow the development in my git repository at http://github.com/olabini/ioke.