July 23rd, 2010
Emerging Languages camp – day 2
The second day of Emerging Languages camp was at least as good as the first day. We also managed to squeeze in four more talks, since everybode agreed that the afternoon pause was too long and ineffective during day one. At the end of the day my brain was substantially melted that I didn’t even contemplate finishing these comments. But after some sleep I think I have a fresh perspective.
The sessions were a bit more varied compared to the first day – both in quality and how far out the ideas were. Because of how my interest in various subject vary, there might be some inconsistency in length of reporting on the different languages.
Anyway, here goes:
Kodu
Kodu is a language from Microsoft for creating games. It’s specifically aimed at kids to see if they can learn programming in a better way using something like this. The language uses icons and a backend text based syntax to make it easy for someone to program using structure instead of syntax. You get a basic 3d environment where you can modify and edit things in various ways. Another important part of the design is to get the game to quickly do something, so you get immediate feedback. Everything added to the language is user tested before adding it – including doing gender testing. They thought long and hard about whether they should add conjunctions or not – but ended up deciding for doing it. You work with an XBox when programming and running the game. It’s also free. Overall, Kodu looks like a really nice and innovative initiative, probably going back as far as Logo in terms of inspiration. Very nice.
Clojure
Rich didn’t actually talk much about Clojure in general, but decided to focus on a specific problem he is working on solving. His talk title doesn’t really say much about this, though: “Persistent, Transience, Persistents, Transients and Pods – invasion of the value snatchers”. It was a great talk with lots of information coming extremely fast. I found myself focusing more during this talk than during any other during the conference, just to follow all threads of thought.
Rich spent some time giving an introduction to persistent data structures so everyone knew how Clojure works with them – including how they are turned into transients – since that’s where the new feature comes in.
An important part of persistent data structures is that yu preserve the performance guarantees of a mutable equivalent of that data structure. Clojure uses bit-partitioned hash tries, originally described by Phil Bagwell. This allows Clojure to have structural sharing, which means it’s safe to “update” something – the old version is retained. It uses path copying to make it possible to udpate with a low cost. There is definitely cost to doing it, but it works well in a concurrent environment where other solutions would be more costly to get correct results.
Clojure has an epochal time model that allows Clojure to view things as values inbetween being “modified”. State is at one step higher than that, so you can see mutable change as a creation of a new value from an existing value that is then put into the same reference the original value existed in. Clojure has four different types of references with various semantics for coordination.
To get good performance, some Clojure functions will actually mutate state that is invisible to anyone else to efficiently create new data structures. To get performance that is acceptable to Rich Clojure, data structures are not implemented using purely immutable data structures (Okasaki style) from the Java side. Persistent data structures also doesn’t scale to larger changes, specifically multiple collections, several steps or other situations where you want to have functional end points but efficient mutation inbetween.
Transients is a feature that allows Clojure to give birth to a data structure. Clojures transients will accumulate changes in a safe way and can then finally give you a persistent value. Only vectors and hash-maps are currently supported. Lists are not, since there is no benefit in doing that. Transients also enforce thread isolation. Composite operations are OK, and so is multi-collection work and you don’t need any locks for this. This is already in Clojure, but they might be doing too much. They both handle editing and enforce the constraints on it, such as single-threadedness. Transients can sometimes return new values too, even on mutating operations.
Pods allow you to split out the policy from transients. Values go in, values go out. The process goes through the pod. Different policies are possible, such as single-threadness or mutexes. A pod knows how to make a transient version of a value. Functions to modify a pod will have to return a new thing (or the same thing). Dereferencing the pod allows you to get a new value from a pod at that point. This gives you the possbility to apply recipes on ordinary Java objects too. A good example is String vs StringBuilder. Pods can ensure lock acquisition order, but not lock composition – although pods can detect it at least. There are still a few details in the design that Rich hasn’t decided on yet.
All in all, a very interesting talk, about the kind of concurrency problems you wish your language had.
E/Caja
Mark Miller recapped the interaction models of the Web, starting with static frames going to the current mess of JavaScript fragments going back and forth, using JSONP, AJAX and Comet. He also talks a bit about the adoption curves of languages and why some languages get adopted. Posits that a mess of features may be easier to get adopted. This means many languages succeed by adding complexity.
E is an experiment in expressing actors in a persistent way. He used some of the lessons from E combined with AJAX/JavaScript to create Caja, a secure language. Some of the features from Caja were then used to start work on EcmaScript 5. They are currently working on a standard for SES, secure JavaScript. Dr. SES is an extension of this, that stands for Distributed, Resilient, Secure JavaScript. Object capabilities involve two additions to a regular memory safety and encapsulation model; effects only on held references, and no powerful references by default. This means a reference graph becomes an access graph. Caja can sanitize JavaScript to prevent malicious behavior, but preserve the semantic meaning of the program outside of that.
He showed some examples of how Caja can be used to sanitize regular JavaScript and have it running securely. Very interesting stuff, although the generated code didn’t look as amenable to debugging as something like CoffeeScript.
Fancy
Fancy is a language that tries to be friendly to newcomers, with good documentation, a clean implementation and so on. It’s inspired b several languages: Smalltalk (pure message passing, everything’s an object, dynamic, class based OO, metaprogramming, reflective), Ruby (file based, embraces UNIX, literal syntax, class definition is executable script, fixed some inconsistencies with procs/lambdas/blocks), Erlang (message passing concurrency, light weight processes – not implemented yet). Fancy takes the opinion that first class is good; classes, methods, documentation, tests should all be first class. FancySpec is a simple version of RSpec. Tests for all built in classes and methods are there. These tests are not dependent on implementation. There are plans to port Fancy to a VM. Methods marked with NATIVE will have an equivalent method in Fancy and in the interpreter, to improve performance.
It’s got dynamic scoping and method caching. Logic can be defined based on the sender of a message, which makes it possible to do things like private and public.
Exceptions are taken directly from the implementation (ie C++).
The language seems to be pretty similar to Ruby in semantics, but more Smalltalk like syntax.
BitC
BitC is geared towards critical systems code. Resource contrained, CPU, memory, those kind of areas. One cache miss sometimes counts. Abstraction is fine, but only if it’s the right one. Variance constrained too. Predictability is very important, so something like a JIT can be a problem. Statically exception free. “Zero” runtime footprint. Non-actuarial risk model. Mean time between failures in decades. Problem is to establish confidence. After other failures in this area, the conclusion has been that BitC shouldn’t be a prover.
The language is an imperative functional language with HM-style parametric type system. You have explicit control of representation. State is handled in a first class manner. Inferencing actually infers mutability in lots of cases. Dependent range checking isn’t there yet, but is coming soon. “The power of ML/Haskell”, “The low-level expressiveness of C”, “Near-zero innovation”.
Trylon
Trylon is a small language, indentionation based and compiles through C. It’s object oriented, with prototypes under the class based system. According to the author, nothing really new in the language – he just did it for his own sake. There are no users so far except for the author.
ooc
The language tries to be a high level low level language. It mixes paradigms quite substantially and has some nice features. It’s class based, and mostly statically typed.
Coherence/Subtext
Jonathan Edwards started this presentation by showing a small example where the ordering of statements in an implementation is dependent on what representation you use for data, and shows that it’s impossible to handle this case in a general way. From that point he claims that there is a fundamental tension between styles in a language, and you can only get two of these three: Declarative programming, Mutable state and Data Structures. I’m not sure if I agree with his conclusions, and the initial example didn’t feel like anything I’ve ever had trouble with.
Based on the conclusion that you can only have two of the three, he goes on to claim that the thing that cases all these problems is aliasing. So in order to avoid aliasing, his system uses objects where instances are physically always contained within another object. This means you can refer to these objects without having actual pointers – and thus cannot do aliasing either. From that point on, his system allows declarative programming of the flow, where updates never oscillate back out to create more updates.
Lots of interesting ideas in this talk, but I’m not sure I agree with either the premise or the conclusions.
Finch
Finch is a small programming language, bytecode compiled with fibers, blocks, TCO, objects, prototypes, a REPL and Smalltalk style message selectors. In the feature, the author aims to add metaprogramming, some self-hosting, continuations and concurrency features.
Circa
Circa is a small programming language that allows you to get immediate feedback. It’s aimed at game programming, and achieves this by running the script many times (one time for every frame as far as I understood it). You then specify what state you have in your program, and this state will be automatically persisted between invocations, so that a specific invocation of a specific function will always get access to the same state it started out with. This was a very interesting but weird model. It seems to work really well for smaller prototyping of games and graphics but I’m wondering what can be done to expand it.
Wheeler
Wheeler is a proof of concept presented by Matt Youell. It’s pretty hard to describe, and I’m not even sure if there’s a computational model there yet. The project is apparently just a few weeks old, and the ideas are still in progress. The basic tenets of the language seems to be that you work with categories of things and establish transitions between them. A transition pattern matches things that it looks for, which means that things like syntax and ordering doesn’t mean as much. The author calls it mutual dispatch, because it uses the types/categories of everything involved to establish what transitions to use. At this point there is no time model, so everything happens in one sweep, but once a time model gets in there it might be very interesting. To me it looked a bit like a cross between neural networks and cellullar automata.
Interval arithmetic
Alan (Mr Frink) gave a talk about the problems with floating point numbers, and one way of handling that. Floating point numbers cause problems by making it possible to introduce small errors.
Intervals is a new kind of number. It represents a specific number by giving two end points and saying the real number is somewhere within that interval. You can see it in two different ways: “the right value’s in there somewhere but I’m not sure where” or “the variable takes on ALL values in the interval simultaneously”.
This was a very interesting discussion, and you can find out more about it from Frink’s web page (just search for Frink and interval arithmetic). At the end of the presentation, Alan gave this challenge to other languages:
for:
x=77617
y=33096
calculate:
((333 + 3/4) – x^2) y^6 + x^2 (11x^2 y^2 – 121 y^4 – 2) + (5 + 1/2) y^8 + x/(2y)
Ioke handles it correctly, both using ratios and using decimals.
Stratified Javascript
Stratified JavaScript adds some concurrency features to JavaScript based on Strata. It looked like a very principled approach to giving JS concurrency primitives that are easy to use at the same time as they are very powerful. The presenter showed several examples of communication, blocking and coordination working really well.
Factor
Factor is a very high level stack based language created by Slava Pestov. He went through some of the things that Factor does well and other dynamic programming languages handle less well, like reloading code from the REPL. Lots of other small tidbits of how powerful Factor is and how expressive a stack language can be. At the end of the day I still think it’s interesting how much Ioke code sometimes resemble Factor, even though the underlying semantics are vastly different.
D
Walter Bright showed D, his systems level programming language. He focused on showing that it can do several different paradigms in the same language – all of it looked very, very clean, but I got the impression that D is an extremely big language from these examples. To summarize, D can do inline assembler, class based OO, generative programming, RAII, procedural, functional and concurrent programming (and I probably missed a few). I liked the approach to immutability, but I must admit I’m scared of the sheer size of the language. It’s impressive how such a big language can get so good at compile times.
AmbientTalk
AmbientTalk is a language built on top of Java that puts communication in center. It is supposed to be used in areas where you have bad network connectivity and want to communicate inbetween different devices in a flexible way. Things like network outages aren’t exceptions because they will happen all the time in the environments AmbientTalk is built on. The language embraces futures to a large degree and also takes a principled approach to how Java integration works – so that if you send an AmbientTalk object in to Java, it will work as if you had sent it to a remote device, and the only way Java can interact with that object is by sending messages to it. Much interesting stuff in this talk.
And that was it. I can obviously not capture all the interesting hall way and pub conversations that were had, but hopefully this summary will be helpful until the videos come along in two to four weeks. I would call this conference a total success and I really look forward to next year.