February 12th, 2012
Notes on syntax
For the last few years, the expressiveness of programming languages has been on my mind. Many things come into consideration for expressiveness, no matter what definition you actually end up using. However, what I’ve been thinking about lately is syntax. There’s a lot of talk about syntax and many opinions. What made me start thinking more about it was a few blog posts I read that annoyed me a bit. So I thought it was time to put out some of my thoughts on syntax here.
I guess the first question to answer is whether syntax matters for a programming language. The traditional computer science view is largely that syntax doesn’t matter. And in a reductionist, system level view of the world this is understandable. However, you also have the opposite view which comes strongly into effect especially when talking about learning a new language, but also for reading existing code. At that point many people are of the opinion that syntax is extremely important.
The way I approach the question is based on programming language design. What can I do when designing a language to make it more expressive for as many users as possible? To me, syntax plays a big part in this. I am not saying that a language should be designed with a focus on syntax, or even with syntax first. But the language syntax is the user interface for a programmer, and as such there are many aspects of the syntax that should help a programmer. Help them with what? Well, understanding, for one. Reading. Communicating. I suspect that writing is not something we’re very interested in optimizing for in syntax, but that’s OK. Typing fewer characters doesn’t actually optimize for writing either – the intuition behind that statement is quite easy: imagine you had to write a book. However, instead of writing it in English, you just wrote the gzipped version of the book directly. You would definitely have to type much less – but would that in any way help you write the book? No, it would probably make it harder. So typing I definitely don’t want to optimize. However, I would like to make it easy for a programmer to express an idea as concisely as they can. To me, this is about mentioning all the things that are relevant, without mentioning irrelevant things. But incidentally, a syntax with that property is probably also going to be easier to communicate with and to read, so I don’t think focusing on writing at all is the right thing to do.
Fundamentally, programming is about building abstractions. We put together extremely intricate mind castles and then try to express them in such a way that our computers will realize them. Concepts, abstractions – and manipulating and communicating them – are the pieces underlying programming languages, and it’s really what all languages must do in some way. A syntax that makes it easier to think about hard abstractions is a syntax that will make it easier to write good and robust programs. If we bring in the Sapir-Whorf hypothesis and linguistic relativity, I suspect that programmers have an easier time reasoning about a problem if their language choice makes those abstractions clearer. And syntax is one way of making that process easier. Simply put, the things we manipulate with programming languages are hard to think about, and good syntax can improve that.
Seeing as we are talking about reading – who is this person doing the reading? It makes a huge difference whether we’re trying to design something that should be easy to read for a novice, or a syntax that makes it easier for an expert to understand what’s going on. Optimally we would like to have both, I guess, but that doesn’t seem very realistic. The things that make syntax useful to an expert are different from what makes it easy to read for a novice.
At this point I need to make a request – Rich Hickey gave a talk at Strange Loop a few months ago. It’s called Simple Made Easy and you can watch it here: http://www.infoq.com/presentations/Simple-Made-Easy – you should watch it now.
Simply put, if you had never learnt any German, should you really expect to be able to read it? Is it such a huge problem that someone who has never studied Prolog will have no idea what’s going on until they study it a bit? Doesn’t it make sense that people who understand German can express all the things they need to say in that language? Even worse, when it comes to programming languages, people expect them to be readable to people who have never programmed before! Why in the world would that ever be a useful goal? It would be like saying German is not readable (and is thus a bad language) because dolphins can’t read it.
A tangential aspect to the simple versus easy of programming languages is how our current syntactic choices echo what’s been done earlier. It’s quite uncommon for a syntax design to become wildly successful while looking completely different from previous languages. This seems to have more to do with how easy a language is to learn than with how good the syntax actually is by itself. As such, it’s suspect. Historical accidents seem to contribute much more to syntax design than I am comfortable with.
Summarizing: when we talk about reading programming languages, it doesn’t make much sense to optimize for someone who doesn’t know the language. In fact, we need to take as a given that a person knows a programming language. Then we can start talking about what aspects reduce complexity and improve communication for a programmer.
When we talk about reading languages, one thing that sometimes comes up is the need for redundancy. Specifically, one of the blogs that inspired these thoughts basically claimed that the redundancy in the design of Java was a good thing, because it improved readability. Now, I find this quite interesting – I have never seen any research that explains why this would be the case. In fact, the only argument I’ve heard that backs up the idea is that natural languages have highly redundant elements, and thus programming languages should too. First, that’s not actually true for all natural languages – but we must also consider _why_ natural languages have so much redundancy built in. Natural languages are not designed (with a few exceptions) – they grow to have the features they have because those features are useful. But reading, writing, speaking and listening in natural languages are under such different evolutionary pressures that they should be treated differently. The reason we need redundancy is simply that it’s very hard to speak and listen without it. For all intents and purposes, what is considered good and idiomatic in spoken language is very different from written language. I just don’t buy this argument for redundancy. Redundancy in programming language syntax might turn out to be good, but so far I remain unconvinced.
It is sometimes educational to look at mathematical notation. However, mathematical notation is just that – notation. I’m not convinced we can have one single notation for programming languages, and I don’t think it’s something to aspire to. But the useful lesson from math notation is how terse it is. Even so, you still need to spend a long time digesting what it means. That’s because the ideas are deep. The thinking that went into them is deep. If we ever come to a point where programming languages can embody equally deep ideas in equally terse notation, I suspect we will have figured out how to design programming language syntax that is far better than what we have right now.
I think this covers most of the things I wanted to cover. At some point I would like to talk about why I think Smalltalk, Ruby, Lisp and some others have quite good syntax, and how that syntax is intimately related to why those languages are powerful and expressive. Some other random thoughts I wanted to cover were the evolvability of language syntax, whether a syntax should be designed to be easy to parse, and possibly also how much English specifically has impacted the design of programming languages. But these are thoughts for another time. Suffice to say, syntax matters.
November 10th, 2011
Announcing JesCov – JavaScript code coverage
It seems the JavaScript tool space is not completely saturated yet. As I mentioned in my previous post, I’ve had particular trouble finding a good solution for code coverage. So I decided to build my own. The specific features to notice are transparent translation of source code and support for branch coverage. It also has some limitations at the moment, of course. This is release 0.0.1 and as such is definitely a first release. If you happen to use the Jasmine JUnit runner, it should be possible to drop this in directly and have something working immediately.
You can find information, examples and downloads here: http://jescov.olabini.com
October 25th, 2011
JavaScript in the small
My most recent project was a fairly typical Java web project where we had a component that had to be written in JavaScript. Nothing fancy, and nothing big. It does seem like people are still not taking JavaScript seriously in these kinds of environments. So I wanted to take a few minutes and talk about how we developed JavaScript on this project. The kind of advice I’ll be giving here is well suited for web projects with small to medium amounts of JavaScript. If you’re writing large parts of your application on the client side, you probably want to go with a full stack framework to help you out, and these things become less relevant.
Of course, most if not all things I’ll cover here can be gleaned from other sources, and probably better. And if you’re an experienced JavaScript developer, you are probably fine without this article.
I had to do two things to get efficient with JavaScript. The first was to learn to ignore the syntax. The syntax is clunky and definitely gets in the way. But with the right habits (such as having a shortcut for function/lambda literals, and making sure to always put the returned value on the same line as the return statement) I’ve been able to see through the syntax and basically use JavaScript in a Scheme-like style. The second was to completely ignore the object system. I use a lot of object literals, but not really any constructors or the this-keyword. Both of these features can be used well, but they are also very clunky, and hard to get everyone on a team to understand the same way. I love prototype based OO as a model, and I’ve used it with success in Ioke and Seph. But with JavaScript I generally shy away from it.
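The habit of putting the returned value on the same line as the return statement is worth making concrete, because automatic semicolon insertion turns the obvious alternative into a silent bug. A minimal sketch (the function names are just for illustration):

```javascript
// ASI inserts a semicolon directly after `return`, so the object
// literal below is parsed as an unreachable block statement.
function brokenConfig() {
  return
  {
    name: "flux"
  };
}

// Keeping the returned value on the same line avoids the trap.
function workingConfig() {
  return {
    name: "flux"
  };
}

console.log(brokenConfig());       // undefined
console.log(workingConfig().name); // flux
```

The broken version doesn’t fail loudly – it just returns undefined – which is exactly why this needs to be a habit rather than something you catch in review.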
The module pattern
var olaBiniSeriousBanking = (function() {
  var balance = 0;

  function deposit(num) {
    balance += num;
  }

  function checkOverdraft(amount) {
    if(balance - amount < 0) {
      throw "Can't withdraw more than exists in account";
    }
  }

  function withdraw(amount) {
    checkOverdraft(amount);
    balance -= amount;
  }

  return {deposit: deposit, withdraw: withdraw};
})();
var olaBiniGreeterModule = (function(greeting) {
  return {greet: function(name) {
    console.log(greeting + ", " + name);
  }};
});

var olaBiniGreeterEng = olaBiniGreeterModule("Hello");
var olaBiniGreeterSwe = olaBiniGreeterModule("Hejsan");
RequireJS
// in file foo.js
require(["bar", "quux"], function(bar, quux) {
  return {doSomething: function() {
    return bar.something() + quux.something();
  }};
});
<script data-main="scripts/main" src="scripts/require.js"> </script>
// in file main.js
require(["foo"], function(foo) {
  require.ready(function() {
    console.log(foo.doSomething());
  });
});
No JavaScript in HTML
Init functions on ready
// foo.js
require(["bar"], function(bar) {
  function sayHello(node) {
    console.log("hello " + node);
  }

  function attachEventHandlers(dom) {
    dom.query(".fluxCapacitors").onclick(sayHello);
  }

  function init(dom) {
    bar.init(dom);
    attachEventHandlers(dom);
  }

  return {init: init};
});

// main.js
require(["foo"], function(foo) {
  require.ready(function() {
    foo.init(dojo);
  });
});
Lots of callbacks
function checkForChangesOn(node) {
  return function() {
    if(dojo.query(node).length() > 42) {
      console.log("Warning, flux reactor in flax");
    }
  };
}

dojo.query(".clixies").onclick(checkForChangesOn(".fluxes"));
dojo.query(".moxies").onclick(checkForChangesOn(".flexes"));
Lots of anonymous objects
Testing
Open questions
Summary
August 11th, 2011
Injecting loggers using Spring
On my current project we are using Spring MVC and we try to use autowiring as much as possible. I personally strongly prefer constructor injection, since this gives me the luxury of working with final fields. I also like being able to inject all the things a class needs – including loggers. Most of the time I don’t really want to use custom loggers from tests, but sometimes I do want to make sure something gets logged correctly, and being able to inject a logger seems like a natural way of doing that. So, with that preamble out of the way, my problem was that this seemed quite hard to achieve in Spring. Specifically, I use SLF4J, and I want to inject the equivalent of doing LoggerFactory.getLogger(MyBusinessObject.class). Sadly, Spring doesn’t give access to the place where something is going to be injected in any of the hooks available. Most solutions I found to this problem rely on using a BeanPostProcessor to set a field on the object after it’s been created. This defeats three of my purposes/principles – I can’t use the logger in the constructor, the field will be mutable, and I won’t get told by Spring if I’ve made a mistake in my wiring.
There was however one solution I found in a StackOverflow post – sadly it wasn’t complete. Specifically, I needed to use it in a Spring MVC setting and also from inside of tests. So this blog post is mainly to provide the complete solution for something like this. It’s a simple problem, but it was surprisingly tricky to get working correctly. But now that I have it, it will be very convenient. This code is for Spring 3.1, and I haven’t tested it on anything else.
The first part of this injection is to create our own custom BeanFactory – which is what Spring uses internally to manage beans and dependencies. The default one is called DefaultListableBeanFactory and we will just subclass it like this:
public class LoggerInjectingListableBeanFactory
        extends DefaultListableBeanFactory {
    public LoggerInjectingListableBeanFactory() {
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
            new QualifierAnnotationAutowireCandidateResolver());
    }

    public LoggerInjectingListableBeanFactory(
            BeanFactory parentBeanFactory) {
        super(parentBeanFactory);
        setParameterNameDiscoverer(
            new LocalVariableTableParameterNameDiscoverer());
        setAutowireCandidateResolver(
            new QualifierAnnotationAutowireCandidateResolver());
    }

    @Override
    public Object resolveDependency(
            DependencyDescriptor descriptor, String beanName,
            Set<String> autowiredBeanNames,
            TypeConverter typeConverter) throws BeansException {
        Class<?> declaringClass = null;
        if(descriptor.getMethodParameter() != null) {
            declaringClass = descriptor.getMethodParameter()
                .getDeclaringClass();
        } else if(descriptor.getField() != null) {
            declaringClass = descriptor.getField()
                .getDeclaringClass();
        }
        if(Logger.class.isAssignableFrom(
                descriptor.getDependencyType())) {
            return LoggerFactory.getLogger(declaringClass);
        } else {
            return super.resolveDependency(descriptor, beanName,
                autowiredBeanNames, typeConverter);
        }
    }
}
The magic happens inside of resolveDependency where we can figure out the declaring class by checking either the method parameter or the field – and then see whether the thing asked for is a Logger. Otherwise we just delegate to the super implementation.
In order to use this from anything we need an actual ApplicationContext that uses it. I didn’t find any hook to set the BeanFactory after the application context was created, so I ended up creating two new ApplicationContext implementations – one for tests and one for Spring MVC. They are slightly different, but both try to do as little as possible while retaining the behavior of the original. The application context for the tests looks like this:
public class LoggerInjectingGenericApplicationContext
        extends GenericApplicationContext {
    public LoggerInjectingGenericApplicationContext() {
        super(new LoggerInjectingListableBeanFactory());
    }
}
This one just calls the super constructor with an instance of our custom bean factory. The application context for Spring MVC looks like this:
public class LoggerInjectingXmlWebApplicationContext
        extends XmlWebApplicationContext {
    @Override
    protected DefaultListableBeanFactory createBeanFactory() {
        return new LoggerInjectingListableBeanFactory(
            getInternalParentBeanFactory());
    }
}
The XmlWebApplicationContext doesn’t have a constructor that takes a bean factory, so instead we override the createBeanFactory method to return our custom instance. In order to actually use these implementations some more things are needed. To get our tests to use it, a test.context.support.ContextLoader implementation is necessary. This code is mostly just copied from the default implementation – sadly it doesn’t provide any extension points, and the places I want to override are in the middle of two final methods. It feels quite ugly to just copy the implementations, but there are no hooks for this…
public class LoggerInjectingApplicationContextLoader
        extends AbstractContextLoader {
    public final ApplicationContext loadContext(
            MergedContextConfiguration mergedContextConfiguration)
            throws Exception {
        String[] locations = mergedContextConfiguration.getLocations();
        GenericApplicationContext context =
            new LoggerInjectingGenericApplicationContext();
        context.getEnvironment().setActiveProfiles(
            mergedContextConfiguration.getActiveProfiles());
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    public final ConfigurableApplicationContext loadContext(
            String... locations) throws Exception {
        GenericApplicationContext context =
            new LoggerInjectingGenericApplicationContext();
        loadBeanDefinitions(context, locations);
        AnnotationConfigUtils.registerAnnotationConfigProcessors(context);
        context.refresh();
        context.registerShutdownHook();
        return context;
    }

    protected void loadBeanDefinitions(
            GenericApplicationContext context, String... locations) {
        createBeanDefinitionReader(context)
            .loadBeanDefinitions(locations);
    }

    protected BeanDefinitionReader createBeanDefinitionReader(
            final GenericApplicationContext context) {
        return new XmlBeanDefinitionReader(context);
    }

    @Override
    public String getResourceSuffix() {
        return "-context.xml";
    }
}
The final thing necessary to get your tests to use the custom Bean Factory is to specify the loader to use in the ContextConfiguration on your test class, like this:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(
    value = "file:our-app-config.xml",
    loader = LoggerInjectingApplicationContextLoader.class)
public class SomeTest {
}
In order to get Spring MVC to pick this up, you can edit your web.xml and add a new init-param for the DispatcherServlet, like this:
<servlet>
  <servlet-name>Spring MVC Dispatcher Servlet</servlet-name>
  <servlet-class>
    org.springframework.web.servlet.DispatcherServlet
  </servlet-class>
  <init-param>
    <param-name>contextConfigLocation</param-name>
    <param-value>WEB-INF/our-app-config.xml</param-value>
  </init-param>
  <init-param>
    <param-name>contextClass</param-name>
    <param-value>
      com.example.LoggerInjectingXmlWebApplicationContext
    </param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
</servlet>
This approach seems to work well enough. Some of the code is slightly ugly and I would definitely love to have a better hook for injection points to know where it will get injected. Having factory methods be able to take the receiver object might be very convenient, for example. Being able to customize the bean factory seems like it also should be much easier than this.
March 14th, 2011
Seph – A Hard Language to Compile
I have recently started work on Seph again. I preannounced it last summer (here), then promptly became extremely busy at work. Busy enough that I didn’t really have any energy to work on this project for a while. Sadly, I’m still as busy, but I’ve managed to find some small slivers of time to start working on the compiler parts of the implementation. This has been made much easier and more fun since JSR292 is getting near completion, and an ASM 4 branch is available that makes it easier to compile Java bytecode with support for invoke dynamic built in.
So that means the current code in the repository actually goes a fair bit of the way toward where I want it to be. Specifically, the compiler compiles most code except for abstractions that create abstractions, and calls that take keyword arguments. Assignment is not supported either right now. I don’t expect any of these features to be very tricky to implement, so I’m waiting with those and working on other, more complicated things.
This blog post is meant to serve two purposes. The first one is to just tell the world that Seph as an idea and project actually is alive and being worked on – and what progress has been made. The other aspect of this post is to talk about some of the things that make Seph a quite tricky language to compile. I will also include some thoughts I have on how to solve these problems – and suggestions are very welcome if you know of a better approach.
To recap, the constraints Seph is working under are that it has to run on Java 7. It has to be fully compiled (in fact, I haven’t decided if I’ll keep the interpreter at all after the compiler is working). And it has to be fast. Ish. I’m aiming for Ruby 1.8 speed at least. I don’t think that’s unreasonable, considering the dimensions of flexibility Seph will have to allow.
So let’s dive in. These are the major pain points right now – and they are in some cases quite interconnected…
Tail recursion
All Seph code has to be tail recursive, which means a tail call should never grow the stack. In order to make this happen on the JVM you need to save information away somewhere about where to continue the call. Then anyone using a value has to check for a tail marker token, and if one is found, the caller has to repeatedly invoke the current tail until a real value is produced. All the information necessary for the tail also has to be saved away somewhere.
The approach I’m currently taking is fairly similar to Erjang’s. I have an SThread object that all Seph calls have to pass along – this will act as a thread context as soon as I add lightweight threads to Seph. But it also serves as a good place to save away information on where to go next. My current encoding of the tail is simply a MethodHandle that takes no arguments. So the only thing you need to do to pump the tail call is to repeatedly check for the token and call the tail method handle. Still, doing this all over the place might not be that performant. At the moment, the code is not looking up a MethodHandle from scratch in the hot path, but it will have to bind several arguments in order to create the tail method handle. I’m unsure what the performance implications of that will be right now.
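The pumping loop itself can be sketched in a few lines. This is not the actual Seph code – I’ve simplified it to use a plain functional interface instead of method handles, with Context standing in for the SThread object, and all the names here are my own invention – but the shape of the trampoline is the same: a tail call stores its continuation in the context and returns a marker, and the caller pumps until a real value appears.

```java
import java.util.function.Supplier;

public class Trampoline {
    // Marker token signalling "not a real value yet -- pump the tail".
    static final Object TAIL_MARKER = new Object();

    // Stands in for the SThread context: holds the next tail to invoke.
    static class Context {
        Supplier<Object> tail;
    }

    // A tail-recursive countdown: instead of calling itself directly,
    // it stores the continuation in the context and returns the marker.
    static Object countDown(Context ctx, int n) {
        if (n == 0) {
            return "done";
        }
        ctx.tail = () -> countDown(ctx, n - 1);
        return TAIL_MARKER;
    }

    // The caller-side pump: repeat the tail until a real value appears.
    static Object pump(Context ctx, Object result) {
        while (result == TAIL_MARKER) {
            result = ctx.tail.get();
        }
        return result;
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        // Deep enough to blow the stack if this were direct recursion.
        Object value = pump(ctx, countDown(ctx, 1_000_000));
        System.out.println(value); // done
    }
}
```

The stack stays at constant depth because every "recursive" step returns to the pump loop before the next one starts – which is exactly the property the JVM won’t give you for free.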
Argument evaluation from the callee
One aspect of Seph that works the same as in Ioke is that a method invocation will never evaluate the arguments. The responsibility of evaluating arguments will be in the receiving code, not the calling code. And since we don’t know whether something will do a regular evaluation or do something macro-like, it’s impossible to actually pre-evaluate the arguments and push them on the stack.
The approach Ioke and the Seph interpreter takes is to just send in the Message object and allow the callee to evaluate it. But that’s exactly what I want to avoid with Seph – everything should be possible to compile, and be running hot if that’s possible. So sending Messages around defeats the purpose.
I’ve found an approach to compile this that actually works quite well. It also reduces code bloat in most circumstances. Basically, every piece of code that is part of a message send will be compiled to a separate method. So if you have something like foo(bar baz, qux) that will compile into the main activation method and two argument methods. This approach is recursive, of course. What this gives me is a protocol where I can use method handles to the argument methods, push them on the stack, and then allow the callee to evaluate them however they want. I can provide a standard evaluation path that just calls each of the method handles in turn to generate the values. But it also becomes very easy for me to send them in unevaluated. As an example this is almost exactly what the current implementation of the built in “if” method looks like. (It’s not exactly like this right now, because of transitional interpreter details).
public final static SephObject _if(SThread thread, LexicalScope scope,
        MethodHandle condition, MethodHandle then, MethodHandle _else) {
    SephObject result = (SephObject)condition.invokeExact(
        thread, scope, true, true);
    if(result.isTrue()) {
        if(null != then) {
            return (SephObject)then.invokeExact(thread, scope, true, true);
        } else {
            return Runtime.NIL;
        }
    } else {
        if(null != _else) {
            return (SephObject)_else.invokeExact(thread, scope, true, true);
        } else {
            return Runtime.NIL;
        }
    }
}
Of course, this approach is not perfect. It’s still a lot of code bloat, I can’t use the stack to pass things to the argument evaluation, and the code to bind the argument method handles takes up most of the generated code at the moment. Still, it seems to work and gives a lot of flexibility. And compiling regular method evaluations will make it possible to bind these argument method handles straight into an invoke dynamic call site, which could improve performance substantially when evaluating arguments (something that will probably happen quite often in real world code… =).
Intrinsics are just regular messages
Many of the things that are syntax elements in other languages are just messages in Seph. Things like “nil”, “true”, “false”, “if” and many others work exactly the same way as a regular message send to something you have defined yourself. In many cases this is totally unnecessary though – knowing the implementation at the call site allows you to improve things substantially. I think it’s going to be fairly uncommon to override any of those standard names, but I still want to make it possible to do so. And I’m fine with the programs that do this taking a performance hit from it. So the approach I’ve come up with (but not implemented yet) is this – I will special-case the compilation of every place that has the same name as one of the intrinsics. This special casing will bind to a different bootstrap method than regular Seph methods. As a running example, let’s consider compiling a piece of code with “true” in it. This will generate a message send that will be taken care of by a sephTrueBootstrapMethod. We still have to send in all the regular method activation arguments, though. What this bootstrap method will do is set up a call site that points to a very special method handle. This method handle will be a guardWithTest created through a SwitchPoint specific to the true value. The first path of that GWT (guardWithTest) will just return the true value directly without any checks whatsoever. The else path will fall back to a regular Seph fallback method that does inline caching and regular lookup. The magic happens with the SwitchPoint – the places that create new bindings will check for these intrinsic names, and if one of those names is used anywhere in the client code, the SwitchPoint will be switched over to the slow path.
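The SwitchPoint mechanics are easier to see in isolation. The following is just a minimal demonstration of the guardWithTest/SwitchPoint machinery from JSR292 – it is not the Seph implementation, and the names (TRUE_VALUE, slowLookupTrue and so on) are invented for the sketch:

```java
import java.lang.invoke.*;

public class IntrinsicDemo {
    // Stands in for Seph's canonical true object.
    static final Object TRUE_VALUE = Boolean.TRUE;

    // Valid until someone, somewhere, rebinds the name "true".
    static final SwitchPoint TRUE_UNCHANGED = new SwitchPoint();

    // Stands in for the regular inline-caching lookup path.
    static Object slowLookupTrue() {
        return "redefined-true";
    }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        // Fast path: return the canonical value directly, no checks.
        MethodHandle fast = MethodHandles.constant(Object.class, TRUE_VALUE);
        // Slow path: fall back to the normal lookup machinery.
        MethodHandle slow = lookup.findStatic(IntrinsicDemo.class,
            "slowLookupTrue", MethodType.methodType(Object.class));
        MethodHandle handle = TRUE_UNCHANGED.guardWithTest(fast, slow);

        System.out.println(handle.invoke()); // fast path: true
        // A binding site noticed "true" being redefined: flip the switch.
        SwitchPoint.invalidateAll(new SwitchPoint[] { TRUE_UNCHANGED });
        System.out.println(handle.invoke()); // slow path from here on
    }
}
```

The attractive part is that the fast path has zero guard cost in compiled code – the JIT can treat the un-invalidated SwitchPoint as a constant – and invalidation is a one-way, global flip, which matches the "override once, pay for the rest of the execution" semantics described above.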
In summary, I think a fast path can be possible for many of these things for most programs. The behaviour when you override “if” should still work as expected, but will make the global performance of that program slower for the rest of the execution.
When do lexical scopes escape?
Seph has mutable lexical scopes. But it’s impossible to know which names will escape and which won’t – so as far as I can see, I can’t use the Java stack to represent variables except in a small number of very degenerate cases. I’m not sure it’s worth having that code path yet, so I haven’t thought much about it.
Class based PICs aren’t a good fit
One of the standard optimizations that object oriented languages use is something called a polymorphic inline cache. The basic idea is that looking up a method is the really slow operation. So if you can save away the result of doing that, guarded by a very cheap test, then you can streamline the most common cases. Now, that cheap test is usually a check against the class. As long as you send in an instance with the same class, a new method lookup doesn’t have to happen. Doing a getClass and then an identity equality check on the result is usually fairly fast (a pointer comparison on most architectures) – so you can build PICs that don’t actually spend much time in the guard.
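A single class-guarded cache entry – the monomorphic building block of a PIC – can be sketched with method handles. This is an illustration only (the names are invented, and a real PIC would chain several guarded entries behind a mutable call site and install new ones on misses):

```java
import java.lang.invoke.*;

public class InlineCacheDemo {
    // The "slow" lookup-and-dispatch we want to avoid repeating.
    static String lookupAndCall(Object receiver) {
        return "looked up " + receiver.getClass().getSimpleName();
    }

    // The cheap guard: is the receiver's class the one we cached for?
    static boolean classGuard(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle slow = lookup.findStatic(InlineCacheDemo.class,
            "lookupAndCall",
            MethodType.methodType(String.class, Object.class));
        MethodHandle guard = lookup.findStatic(InlineCacheDemo.class,
            "classGuard",
            MethodType.methodType(boolean.class, Class.class, Object.class));

        // Cache the result of one lookup, guarded by the receiver's class.
        MethodHandle cached = MethodHandles.constant(String.class,
            "cached for String");
        cached = MethodHandles.dropArguments(cached, 0, Object.class);
        MethodHandle site = MethodHandles.guardWithTest(
            guard.bindTo(String.class), cached, slow);

        System.out.println((String) site.invoke("hello")); // cached for String
        System.out.println((String) site.invoke(42));      // looked up Integer
    }
}
```

The guard really is just getClass plus a reference comparison – which is exactly why the scheme falls apart for a prototype language where that comparison almost never succeeds.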
But Seph is a prototype based language. So any object in the system can have different methods or values associated with a name, and there is no clear delineation between objects that introduce new names and values and objects that don’t. Especially since Seph objects are immutable, every new object will most likely have a new set of values in it. Saving away objects and dispatching on them becomes much less performant, since the call sites will basically never see the same object twice. Now, there are solutions to this – but most of them are tailored for languages where you usually use a class based pattern. V8 uses an approach called hidden classes to handle exactly this. I’m considering implementing something similar, but I’m a bit worried that the usage patterns of Seph will be far enough away from the class based world that it might not work well.
Summary
So, Seph is not terribly easy to compile, and I don’t have a good feeling for how fast it can actually be made. I guess we’ll have to wait and see. But it’s also an interesting challenge, coming up with solutions to these problems. I think I might also have to go on a new research binge, investigating how Self and NewtonScript did things.
January 16th, 2011
Safe(r) monkey patching
Ruby makes it possible to change pretty much anything, anywhere. This is obviously very powerful, but it’s also something that can cause a lot of pain if it’s not done in a disciplined manner. The way this is handled on most Ruby projects is by having clear strategies for what to change, how to name it and where to put the source file. The most basic advice is to always use modules for extensions and changes if it is at all possible. There are several good reasons for this, but the main one is that it makes it easier for someone debugging your application to find out where the code is defined.
The one absolute rule that should never be violated in a Rails or Ruby project is to never modify the original source code. In the worst case, fork the project and make the changes there, but never, never, never change code in vendor/plugins or vendor/gems.
Let’s start with a simple example. Say I want to recreate the presence method I mentioned in a previous blog post. A first version may look like this:
class Object
  def presence
    return self if present?
  end
end
But if I open up IRb and get hold of this method, it’s not immediately obvious where it’s defined:
o = Object.new
p o.method(:presence)  #=> #<Method: Object#presence>
However, if I were to implement it using a module instead, like this:
module Presence
  def presence
    return self if present?
  end
end

Object.send :include, Presence
If I look at the method now, the output is a bit changed:
p o.method(:presence) #=> #<Method: Object(Presence)#presence>
We can now see that the method actually comes from the Presence module instead of the Object class. In most Ruby projects, these kinds of extensions will be namespaced, using the word extensions or ext as part of the module name. When I add the presence method to code bases, I usually put it in lib/core_ext/object/presence.rb, in a module called CoreExt::Object::Presence. All of this to make it as easy as possible to find these extensions and changes.
There are many other benefits to putting an extension like this in a module. It makes your code cleaner, more flexible, and it composes better if you happen to have conflicting definitions. You can also use modules more selectively if you want, including just adding it to selected objects if necessary.
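As a small sketch of that last point (with present? stubbed out on a hypothetical Example class, since it normally comes from ActiveSupport), you can mix the module into a single object without touching anything else:

```ruby
module Presence
  def presence
    return self if present?
  end
end

class Example
  # Stand-in for ActiveSupport's present?, just for this sketch
  def present?
    true
  end
end

o = Example.new
o.extend(Presence)  # only this one instance gains the method

o.presence                          # => o
Example.new.respond_to?(:presence)  # => false, other instances are untouched
```

This is exactly the kind of selective application you give up when you define the method directly on a class.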
Props to my colleague Brian Guthrie for alerting me to this useful side effect of defining extensions with modules.
There is a slight wrinkle in this scenario, specifically for adding extensions to modules. Sadly, the way the Ruby module system works, you can’t include a new module into Enumerable and have that take effect in places where Enumerable has already been mixed in. Instead you have to define the methods directly on Enumerable. The general problem looks like this:
module X
  def hello
    42
  end
end

class Foo
  include X
end

Foo.new.hello #=> 42

module Y
  def goodbye
    25
  end
end

module X
  include Y
end

Foo.new.goodbye #=> undefined method `goodbye' for #<Foo:0x129f94> (NoMethodError)
This is a bit sad, since it means extensions have to be written in two different ways, depending on where you aim to use them. The general rule still applies: you should put the extensions in well named files that are easy to find. And if you can extract the functionality to a module and then delegate to that, that is preferable.
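Concretely, the workaround is to reopen the module and define the methods directly on it. Method lookup walks the ancestor chain dynamically, so even objects created before the definition pick the method up (module and method names here are just illustrative):

```ruby
module X
end

class Foo
  include X
end

f = Foo.new

# Defining the method directly on X, rather than including another
# module into X, makes it visible through Foo's ancestor chain:
module X
  def goodbye
    25
  end
end

f.goodbye # => 25, even though f was created before the definition
```

This is why extensions to Enumerable have to define their methods on Enumerable itself rather than being mixed in.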
January 16th, 2011
Comparing times and dates in Ruby
In one of the Rails projects I’m involved with, we do most of the local development against SQLite and then deploy against Oracle. This is a bit annoying for many reasons, but by far the largest cause of trouble is the handling of dates. I haven’t exactly figured out the rules, but for some reason sometimes Oracle returns DateTime in situations where SQLite returns a Date. This usually causes quite subtle problems that have effects in other parts of the application. This brings me to the small piece of advice I wanted to talk about in this column. Always make sure that you know if you are working with a Date, a Time or a DateTime, since these all have slightly different behavior, especially when it comes to comparisons.
The rule is quite simple. If you think you can have a Time object, make sure to turn it into a DateTime object before trying to compare it to a Date object. What happens otherwise? Unfunny things:
Date.today < Time.now   #ArgumentError: comparison of Date with Time failed
Time.now > Date.today   #true
Time.now == Date.today  #false
Date.today == Time.now  #nil
Date.today <=> Time.now #nil
Date.today != Time.now  #true
The first time I saw some of these results, I was a bit confused, especially by the last three. But they do make a twisted kind of sense. Namely, it’s OK for the <=> operator to return nil if it can’t compare two objects. And != in Ruby is hardcoded to return the inverse of the value returned from ==, and since nil is a falsey value, the inverse of that becomes true.
What I wanted to mention with these things is that you should always make sure you don’t have the Date on the left hand side of a comparison. Or, if you want to do a date-only comparison, explicitly call to_date to coerce both sides. Finally, if you want to do date and time comparisons, I find the best behavior usually comes from coercing both sides with to_datetime before doing the comparison.
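As a small sketch of that advice, using fixed values so the results don’t depend on when you run it:

```ruby
require 'date'

d = Date.new(2011, 1, 16)
t = Time.local(2011, 1, 16, 12, 0, 0)

# Coerce both sides to DateTime for date-and-time comparisons:
t.to_datetime > d.to_datetime  # => true (noon is after midnight that day)

# Or coerce down to Date when only the day matters:
t.to_date == d                 # => true
```

Either way, both sides of the comparison end up being the same class, which is what keeps the comparison operators honest.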
January 15th, 2011
Panel on Internet Freedom
Next week, ThoughtWorks and The Churchill Club are organizing a live panel about Internet freedom and the implications of the recent WikiLeaks events. This is going to be a world class event with very engaging speakers, and it will also be streamed live if you can’t attend in person. Daniel Ellsberg, Peter Thiel, Clay Shirky, Jonathan Zittrain and Roy Singham will discuss various important subjects touching on the WikiLeaks controversy:
WikiLeaks: Why it Matters. Why it Doesn’t?
Wednesday, January 19, 2011
5:30-8:30 PM Pacific Standard Time
The purpose of this discussion is to take an objective look at the WikiLeaks controversy and its potential threats to the future of the free Internet. Notwithstanding the varied personal opinions of WikiLeaks and Julian Assange, this issue reaches beyond the actions of one person or one website. Precedents that will determine the very future of the Internet are being set as the world grapples with new social and information models. This is a serious issue worthy of serious discussion and debate.
Paul Jay, CEO and senior editor of The Real News Network will moderate the panel, which includes:
- Daniel Ellsberg, Former State and Defense Dept. Official, prosecuted for releasing the Pentagon Papers
- Clay Shirky, Independent Internet Professional; Adjunct Professor, Interactive Telecommunications Program, New York University
- Neville Roy Singham, Founder and Chairman, ThoughtWorks
- Peter Thiel, President, Clarium Capital; Managing Partner, Founder’s Fund
- Jonathan Zittrain, Professor of Law and Professor of Computer Science, Harvard University; Co-founder, Berkman Center for Internet & Society
Please join us along with other executives from diverse industries and positions, all of whom will gather to listen and engage with these panelists who represent a rich cross section of the communities impacted by the WikiLeaks issue.
Attend in person!
Santa Clara Marriott: 2700 Mission College Boulevard · Santa Clara, California 95054 USA
A buffet dinner will be served.
To register to attend this event please visit the Churchill Club’s website at: http://www.churchillclub.org/eventDetail.jsp?EVT_ID=892. There is a small charge for this event; however, guests of ThoughtWorks may use a special discount, gtworks25, to receive the Churchill Club’s member rate.
View live-streamed event
The Real News Network: http://bit.ly/fdjoMr
Fora.tv: http://bit.ly/f4cuz6
January 13th, 2011
Named Scopes
One of my favorite features of Rails is named scopes. Maybe it’s because I’ve seen so much Rails code with conditions and finders spread all over the code base, but I feel that named scopes should be used almost always where you would find yourself otherwise writing a find_by or even using the conditions argument directly in consumer code. The basic rule I follow is to always use named scopes outside of models to select a specific subset of a model.
So what is a named scope? It’s really just an easier way of creating a custom finder method on your model, but it gives you some extra benefits I’ll talk about later. So if we have code like:
class Foo < ActiveRecord::Base
  def self.find_all_fluxes
    find(:all, :conditions => {:fluxes => true})
  end
end

p Foo.find_all_fluxes
we can easily replace that with
class Foo < ActiveRecord::Base
  named_scope :find_all_fluxes, :conditions => {:fluxes => true}
end

p Foo.find_all_fluxes
You can give named_scope any argument that find can take. So you can create named scopes specifically for ordering, specifically for including an association, etc.
The above example is fixed to always use the same conditions. But named_scope can also take arguments. You do that by sending in a lambda that returns the arguments to use, instead of fixing them:
def self.ordered_inbetween(from, to)
  find(:all, :conditions => {:order_date => from..to})
end

Foo.ordered_inbetween(10.days.ago, Date.today)
can become:
named_scope :ordered_inbetween, lambda {|from, to|
  {:conditions => {:order_date => from..to}}
}

Foo.ordered_inbetween(10.days.ago, Date.today)
It’s important that you use the curly brace form of block for the lambda; if you happen to use do-end instead, you might not get the right result, since the block will bind to named_scope. The block that binds to named_scope is used to add extensions to the returned collections, and thus will not contribute to the arguments given to find. It’s a mistake I’ve made several times, and it’s very easy to make. So make sure to test your named scopes too!
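The pitfall is just Ruby’s block binding rules. With a hypothetical stand-in for named_scope, you can see which receiver a block attaches to:

```ruby
# scope_like is a stand-in for named_scope, used only to inspect bindings.
def scope_like(name, finder = nil, &extension_block)
  { :finder_given => !finder.nil?, :extension_given => !extension_block.nil? }
end

# Braces bind tightly, so the lambda arrives as the finder argument:
good = scope_like :recent, lambda {|days| {:conditions => ["created_at > ?", days]} }
# good => {:finder_given => true, :extension_given => false}

# A bare do..end block binds loosely, to scope_like itself, which is
# how named_scope would receive it as an extension block instead:
bad = scope_like :recent do |days|
  {:conditions => ["created_at > ?", days]}
end
# bad => {:finder_given => false, :extension_given => true}
```

The scope looks fine at definition time either way; the breakage only shows up when you call it, which is why testing your named scopes matters.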
So what is it that makes named scopes so useful? Several things. First, they are composable. Second, they establish an explicit interface to the model functionality. Third, you can use them on associations. There are other benefits too, but these are the main ones. Simply put, they are a cleaner solution than the alternatives.
What do I mean by composable? Well, you can call one on the result of calling another. So say that we have these named scopes:
class Person < ActiveRecord::Base
  named_scope :by_name, :order => "name ASC"
  named_scope :by_age, :order => "age DESC"
  named_scope :top_ten, :limit => 10
  named_scope :from, lambda {|country|
    {:conditions => {:country => country}}
  }
end
Then you can say:
Person.top_ten.by_name
Person.top_ten.from("Germany")
Person.top_ten.by_age.from("Germany")
I dare you to do that in a clean way by using class method finders.
I hope you have already seen what I mean by offering a clean interface. You can hide some of the implementation details inside your model, and your controller and view code will read more cleanly because of it.
The third point I made was about associations. Simply, if you have an association, you can use a named scope on that association, just as if it was the model class itself. So if we have:
class ApartmentBuilding < ActiveRecord::Base
  has_many :tenants, :class_name => "Person"
end

a = ApartmentBuilding.first
a.tenants.from("Sweden").by_age
So named scopes are great, and you should use them. Whenever you sit down to write a finder, see if you can’t express it as a named scope instead.
(Note: some of the things I write about here only concern Rails 2. Most of the work I do is still in the old world of 2.3.)