Stu’s Java.next series


If you haven’t already seen this, let me strongly recommend Stuart Halloway’s series on the languages he calls Java.next. The series looks at several different aspects of these languages (Groovy, Scala, Clojure, JRuby) and contrasts them with each other and with Java. Highly recommended if you are in any way interested in the languages that will soon replace Java for much application development.

The three published parts are:

  1. Java.next: Common Ground
  2. Java.next #2: Java Interop
  3. Java.next #3: Dispatch


Redefinition


Since I’ve just started out with this new blog, I felt that it’s time to reinvigorate myself and also to have a better plan for what I want to talk about in this venue. Lately the posts have been few and far between, and I’ve mostly just written without any kind of plan. That’s supposed to change.

I do have ideas for what I want to write about. Specifically, I have three different projects, each at a different stage. Each of them will generate lots of content once it’s released, so I’m definitely going to write about those. This is going to be lots of language-oriented stuff, both the DSL variety and the full general-purpose language variety.

But until then I need your input. I’d like to know what kind of posts you want to see more of, what kind of subjects interest you the most, and so on. I’m quite versatile and touch on lots of subjects in my daily life, so it’s more or less up to you. If I don’t get any input I’ll just write whatever I feel like. As an example of things I could definitely focus more on, here’s a list of subjects that lie close to my heart:

  • Ruby metaprogramming
  • Ruby nooks and crannies
  • Anything regarding JRuby
  • Security with Ruby
  • Security with Java
  • Artificial Intelligence
  • Programming languages in general, type theory and things like that
  • DSLs

This list is just a sample, though. Give me your opinions in the comments. If I don’t get any comments I’ll either think no one is interested, or just keep on writing whatever I want. =)



First post


I am very happy to write the first post on my new blog. Of course, it’s got the same name as the old one, the same author, and the same history. But it’s still a new blog – just look at the layout! It’s completely different!

I chose WordPress as my blogging system. It was the solution that seemed to be the most fully featured, without being too complicated to handle. So if you’re reading this, you have already found this place. Remember to add the feed to your reader. Oh, and the feed from WordPress automatically redirects to FeedBurner, just so you know.

The plan is to add some other kinds of content to olabini.com, as soon as I have some time to set up the initial structure. I’ll probably talk about it more when it happens.

The blog is dead, long live the blog.



Java and mocking


I’ve just spent my first three days on a project in Leeds. It’s a pretty common Java project, RESTful services and some MVC screens. We have been using Mockito for testing, which is a first for me. My immediate impression is quite good. It’s a nice tool and it allows some very clean testing of stuff that generally becomes quite messy. One of the things I like is how it uses generics and the static typing of Java to make it really easy to create mocks that are actually type checked, like this for example:

import static org.mockito.Mockito.*;
import java.util.Iterator;
Iterator iter = mock(Iterator.class);
stub(iter.hasNext()).toReturn(false);

// Call stuff that starts interaction
verify(iter).hasNext();

These are generally the only things you need to stub stuff out and verify that it was called. The things you don’t care about you don’t verify. This is pretty good for Java, but there are some problems with it too. One of the first things I noticed that I don’t like is that interactions that aren’t verified can’t easily be disallowed. Optimally this would be declared when the mock is created, instead of having to call verifyNoMoreInteractions() afterwards. It’s way too easy to forget. Another problem that quite often comes up is that you want to mock out or stub some methods but retain the original behavior of others. This doesn’t seem possible, and the alternative is to manually create a new subclass for this. Annoying.

Contrast this with testing the same interaction with Mocha, using JtestR. The difference isn’t that big, but some of the cruft is gone:

iter = mock(Iterator)
iter.expects(:hasNext).returns(false)

# Call stuff that starts interaction

With Mocha, the expectations are checked automatically afterwards, and since there are no types, you don’t need to care about most of the stuff you do in Java. This also shows a few of the inconsistencies in Mockito that are necessary because of the type system. For example, with the verify method you send the mock as an argument, and the return value of verify is what you call the actual method on, to verify that it was actually called. Verify is a generic method that returns the same type as the argument you give it. But this doesn’t work for the stub method. Since it needs to return a value that you can call toReturn on, it can’t return the type of the mock, which in turn means that you need to call the method to stub before the actual stub call happens. This dichotomy gets me every time, since it’s a core inconsistency in the way the library works.

Contrast that with how a Mockito-like Ruby library might look for the same interaction:

iter = mock(Iterator)
stub(iter).hasNext.toReturn(false)

# Do stuff
verify(iter).hasNext

The lack of typing makes it possible to create a cleaner, more readable API. Of course, these interactions are all based on how the Java code looked. You could quite easily imagine a more free form DSL for mocking that is easier to read and write.
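To make that concrete, here is a minimal sketch of how such a Mockito-like API could be built in plain Ruby with method_missing. The classes Mock, StubBuilder, PendingStub and Verifier are hypothetical and not part of Mockito, Mocha or JtestR. The sketch ignores arguments and only works for method names that Object doesn’t already define, but it shows why the stub/verify asymmetry disappears: with no static types, both stub(iter) and verify(iter) can return a proxy that interprets the next method call.

```ruby
# Hypothetical sketch: a Mockito-style stub/verify API in plain Ruby.
# None of these classes come from an existing mocking library.

class Mock
  def initialize
    @stubs = {}
    @calls = []
  end

  # Internal hooks, named with underscores so they rarely clash with real methods.
  def __stub__(name, value)
    @stubs[name] = value
  end

  def __called__?(name)
    @calls.include?(name)
  end

  # Any other message is recorded and answered from the stub table (nil if unstubbed).
  def method_missing(name, *_args)
    @calls << name
    @stubs[name]
  end

  def respond_to_missing?(*)
    true
  end
end

# What stub(mock).someMethod returns: it waits for toReturn(value).
PendingStub = Struct.new(:mock, :name) do
  def toReturn(value)
    mock.__stub__(name, value)
  end
end

# Turns the next message into a PendingStub instead of a real call.
class StubBuilder
  def initialize(mock)
    @mock = mock
  end

  def method_missing(name, *_args)
    PendingStub.new(@mock, name)
  end

  def respond_to_missing?(*)
    true
  end
end

# Turns the next message into a check that the mock already received it.
class Verifier
  def initialize(mock)
    @mock = mock
  end

  def method_missing(name, *_args)
    raise "expected #{name} to have been called" unless @mock.__called__?(name)
    true
  end

  def respond_to_missing?(*)
    true
  end
end

def mock
  Mock.new
end

def stub(mock)
  StubBuilder.new(mock)
end

def verify(mock)
  Verifier.new(mock)
end

iter = mock
stub(iter).hasNext.toReturn(false)

# Do stuff
puts iter.hasNext # prints "false"
verify(iter).hasNext
```

Because the proxy returned by stub(iter) intercepts hasNext itself, there is no need to call the stubbed method before the stub call, which is exactly the ordering Mockito’s types force on you.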

Conclusion? Mockito is nice, but Ruby mocking is definitely nicer. I’m wondering why the current Ruby mocking approaches don’t use the method call way of defining expectations and stubs, though, since these are much easier to work with in Ruby.

Also, it was kinda annoying to upgrade from Mockito 1.3 to 1.4 and see half our tests starting to fail for unknown reasons. Upgrade cancelled.



JtestR, RubyGems, and external code


One question I’ve gotten a few times, now that people are starting to use JtestR, is how to make it work with external libraries. This is actually two different questions masquerading as one. The first one regards the libraries that are already included with JtestR, such as JRuby, RSpec or ActiveSupport: how do you use your own versions of them? There is an open bug in JIRA for this, JTESTR-57, but I’ve been a bit hesitant to add this functionality until now, because JtestR actually does some pretty hairy things in places. The JRuby integration in particular does ClassLoader magic that can potentially be quite version dependent. The same goes for the RSpec and Mocha integration. I don’t actually modify these libraries, but the code using them is a bit brittle at the moment. I’ve worked on fixing this by providing patches to the framework maintainers to include the hook functionality I need. This has worked with great success for both Expectations and RSpec.

That said, I will provide something that allows you to use local versions of these libraries, at your own risk. It will probably be part of 0.4, and if you’re interested JTESTR-57 is the one to follow.

The second problem is a bit more complicated. You will have seen it if you tried to do “require 'rubygems'”. JtestR does not include RubyGems. There are both technical and non-technical reasons for this. Simply put, the technical problem is that RubyGems is coded in such a way that it doesn’t interact well with loading things from JAR-packaged files. That means I couldn’t distribute the full JtestR in one JAR file if I included RubyGems, and that’s just unacceptable. I need to be able to bundle everything in a way that makes it easy to use.

The non-technical reason is a bit more subtle. If RubyGems can be used in your tests, it encourages locally installed gems. It’s a bit less painful to do it that way initially, but remember that as soon as you check the tests in to version control (you are using version control, right?), they will break in unexpected ways if other people using the code don’t have the same gems installed, with the same versions.

Luckily, it’s quite simple to provide third-party code to JtestR without using RubyGems. The first step is to create a directory that contains all the third-party code. I will call it test_lib and place it in the root of the project. Then unpack your gems into it:

mkdir test_lib
cd test_lib
jruby -S gem unpack activerecord

When you have the gems you want unpacked in this directory, you can add something like this to your jtestr_config.rb:

Dir["test_lib/*/lib"].each do |dir|
  $LOAD_PATH << dir
end

And finally you can load the libraries you need:

require 'active_record'
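If the relative glob turns out to be fragile, a slightly more defensive variant could expand the path before adding it and skip entries that are already present. This is a sketch under assumptions: add_unpacked_gems_to_load_path is a hypothetical helper name, and the demo uses a throwaway directory with a fake gem instead of a real unpacked ActiveRecord.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper for jtestr_config.rb: put the lib directory of every gem
# unpacked under root on the load path. Entries are stored as absolute paths,
# so a later Dir.chdir can't break them, and entries already on the load path
# are skipped, so loading the config twice does no harm.
def add_unpacked_gems_to_load_path(root)
  Dir[File.join(File.expand_path(root), '*', 'lib')].sort.each do |dir|
    $LOAD_PATH << dir unless $LOAD_PATH.include?(dir)
  end
end

# Demo with a throwaway directory standing in for test_lib.
Dir.mktmpdir do |root|
  lib = File.join(root, 'fake_gem-1.0', 'lib')
  FileUtils.mkdir_p(lib)
  File.write(File.join(lib, 'fake_gem.rb'), "FAKE_GEM_LOADED = true\n")

  add_unpacked_gems_to_load_path(root)
  require 'fake_gem'
  puts FAKE_GEM_LOADED # prints "true"
end
```

Note that expanding a relative path like 'test_lib' still resolves it against the directory JtestR runs in; pass an absolute root if you don’t want to rely on that.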


Testing Regular Expressions


Something has been worrying me a bit lately. Being test infected and all, and working for ThoughtWorks, where testing is part of the lifeblood, I think more and more about these issues. And one thing I’ve started noticing is that regular expressions seem to be a total blind spot in many cases. I first started thinking about it when I changed a quite complicated regular expression in RSpec. RSpec has coverage checks as part of its build, and if the test coverage is less than 100%, the build fails. Since I had changed something to add new functionality, but hadn’t added any tests for it, I instinctively assumed that it would be caught by the coverage tool.

Guess what? It wasn’t. Of course, if I had changed the regexp to do something that the surrounding code couldn’t support, one of the tests for the surrounding lines of code would have caught it, but the coverage tool gave no hint that I needed more tests to fully exercise the regular expression. This is logical if you think about it. There is no practical way for a coverage tool to find all the regular expressions in your source code and then make sure that every branch and alternative of each one was exercised. So the coverage tool doesn’t do anything with them at all.

OK, I can live with that, but it’s still one of those points that would be very good to keep in mind. Every time you write a regular expression in your code, you need to take special care to actually exercise that part of the code with many inputs. What is many in this case? That’s another part of the problem – it depends on the regular expression. It depends on how complicated it is, how long it is, how many special operators are used, and so on. There is no real way around it. To test a regular expression, you really need to understand how they work. The corollary is obvious – to use a regular expression in your code, you need to know how to test it. Conclusion – you need to understand regular expressions.

In many code bases I haven’t seen any tests for regular expressions at all. In most cases the expressions have been crafted outside the code, tested by hand, and then pasted into the code. This is brittle, to say the least. In the cases where there are tests, it’s much more common that they only test positives, and not negatives. And I’ve seldom heard of code bases with enough tests for regular expressions. One of the problems is that in a language like Ruby they are so easy to use that you stick them in all over the place. A standard refactoring could help here: extracting all literal regular expressions to constants. But that creates a different problem: as soon as you use regular expressions to extract values from a string, it’s a pain not to have the regular expression in the same place as the extracted groups are used. Example:

PhoneRegexp = /(\d{3})-?(\d{4})-?(\d{4})/
# 200 lines of code
if phone_number =~ PhoneRegexp
  puts "phone number is: #$1-#$2-#$3"
end

If the regular expression had been in the same place as the usage of $1, $2 and $3, it would have been easy to tie them to the parts of the string. In this case it would be easy anyway, but in more complicated cases it gets hard. The solution to this is easy: the dollar numbers are evil, so don’t use them. Instead use an idiom like this:

area, number, extension = PhoneRegexp.match(phone_number).captures

In Ruby 1.9 you will be able to use named captures, and that will make it even easier to make readable use of the extracted parts of a string. But the fact is, the distance between the usage point and the definition point can still cause trouble. One way around this would be to take any complicated regular expression and put it inside a class dedicated to that purpose. The class would then encapsulate the usage, and would also allow you to test the regular expression more or less in isolation. In the example above, maybe creating a PhoneNumberParser would be a good idea.
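A minimal sketch of such a PhoneNumberParser, assuming Ruby 1.9 or later for named captures. The class, its Result struct and the parse method are hypothetical names. Unlike PhoneRegexp above, the pattern here is anchored with \A and \z, and parse returns nil when the input doesn’t match, instead of failing with a NoMethodError the way the captures one-liner does when match returns nil.

```ruby
# Hypothetical parser class: the regexp lives next to the code that reads its
# groups, and named captures (Ruby 1.9+) replace $1, $2 and $3.
class PhoneNumberParser
  PATTERN = /\A(?<area>\d{3})-?(?<number>\d{4})-?(?<extension>\d{4})\z/

  Result = Struct.new(:area, :number, :extension)

  # Returns a Result, or nil when the input isn't a phone number.
  def self.parse(input)
    match = PATTERN.match(input)
    return nil unless match
    Result.new(match[:area], match[:number], match[:extension])
  end
end

phone = PhoneNumberParser.parse("123-4567-8901")
puts phone.area      # prints "123"
puts phone.extension # prints "8901"
p PhoneNumberParser.parse("call 123-4567-8901") # prints "nil"
```

Having the pattern in its own class also gives the tests a single, small target: feed parse a list of good and bad inputs and check the resulting fields.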

At the end of the day, regular expressions are an extremely complicated feature, and in general we don’t test our use of them enough. So you should start. Begin by creating both positive and negative tests for them. Figure out the boundaries, and see where they can go wrong. Know regular expressions well enough to know what happens in these strange circumstances. Think about Unicode characters. Think about whitespace. Think about greedy and lazy matching. As an example of something that took a long time to cause trouble: what’s wrong with this regexp that tries to discern whether a string is a select statement or not?

/^\s*\(*\s*SELECT\W+/i

And this example actually covers most of the ground already. It matches case-insensitively. It allows whitespace before any optional parentheses, and any whitespace after them. It makes sure that the word SELECT isn’t continued, by requiring at least one non-word character. So what’s wrong with it? Well… It’s the caret. Imagine if we had a string like this:

"INSERT INTO foo(a,b,c)\nSELECT * FROM bar"

The regular expression will in fact match this, even though it’s not a select statement. Why? Well, it just so happens that in Ruby the caret matches the beginning of lines, not the beginning of strings. The dollar sign works the same way, matching the end of lines. How do you solve it? Change the caret to \A and the dollar sign to \z (or \Z, which also allows a single trailing newline), and it will work as expected. A similar problem can show up with the “.” that matches any character. Depending on which language you are using, the dot might or might not match a newline. Always make sure you know which behavior you want, and which you don’t.
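Here is what positive and negative tests for that regexp might look like in plain Ruby, without a test framework, assuming Ruby 2.4 or later for match?. LINE_ANCHORED is the original pattern and STRING_ANCHORED is the \A version; the sample strings are made up for illustration. The last lines show the related end-of-string and dot behavior in Ruby.

```ruby
# The original pattern: in Ruby, ^ matches at the start of every line.
LINE_ANCHORED = /^\s*\(*\s*SELECT\W+/i
# The fix: \A only matches at the very start of the string.
STRING_ANCHORED = /\A\s*\(*\s*SELECT\W+/i

positives = ["SELECT * FROM bar", "  (select id FROM bar)", "(\n SELECT 1"]
negatives = ["INSERT INTO foo(a,b,c)\nSELECT * FROM bar", "SELECTED_ROWS", "DELETE FROM bar"]

positives.each do |sql|
  raise "should match: #{sql.inspect}" unless STRING_ANCHORED.match?(sql)
end
negatives.each do |sql|
  raise "should not match: #{sql.inspect}" if STRING_ANCHORED.match?(sql)
end

# The original pattern fails the first negative test.
puts LINE_ANCHORED.match?(negatives.first) # prints "true"

# \Z tolerates one trailing newline, \z does not.
puts "SELECT 1\n".match?(/1\Z/) # prints "true"
puts "SELECT 1\n".match?(/1\z/) # prints "false"

# By default the dot does not match a newline; in Ruby, /m changes that.
puts "a\nb".match?(/a.b/)  # prints "false"
puts "a\nb".match?(/a.b/m) # prints "true"
```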

Finally, these are just some thoughts I had on the subject. There is much more advice to give, but it can be condensed to this: understand regular expressions, and test them. The dot isn’t as simple as it seems. Regular expressions are a full-blown language, even though it’s not Turing complete (in most implementations). That means that, in the general case, you can’t test it completely. That doesn’t mean you shouldn’t try to cover all eventualities.

How are you testing your regular expressions? How much?



A Personal blog


Due to lots and lots and lots of stuff happening in my life right now, I haven’t had the time to write anything substantial here in a while. Hopefully, this will change.

I have also decided to start a new blog, for more… personal musings. Books, music, movies. Stuff like that. Nothing that would interest the regular readers of this blog, probably. But anyway, here’s the address: http://olabini.blogspot.com.

Ah, right: Have a Merry Christmas and a Happy New Year, everyone!



The difference between Kernel#` and Kernel#system


Today I had a fun learning experience. It cost me several hours of work, so I will post a small notice about it here so Google can make other developers’ lives easier. Or maybe I’m the only one who made this mistake.

Anyway. What I was trying to do was start an external Ruby script from another Ruby script. The external script daemonized itself, but since I didn’t want to install the daemonize package (another bad decision, probably), I just wrote the script in question to fork and detach. I have condensed the problem a little, to this Ruby snippet:

`ruby -e'if pid=fork; Process.detach(pid); else; sleep(5); end'`

Raise your hand if it is obvious that this script will sleep for 5 seconds before giving back my prompt. It wasn’t obvious to me, since it’s been a long time since I did UNIX systems programming.

For those who still want to know: backtick captures the started process’ STDOUT through a pipe, and it keeps reading until every process holding that pipe has closed it. Forking and detaching doesn’t redirect the STD* streams, so the forked child inherits the pipe, and backtick waits until both processes have finished.

There are two ways to fix this: one right way and one fast way. The right way is to rebind the standard streams in the child after forking. This can easily be done with this code:

STDIN.reopen('/dev/null')
STDOUT.reopen('/dev/null')
STDERR.reopen('/dev/null')

The faster way is to replace the backtick with a call to Kernel#system. Since system doesn’t capture the output from the process, it doesn’t set up a pipe for it, and it returns as soon as the process it started exits. So just running this instead will work:

system "ruby -e'if pid=fork; Process.detach(pid); else; sleep(5); end'"
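To check the right way end to end, here is a small runnable sketch. It assumes a Unix-like system (fork and /dev/null) and Ruby 2.1 or later for Process.clock_gettime. It runs the condensed script through backtick, with the three reopen calls added in the child, and measures how long backtick blocks. RbConfig.ruby is used so the same interpreter is invoked, and shellescape keeps the one-liner intact through the shell.

```ruby
require 'rbconfig'
require 'shellwords'

# The condensed script from above, with the child rebinding its standard
# streams before its (simulated) long-running work. The parent detaches and
# exits immediately, so nothing is left holding the pipe backtick reads from.
CHILD = "if pid = fork; Process.detach(pid); " \
        "else; STDIN.reopen('/dev/null'); STDOUT.reopen('/dev/null'); " \
        "STDERR.reopen('/dev/null'); sleep(5); end"

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
`#{RbConfig.ruby.shellescape} -e #{CHILD.shellescape}`
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format('backtick returned after %.1f seconds', elapsed)
```

Remove the three reopen calls from CHILD and the measured time goes up to roughly the full 5 seconds, because the sleeping child then still holds the STDOUT pipe.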

I have learned my lesson, and I have bought a copy of the Stevens book (UNIX Network Programming, which details the interaction between fork and ports; that was what my original problem was about).



Dynamic Ruby power and static balance


Update: This post has been updated to explain, clarify and remove certain things that sounded like an attack on people that didn’t agree with me, especially Austin. This was certainly not my intent when writing it. Added explanations will be highlighted with italic text.

Sir Bedevere: And what do you burn, apart from witches?
Peasant 1: More witches.
Peasant 2: Wood.
Sir Bedevere: Good. Now, why do witches burn?
Peasant 3: …because they’re made of… wood?
Sir Bedevere: Good. So how do you tell whether she is made of wood?
Peasant 1: Build a bridge out of her.
Sir Bedevere: But can you not also build bridges out of stone?
Peasant 1: Oh yeah.
Sir Bedevere: Does wood sink in water?
Peasant 1: No, no, it floats!… It floats! Throw her into the pond!
Sir Bedevere: No, no. What else floats in water?
Peasant 1: Bread.
Peasant 2: Apples.
Peasant 3: Very small rocks.
Peasant 1: Cider.
Peasant 2: Gravy.
Peasant 3: Cherries.
Peasant 1: Mud.
Peasant 2: Churches.
Peasant 3: Lead! Lead!
King Arthur: A Duck.
Sir Bedevere: …Exactly. So, logically…
Peasant 1: If she weighed the same as a duck… she’s made of wood.
Sir Bedevere: And therefore…
Peasant 2: …A witch!
(quotes from Monty Python and the Holy Grail, courtesy of IMDB)

My post announcing Ducktator seems to have stirred up a few emotions on Ruby-talk. Of course, most of this is my fault, for naming the library in such a frivolous way and not explaining properly which domains it is meant for. But on the other hand, there seems to be a general confusion about the concept of Duck typing, dynamic versus static typing, validation and other issues. Actually, I caught a whiff of religion when my mention of Duck typing engendered such a diverse set of responses.

Of course, my own reaction about duck typing was just as religious. I see this as a general trap when discussing programming languages. The Ruby community is altogether very good at avoiding religion, which is why I was quite startled when I found hints of it. Duck typing as a concept seems to be very loaded right now. I’m merely pointing this out as something we should watch out for. Just as I will do from now on, I suggest that people in the Ruby community try to be as objective as possible when discussing this.

And everyone and their aunt seem to have different opinions on what duck typing really is. It’s all quite fun, actually, except for the fact that it misses the point. I should have avoided mentioning ducks. I should have avoided saying anything at all about typing, since that isn’t the point. And I bloody well shouldn’t have used the class validator in my example. Well, what’s done is done. And this post won’t be about that. Just the next paragraph.

The Ducktator disclaimer

I won’t mention the words duck typing from here on. I would change the name of the project if it weren’t so damn hard on RubyForge. But what I want to explain is this. Ducktator is about validating things. But not everywhere. You shouldn’t use Ducktator in places where you have one or two checks for something in an object. You should really only use it at the borders of your code: the borders where you will receive complex objects. Really complex objects, where a method_missing won’t tell anyone anything useful at all. The use case I had in mind when writing the library was RubyGems: once the YAML spec for a Gem has been loaded, check that the important parts actually have what it takes to get into the source index. Since I managed to break RubyGems this way, I feel that this kind of validation can be really important. Once again, this is validation of live Ruby objects. Nothing else. You can check practically anything you want, but the easiest examples have been about each, class and respond_to. Hope this clarifies things a bit.

I removed the entire paragraph about typing. But my recommendation still stands; if you find formal types in programming languages interesting and/or confusing, read Programming Language Pragmatics, and you will be enlightened.

The main point

My possibly improper use of the term Duck typing engendered a very strange response, which I hadn’t expected. Of course, I realize that this is a very obvious community effect. Since Duck typing is one of the trademarks of the Ruby community, everyone has opinions on it, and more importantly feels the need to defend it as soon as some threat is perceived. Steve Yegge has written lots and lots about what language religion is really about, and I feel that this is an extension of that issue, so I won’t write more about it here either. You can find more in many of his excellent Drunken Blog Rants.

Finally. Balance is what I’m after. One person (Austin) said that the d**k t****g philosophy (I had written the word ‘issue’ here. That seems to have been misinterpreted. I blame that on my poor grasp of English, since my mother tongue is Swedish. =) is about TRUST. That you should trust the caller of your library to read your documentation (which – obviously – is perfect), and supply the correct objects. This isn’t too much to ask if your docs are up to scratch, and if the caller is the same one who will suffer if he mishandles your library. But trust isn’t enough when you’re at the borders. When talking to other languages through shaky serialization systems. When talking with clients that could be hostile. (Yes, in this case setting $SAFE helps, but it doesn’t go all the way.) (Sandbox is – or will be – a good alternative here, but I still see places where object validation is a better solution.)

Further, Austin responded in his blog post that he thinks I have ‘set up a false dichotomy here: people who are for duck typing as trusting your caller are against validation’. This wasn’t my intention. If anything, it’s the other way around. I am for duck typing, in most places. What I’m saying is that no solution is perfect at all points in your code, and duck typing is a good fit in many, but not all. The next paragraph clarifies my wish for balance.

What I’m saying is, most of the time you won’t need it, but in some cases some kind of interface validation really helps a lot. I know the so-called dynamic community doesn’t like to hear this. But what is so dynamic about failing without control? (The arguments I heard about letting code fail when the method isn’t there sounded very much to me like failing without control. That was my interpretation of the argument that you don’t need to use respond_to? for duck typing.) I know that I, as a developer, am not infallible. I make mistakes. Most of the time I am in control of all my objects, but there are times when I’m not. For example, I sometimes develop smaller applications for other (non-programmer) people. I like to put configurations and rules for these projects in YAML and leave the client in charge of configuring the application. But what if he/she/it makes a mistake? Using the ‘other’ way, I would fail when trying to call protocol on something that should have been a URI, but wasn’t, because the person made a typo and put an illegal character inside the URL. Will that message help the person doing the configuration? Should you wrap your calls in rescues all over the place and give the same explanation? Should you trust that the (non-programming) client is able to read your RDoc and figure out that a method (which I bet you didn’t name get_uri_from_yaml_configuration) failed because of something they did in the configuration? I believe not.
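As an illustration of that kind of border check, here is a minimal sketch in plain Ruby. It is not Ducktator’s API; validate_config, ConfigError and the 'url' key are hypothetical. It uses a class check where the overall shape of the data matters, respond_to? where any string-like value will do, and turns a typo in the URL into a message aimed at the person editing the YAML rather than a NoMethodError deep inside the application.

```ruby
require 'yaml'
require 'uri'

# Raised with a message meant for whoever edits the configuration file.
class ConfigError < StandardError; end

# Hypothetical border check for a YAML configuration (not Ducktator's API).
def validate_config(config)
  # Class check: the overall shape has to be a mapping.
  raise ConfigError, "the configuration must be a list of key: value pairs" unless config.is_a?(Hash)

  # respond_to? check: anything string-like is acceptable for the URL.
  url = config['url']
  raise ConfigError, "'url' is missing or is not text (got #{url.inspect})" unless url.respond_to?(:to_str)

  begin
    uri = URI.parse(url.to_str)
  rescue URI::InvalidURIError
    raise ConfigError, "'url' is not a valid address: #{url.to_str.inspect}"
  end
  config.merge('url' => uri)
end

config = validate_config(YAML.safe_load("url: http://example.com/feed\n"))
puts config['url'].host # prints "example.com"

begin
  validate_config(YAML.safe_load("url: http://exa mple.com\n"))
rescue ConfigError => e
  puts "Configuration problem: #{e.message}"
end
```

Everything past this point in the application can then just call methods on the validated objects, which is the balance argued for here: check once at the border, trust inside it.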

What I’m really ranting about is balance. There needs to be a balance between checking and laissez-faire. In most places, just calling the method is fine. In other places it’s appropriate to check with respond_to?, and in some cases you need to check the class. We’re programmers. We are supposed to be good at judging which technique to use where. Yes, Ruby is a dynamic language. Yes, Ruby is very easy to learn. Yes, Ruby makes most stuff very easy on you. That doesn’t mean you should stop thinking. It doesn’t mean you should be lazy. We are programmers, and we should be able to adapt.

One more time. Balance. Balance. Everywhere. And I do love the Ruby community. It is the best. Even though people get mad at each other, we can solve our differences. I’m proud of being a part of it.



MySQL, some concrete suggestions!


After my post Rails, Databases, ActiveRecord and the path towards damnation, I got an e-mail from Mårten Mickos, the CEO of MySQL. He asked me to provide concrete suggestions on how to improve MySQL (since the other post just contained some vague don’t-like vibes), so that’s the rationale for this post. I’m going to point out a few things I see as problems for using MySQL as a production database right now. Standard disclaimer applies: these are my opinions, my own only, and my employer doesn’t necessarily agree or disagree with them on any level.

Let us jump into the fray:

  • Sequences. I would like real, nice and sweet sequences. I really don’t like having no control over my primary key generation, and I especially don’t like that I can’t have sequences for anything else. The recommended solution according to the manual is to create a table with one auto-increment column in it, and use this as a sequence. That’s not acceptable, especially since I cannot tie this so-called sequence to the generation of ids on other tables with subselects and other fun things.
  • OK, I really don’t like the auto-increment feature. Why not provide an IDENTITY keyword like the non-core feature ID T174+T175 specifies?
  • Real, honest-to-god, boolean types. Real ones. Not tinyint(1)s. Not enums. Not tinyint’s hidden behind the word boolean (like JDBC). Real boolean types.
  • I would like table1 and Table1 to be different (as per the spec). Oh yes, we seem to live in an insensitive world (case and otherwise) with Windows all over the place. But in my database I want that kind of control.
  • Limiting the return values of result sets. Now, I have no problem with LIMIT and friends, but since there is a spec, and that spec has a feature for this functionality too (T611), why can’t that be in MySQL?
  • Time-types should be able to store fractional seconds and time zones.
  • And what’s the matter with the TIMESTAMP type? That doesn’t really do what the standard says it should do. Please give it a name not in the standard.
  • And for Pete’s sake, double bars are for concatenation in SQL. || is for ‘or’ in programming, but SQL is a DSL. This screams leaky abstraction and is very annoying.
  • Stability of 5.0 features. I know triggers, foreign keys and stored procedures are all there now. But frankly, I don’t trust my referential integrity with them yet. Not from a database vendor that a few years ago wrote in its manual that the only reason for foreign keys was to let GUIs diagram relationships between database objects. Not from a vendor that said that you don’t need transactions to ensure data integrity. All in all, I want these features to be around a few hours, get the bugs hashed out, let them be pounded on for a while. But that’s not going to happen if people move to Rails, since Rails doesn’t believe in data integrity or foreign keys.

Well, that’s that. Only my opinions, remember? Anyway, for small and fast development, MySQL is really useful. I’m just arguing that a big production system should choose something else.