JtestR 0.3.1 Released


JtestR allows you to test your Java code with Ruby frameworks.

Homepage: http://jtestr.codehaus.org
Download: http://dist.codehaus.org/jtestr

JtestR 0.3.1 is the current release of the JtestR testing tool. JtestR integrates JRuby with several Ruby frameworks to allow painless testing of Java code, using RSpec, Test/Unit, Expectations, dust and Mocha.

Features:
– Integrates with Ant, Maven and JUnit
– Includes JRuby 1.1, Test/Unit, RSpec, Expectations, dust, Mocha and ActiveSupport
– Customizes Mocha so that mocking of any Java class is possible
– Background testing server for quick startup of tests
– Automatically runs your JUnit and TestNG codebase as part of the build

Getting started: http://jtestr.codehaus.org/Getting+Started

New in the 0.3.1 release: JRuby has been upgraded to revision r7479, which includes several new Java Integration features; ActiveSupport has been upgraded to 2.1.0; a severe memory leak in the background server has been fixed; and some minor usability features have been added.

New and fixed in this release:
JTESTR-50 Difference in functionality when stubbing a method on a Java class vs a Ruby class using mocha
JTESTR-51 Mocking of classes lacking default constructors results in a NameError
JTESTR-53 Push the JtestR JRuby builds to maven repos
JTESTR-56 Upgrade ActiveSupport
JTESTR-57 Make it possible to use local versions of libraries.
JTESTR-59 No output when no tests found.
JTESTR-60 OutOfMemoryError
JTESTR-61 Documentation improvments – ant test-server
JTESTR-62 Having the jtestr.jar in the base directory doesn’t work
JTESTR-63 Update JRuby version



Where is the Net::SSH bug


Yesterday I spent several hours trying to find the problem with our implementation of OpenSSL Cipher that caused the Net::SSH gem to fail miserably during negotiation and password verification. After various false leads I finally found the reason for the strange behavior. But I really can’t decide if it’s a bug, and if it is a bug, where the bug is. Is it in Ruby’s interface to OpenSSL, or is it in Net::SSH?

No matter what cipher suite you use for SSH, you generally end up using a block cipher, mostly in a mode like CBC. That means an IV (initialization vector) is needed, together with a key. The relevant parts of OpenSSL are the EVP_CipherInit, EVP_CipherUpdate and EVP_CipherFinal family of functions. Nothing really strange there. The Ruby interface matches these functions quite closely; every time you set a key, or an IV, or some other parameter, CipherInit is called with the relevant data. When CipherUpdate is called, the actual enciphering or deciphering starts happening, and CipherFinal takes care of the final block.

At the point EVP_CipherFinal is called, nothing more should be done using the specific Cipher context. Specifically, no more Update operations should be used. The man page has this to say about the Final-methods:

After this function is called the encryption operation is finished and no further calls to EVP_EncryptUpdate() should be made.

Now, what I found was that the same warning is not part of the documentation for the Ruby interface. And Net::SSH actually reuses the same Cipher object after final has been called on it. Specifically, it continues the conversation, calling update a few times and then final. The general flow for a specific Cipher object in Net::SSH is basically init->update->update->final->update->update->final.

So what is so bad about this then? Well, the question is really this: what IV will the operations after the first final call be using? The assumption I made is that obviously it will use the original IV set on the object. Something else would seem absurd. But indeed, the IV used is actually the last IV-length bytes of encrypted data returned. Is this an obvious or intended effect at some level? Probably not, since the OpenSSL documentation says you shouldn’t do it. The reason it works that way is because the temporary buffer used in the Cipher context isn’t cleared out at the end of the call to final.

In contrast, the Java Cipher object will call reset() as part of the call to doFinal(), and reset() will actually reset the internal buffers to use the original IV. So the solution is simple for encryption: just save away the last 8 or 16 bytes (one cipher block) of the generated crypto text and set that manually as the IV after the call to doFinal. And what about decryption? Well, here the IV needs to be the last block of crypto text sent in for deciphering, not the result of the last operation.
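The chaining rule described above can be checked with Ruby's standard openssl library. This is only a sketch under some assumptions: it uses AES-128-CBC with padding turned off, the key, IV and plaintexts are made up, and the helper name cbc is hypothetical. It does not reuse a Cipher object after final; instead it creates a fresh cipher for each chunk and sets the IV by hand, which is the workaround described for the Java side.

```ruby
require 'openssl'

# Hypothetical helper: run one complete CBC operation with an explicit IV.
# Padding is disabled so that chunk boundaries fall on block boundaries.
def cbc(mode, key, iv, data)
  cipher = OpenSSL::Cipher.new('aes-128-cbc')
  mode == :encrypt ? cipher.encrypt : cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.padding = 0
  cipher.update(data) + cipher.final
end

key   = 'k' * 16          # made-up 16 byte key
iv    = 'i' * 16          # made-up 16 byte IV
part1 = 'A' * 32          # two blocks of plaintext
part2 = 'B' * 32          # two more blocks

# Encrypting everything in one pass...
whole = cbc(:encrypt, key, iv, part1 + part2)

# ...gives the same bytes as encrypting in two passes, as long as the second
# pass uses the last ciphertext block of the first pass as its IV.
first  = cbc(:encrypt, key, iv, part1)
second = cbc(:encrypt, key, first[-16, 16], part2)

# For decryption the IV is the last ciphertext block that was sent in,
# not the last plaintext block that came out.
decrypted = cbc(:decrypt, key, first[-16, 16], second)
```

Here first + second is byte for byte the same as whole, and decrypted is the same as part2. Using the original IV for the second pass instead would produce different crypto text, which is the mismatch that breaks the SSH conversation.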

So Net::SSH seems to work fine with JRuby now. I’m about to release a new version of JRuby-OpenSSL including these and many other things.

But the question remains. Is it a bug? If it is, is it in the Ruby OpenSSL integration, or in the Net::SSH usages of Ciphers? If it’s in the Net::SSH code, why does it actually work correctly when communicating with an SSH server? Or is this behavior of using the last crypto text as IV something documented in the SSH spec?

Enlightenment would be welcome.



Security vs Convenience


I really like Cryptogram and read every issue. It’s interesting stuff that talks a lot about how our minds work in conjunction with risk and reward. Today I had a typical example of how security versus convenience is a part of day to day life.

I had just checked out from my hotel, and wanted to store all my luggage (including my laptop bag) in the hotel until my ride out of town arrived. I asked about this, and it was fine, they had a room for this. The person in the reception pointed me to an open room and said it was open and that I could put my stuff there. Feeling uneasy I asked how secure it was, and she answered that the door was usually locked. OK, I said, but can someone take any bag from inside of there? Yes, was the answer. I decided I couldn’t store my stuff there. Even if the risk was small, losing my work laptop would be way too bad to risk. But I also decided I couldn’t drag my two heavy bags and laptop bag around.

I ended up putting the large bags in the room, and just taking my laptop bag around. I didn’t have as much to lose with the large bags, and the price of inconvenience in taking them along was just too high. These considerations go into everything we do in programming and systems engineering. A totally secure system is generally quite inconvenient to use, while an insecure system can be very pleasant to use. The trick is to get the balance right, I guess.



JtestR doesn’t start up.


Justin Smestad uncovered an issue with JtestR that can cause some quite unintuitive output, and be hard to debug. Some info can be found here: http://www.evalcode.com/2008/08/jtestr-woes/ and here: http://jira.codehaus.org/browse/JTESTR-62. The issue has been fixed on trunk, but hasn’t been released yet. The workaround is very simple – just make sure you don’t have the jtestr.jar file in the base directory where your project lives (this is usually the same place as the build.xml file). There are two ways to achieve this: either move the file into a subdirectory or rename the file to something else.



Java and mocking


I’ve just spent my first three days on a project in Leeds. It’s a pretty common Java project, RESTful services and some MVC screens. We have been using Mockito for testing which is a first for me. My immediate impression is quite good. It’s a nice tool and it allows some very clean testing of stuff that generally becomes quite messy. One of the things I like is how it uses generics and the static typing of Java to make it really easy to make mocks that are actually type checked; like this for example:

Iterator iter = mock(Iterator.class);
stub(iter.hasNext()).toReturn(false);

// Call stuff that starts interaction
verify(iter).hasNext();

These are generally the only things you need to stub stuff out and verify that it was called. The things you don’t care about you don’t verify. This is pretty good for being Java, but there are some problems with it too. One of the first things I noticed I don’t like is that interactions that aren’t verified can’t be disallowed in an easy way. Optimally this would happen at the creation of the mock, instead of by calling verifyNoMoreInteractions() afterwards. It’s way too easy to forget. Another problem that quite often comes up is that you want to mock out or stub some methods but retain the original behavior of others. This doesn’t seem possible, and the alternative is to manually create a new subclass for this. Annoying.

Contrast this with testing the same interaction with Mocha, using JtestR. The difference isn’t that large, but some cruft is missing:

iter = mock(Iterator)
iter.expects(:hasNext).returns(false)

# Call stuff that starts interaction

Ruby makes the checking of interactions happen automatically afterwards, and since you don’t have any types, you don’t need to care about most stuff the way you do in Java. This also shows a few of the inconsistencies in Mockito that are necessary because of the type system. For example, with the verify method you send the mock as argument, and the return value of the verify method is what you call the actual method on, to verify that it’s actually called. Verify is a generic method that returns the same type as the argument you give to it. But this doesn’t work for the stub method. Since it needs to return a value that you can call toReturn on, it can’t actually return the type of the mock, which in turn means that you need to call the method to stub before the actual stub call happens. This dichotomy gets me every time since it’s a core inconsistency in the way the library works.

Contrast that with how a Mockito-like library in Ruby might look for the same interaction:

iter = mock(Iterator)
stub(iter).hasNext.toReturn(false)

# Do stuff
verify(iter).hasNext

The lack of typing makes it possible to create a cleaner, more readable API. Of course, these interactions are all based on how the Java code looked. You could quite easily imagine a more free form DSL for mocking that is easier to read and write.
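To show that such an API really is within easy reach in Ruby, here is a rough sketch of how the stub/verify calls above could be implemented with method_missing. Everything in it (RecordingMock, Stubber, Answer, Verifier and the three top-level helpers) is hypothetical and written for this post; it is not how Mocha, Mockito or JtestR work, and it ignores arguments, call counts and name clashes.

```ruby
# A mock that records every call and answers with whatever was stubbed.
class RecordingMock
  attr_reader :calls, :stubs

  def initialize
    @calls = []   # names of the methods that have been called
    @stubs = {}   # method name => canned return value
  end

  def method_missing(name, *_args)
    @calls << name
    @stubs[name]
  end
end

# Returned by stub(mock); the next method call names the method to stub.
class Stubber
  def initialize(mock)
    @mock = mock
  end

  def method_missing(name, *_args)
    Answer.new(@mock, name)
  end
end

# Holds the mock and the method name until toReturn supplies the value.
class Answer
  def initialize(mock, name)
    @mock = mock
    @name = name
  end

  def toReturn(value)
    @mock.stubs[@name] = value
  end
end

# Returned by verify(mock); the next method call names the method to check.
class Verifier
  def initialize(mock)
    @mock = mock
  end

  def method_missing(name, *_args)
    raise "expected #{name} to have been called" unless @mock.calls.include?(name)
    true
  end
end

def mock(_type = nil)
  RecordingMock.new
end

def stub(a_mock)
  Stubber.new(a_mock)
end

def verify(a_mock)
  Verifier.new(a_mock)
end

iter = mock(:Iterator)
stub(iter).hasNext.toReturn(false)

# Do stuff
has_next = iter.hasNext

verify(iter).hasNext
```

Because stub and verify can both return a small proxy object, they get the same shape, which is exactly what the Java type system makes awkward.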

Conclusion? Mockito is nice, but Ruby mocking is definitely nicer. I’m wondering why the current Ruby mocking approaches don’t use the method call way of defining expectations and stubs though, since these are much easier to work with in Ruby.

Also, it was kinda annoying to upgrade from Mockito 1.3 to 1.4 and see half our tests starting to fail for unknown reasons. Upgrade cancelled.



JtestR, RubyGems, and external code


One question I’ve gotten a few times now that people are starting to use JtestR is how to make it work with external libraries. This is actually two different questions, masquerading as one. The first one regards the libraries that are already included with JtestR, such as JRuby, RSpec or ActiveSupport. There is an open bug in JIRA for this, called JTESTR-57, but the reason I’ve been a bit hesitant to add this functionality until now is that JtestR actually does some pretty hairy things in places. Especially the JRuby integration does ClassLoader magic that can potentially be quite version dependent. The RSpec and Mocha integration is the same. I don’t actually modify these libraries, but the code using them is a bit brittle at the moment. I’ve worked on fixing this by providing patches to the framework maintainers to include the hook functionality I need. This has worked with great success for both Expectations and RSpec.

That said, I will provide something that allows you to use local versions of these libraries, at your own risk. It will probably be part of 0.4, and if you’re interested JTESTR-57 is the one to follow.

The second problem is a bit more complicated. You will have seen this problem if you try to do “require 'rubygems'”. JtestR does not include RubyGems. There are both technical and non-technical reasons for this. Simply, the technical problem is that RubyGems is coded in such a way that it doesn’t interact well with loading things from JAR-packaged files. That means I couldn’t distribute the full JtestR in one JAR file if I wanted RubyGems, and that’s just unacceptable. I need to be able to bundle everything in a way that makes it easy to use.

The non-technical reason is a bit more subtle. If RubyGems can be used in your tests, it encourages locally installed gems. It’s a bit less pain to do it that way initially, but remember that as soon as you check the tests in to version control (you are using version control, right?) they will break in unexpected ways if other people using the code don’t have the same gems installed, with the same versions.

Luckily, it’s quite simple to provide this functionality to JtestR, even if no gems are used. The first step is to create a directory that contains all the third party code. I will call it test_lib and place it in the root of the project. After you have done that you must first unpack your gems:

mkdir test_lib
cd test_lib
jruby -S gem unpack activerecord

When you have the gems you want unpacked in this directory, you can add something like this to your jtestr_config.rb:

Dir["test_lib/*/lib"].each do |dir|
  $LOAD_PATH << dir
end

And finally you can load the libraries you need:

require 'active_record'


Testing programming language implementations


While writing the post yesterday about testing regular expressions, I realized that this problem is not really specific to regular expressions. I got a very good comment noting that testing any place that uses some kind of DSL is definitely prudent. SQL is another example.

But these examples are both about actually testing the usage of them, and the problem becomes that you have two languages, but you’re mostly only testing the code written in the outer language. This is due to several reasons. One of the most obvious ones is that our tools really don’t make it that easy to do.

Thinking about these issues made me start thinking about how we generally test languages. Having worked on several language implementations, both of new languages and of existing languages, I’ve come to the conclusion that the whole area of testing languages is actually quite complicated, and also that there are no real best practices for doing it.

First, there is a problem of terminology. Many test suites for language implementations are really executable specifications of how the language should work. What’s the difference? Well, testing the language according to such a spec, you are really only doing functional, black-box testing. I’ve looked at several of the open source language implementations, and I don’t really see much usage of anything other than such language spec tests. This means basically that some parts of the implementation can be implemented wrongly, and by some freak chance still work correctly in all the cases you have tests for, while failing in other ways.

Unit tests for the actual implementation would help with this – it helps since you will be doing TDD on the unit level, and it helps because you make a conscious decision about the implementation and what it should be doing in these cases. It still doesn’t make everything clear cut and simple, but it absolutely would help. So why don’t most implementations do unit testing of the internals? I don’t really know. Maybe it’s because implementations can be extremely complicated. But that should be a reason for testing more, not testing less. One reason I have some sympathy for is that unit tests make larger changes quite hard. Large refactorings are one of the ways JRuby has gotten incredible performance improvements and new subsystems, but unit tests can sometimes act as inertia for these.

I’m totally disregarding the academic approaches here. Yeah, in some cases for simple languages, you can actually prove that the language does what you want it to do, and for small enough implementations using a suitable language, you can actually prove the same things about the implementation. The problem is that this approach doesn’t scale.

And since a language almost always is Turing complete, you can’t exhaustively test it. There is no way of testing all permutations – either manually or automatically. So what should a language spec do? The first thing that many languages do is to specify that whole areas of functionality result in undefined behavior. That makes it easier. But the real problems appear when you start combining different features which can interact in different ways.

At the end of the day, I have no idea how to actually do this well. I would like to know though – how should I test the implementation, and how should I write an executable language specification? And these questions don’t even touch on the question of testing the core libraries. Many of the same problems apply, but it gets even more complicated.



Local things in Emacs


This is just a small note, since this has bugged me for a while. Basically, I have lots of extra key bindings running around in my Emacs configuration. Now, I use local-set-key for many of these. The problem is I hadn’t actually read the documentation for local-set-key closely enough.

One example that annoyed me was this: I had some local key bindings for RSpec buffers, that differed from the regular Ruby buffers. My RSpec minor mode still uses the ruby-mode-map though. My assumption was that local-set-key did things exactly as all other things with “local” in their name, namely doing a buffer local modification only. I finally found out that this wasn’t the case. Instead, when the RSpec minor mode was loaded for the first time, it ended up modifying the ruby-mode-map with its key bindings, which were then visible for all other Ruby buffers. Ouch.

So, if you use local-set-key, make sure you actually want to set that key in the current mode map, instead of only for the current buffer.

As far as I know, there is no way to set a real buffer local key binding without some acrobatics that unsets and resets the keys manually. I ended up solving my problem with the RSpec minor mode by having it clone the Ruby mode map and use that as its own mode map. Not an ideal solution, but it works for now.



Testing Regular Expressions


Something has been worrying me a bit lately. Being test infected and all, and working for ThoughtWorks, where testing is part of the life blood, I think more and more about these issues. And one thing I’ve started noticing is that regular expressions seem to be a total blind spot in many cases. I first started thinking about it when I changed a quite complicated regular expression in RSpec. Now RSpec has coverage tests as part of their build, and if the test coverage is less than 100%, the build will fail. Now, since I had changed something to add new functionality, but hadn’t added any tests for it, I instinctively assumed that it would be caught by the coverage tool.

Guess what? It wasn’t. Of course, if I had changed the regexp to do something that the surrounding code couldn’t support, one of the tests for surrounding lines of code would have caught it, but I got no mention from the coverage tool that I needed more tests to fully handle the regular expressions. This is logical if you think about it. There is no way that a coverage tool could find all the regular expressions in your source code, and then make sure that all branches and alternatives of each particular regular expression were exercised. So that means that the coverage tool doesn’t do anything with them at all.

OK, I can live with that, but it’s still one of those points that would be very good to keep in mind. Every time you write a regular expression in your code, you need to take special care to actually exercise that part of the code with many inputs. What is many in this case? That’s another part of the problem – it depends on the regular expression. It depends on how complicated it is, how long it is, how many special operators are used, and so on. There is no real way around it. To test a regular expression, you really need to understand how they work. The corollary is obvious – to use a regular expression in your code, you need to know how to test it. Conclusion – you need to understand regular expressions.

In many code bases I haven’t seen any tests for regular expressions at all. In most cases these have been crafted by writing them outside the code, testing them by hand, and then putting them in the code. This is brittle to say the least. In the cases where there are tests, it’s much more common that they only test positives, and not negatives. And I’ve seldom heard of code bases with enough tests for regular expressions. One of the problems is that in a language like Ruby, they are so easy to use, so you stick them in all over the place. A standard refactoring could help here, by extracting all literal regular expressions to constants. But then the problem becomes another – as soon as you use regular expressions to extract values from a string, it’s a pain to not have the regular expression at the same place as the extracted groups are used. Example:

PhoneRegexp = /(\d{3})-?(\d{4})-?(\d{4})/
# 200 lines of code
if phone_number =~ PhoneRegexp
  puts "phone number is: #$1-#$2-#$3"
end

If the regular expression had been at the same place as the usage of the $1, $2 and $3 it would have been easy to tie them to the parts of the string. In this case it would be easy anyway, but in more complicated cases it’s more complicated. The solution to this is easy – the dollar numbers are evil: don’t use them. Instead use an idiom like this:

area, number, extension = PhoneRegexp.match(phone_number).captures

In Ruby 1.9 you will be able to use named captures, and that will make it even easier to make readable usage of the extracted parts of a string. But fact is, the distance between the usage point and the definition point can still cause trouble. A way of getting around this would be to take any complicated regular expression and put it inside of a specific class for only that purpose. The class would then encapsulate the usage, and would also allow you to test the regular expression more or less in isolation. In the example above, maybe creating a PhoneNumberParser would be a good idea.
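A PhoneNumberParser along those lines could look something like the sketch below. The class layout, the method name parse and the returned hash are my own assumptions for illustration, and so are the \A and \z anchors, which the original PhoneRegexp does not have. It uses named captures, so it needs Ruby 1.9 or later.

```ruby
# Hypothetical wrapper that keeps the pattern and the use of its groups together.
class PhoneNumberParser
  # Same digit groups as PhoneRegexp, with named captures and
  # whole-string anchors added (both are assumptions of this sketch).
  PATTERN = /\A(?<area>\d{3})-?(?<number>\d{4})-?(?<extension>\d{4})\z/

  # Returns a hash with the three parts, or nil if the text is not a phone number.
  def self.parse(text)
    match = PATTERN.match(text)
    return nil unless match

    { area: match[:area], number: match[:number], extension: match[:extension] }
  end
end

# Positive cases: with and without dashes.
with_dashes    = PhoneNumberParser.parse('123-4567-8901')
without_dashes = PhoneNumberParser.parse('12345678901')

# Negative cases: too short, and a number buried in other text.
too_short = PhoneNumberParser.parse('123-4567')
embedded  = PhoneNumberParser.parse("call 123-4567-8901 now")
```

The two positive cases both give area "123", number "4567" and extension "8901", while the two negative cases give nil. With the class in place, each of these becomes a one line test, and the code that uses the parser never has to see $1, $2 or $3.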

At the end of the day, regular expressions are an extremely complicated feature, and in general we don’t test the usage of them enough. So you should start. Begin by creating both positive and negative tests for them. Figure out the boundaries, and see where they can go wrong. Know regular expressions well enough to know what happens in these strange circumstances. Think about Unicode characters. Think about whitespace. Think about greedy and lazy matching. As an example of something that took a long time to cause trouble: what’s wrong with this regexp that tries to discern if a string is a select statement or not?

/^\s*\(*\s*SELECT\W+/i

And this example actually covers most of the ground already. It matches case-insensitively. It checks for white space before any optional parentheses, and for any white space after. It makes sure that the word SELECT isn’t continued by checking for at least one non-word character. So what’s wrong with it? Well… It’s the caret. Imagine if we had a string like this:

"INSERT INTO foo(a,b,c)\nSELECT * FROM bar"

The regular expression will in fact match this, even though it’s not a select statement. Why? Well, it just so happens that in Ruby the caret matches the beginning of lines, not the beginning of strings. The dollar sign works the same way, matching the end of lines. How do you solve it? Change the caret to \A and the dollar sign to \Z and it will work as expected. (Note that \Z still allows one trailing newline; use \z if you need the absolute end of the string.) A similar problem can show up with the “.” to match any character. Depending on which language you are using, the dot might or might not match a newline. Always make sure you know which one you want, and what you don’t want.
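The difference is easy to see by running both variants against the INSERT string. This is a small sketch; the variable names are made up, and match? needs Ruby 2.4 or later.

```ruby
# The original pattern, anchored to the start of any line...
line_anchored   = /^\s*\(*\s*SELECT\W+/i
# ...and the fixed one, anchored to the start of the whole string.
string_anchored = /\A\s*\(*\s*SELECT\W+/i

insert_sql = "INSERT INTO foo(a,b,c)\nSELECT * FROM bar"
select_sql = "  (SELECT * FROM bar)"

caret_on_insert = line_anchored.match?(insert_sql)     # => true, a false positive
fixed_on_insert = string_anchored.match?(insert_sql)   # => false
fixed_on_select = string_anchored.match?(select_sql)   # => true

# The end anchors differ too: \Z tolerates one trailing newline, \z does not.
upper_z = /bar\Z/.match?("bar\n")   # => true
lower_z = /bar\z/.match?("bar\n")   # => false
```

The first three results are exactly the positive and negative tests that would have caught the bug from the start.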

Finally, these are just some thoughts I had while writing it. There is much more advice to give, but it can be condensed to this: understand regular expressions, and test them. The dot isn’t as simple as it seems. Regular expressions are a full blown language, even though it’s not Turing complete (in most implementations). That means that you can’t test it completely, in the general case. This doesn’t mean you shouldn’t try to cover all eventualities.

How are you testing your regular expressions? How much?



Applications and libraries


In a recent discussion around one of Steve Yegge’s blog posts, an incidental remark was that it’s OK that a language makes things harder for a library creator than for an application developer. This point was made by David Pollak and Martin Odersky in relation to some of the complications that you need to handle when creating a Scala library that you can intuitively use without a full understanding of the Scala type system. Make no mistake, I have lots of respect for both Martin and David, it’s just that in this case I think it’s actually a quite damaging assumption to make. And they are not the only ones who reason like that either. Joshua Bloch’s book Effective Java includes this assumption too, in many places.

So what’s wrong with it then? Isn’t there a difference between developing an application and a library? Yes, there is a difference, but it’s definitely not as large as people make it out to be. And even more importantly: it _shouldn’t_ be that much of a difference. The argument from David was that when creating a library in Scala, he needs to focus and work with quite complicated parts of the type system so that the consumer gets a nice API to use the library through. This process is much harder than just using the library would be.

Effective Java contains much good advice, but most of it is from the perspective of someone who creates libraries for a living, and there are a few places where Josh explicitly says that his advice isn’t necessarily applicable when writing an application, since he doesn’t have that point of view.

Let’s take a look at a fundamental question then. What is actually a library, and what is an application? In my opinion, a library is a module providing functionality of some kind, restricted to a specific domain. This can be a horizontal or vertical domain, that doesn’t matter, but it’s usually something that is usable in more than one circumstance. It’s not uncommon that libraries use other libraries to implement their functionality. An application is usually a collection of libraries that provide functionality to an end user. That end user can be either a person, a program or another computer – that doesn’t matter. But wait, aren’t libraries usually also created to provide functionality to other pieces of code? And even though libraries have a tendency to contain more specific code, and less usage of other libraries, the line is extremely fuzzy.

The way most applications seem to be built now, most of the work is done to collect libraries, provide the missing functionality and glue them together in some way. But that doesn’t mean that the code you write in the application won’t be used as a library by another consumer. In fact, it’s more and more common to try to reuse as much as possible, and especially when you extend an existing application, it’s extremely important that you can consume the existing functionality in a sane way.

So why make the distinction? Doing that seems to me to be an excuse for writing bad code if it’s in an application. Why won’t we as programmers admit that we don’t know if someone else will need to consume the code later, and write the best code we can, including creating usable and well thought out public APIs? Yes, the cost and time will be higher, but that’s true for writing tests too. I don’t see any value in arguing that libraries should be designed with more care than application code. In fact, I think that attitude is actively detrimental to the industry. And adding a complicated feature to a language, and then arguing that only “library developers” will need to understand it, is definitely not the right way to go. A responsible developer using a language needs to understand how that language works. Otherwise that developer will sooner or later cause a great mess. It’s just a matter of time.