Know your Regular Expression anchors


As everyone knows, regular expressions are incredibly important in many programming tasks, so it pays to know the particulars of the regexp syntax. One example that bit me a while back was a simple oversight: something I did know but hadn’t kept in mind while writing the bad code. Namely, how the caret (^) behaves when matched against a String with newlines in it. To be fair, I’ve been using Java regexps for a while, and that problem doesn’t exist there, since Java’s ^ only matches at the start of the input unless you ask for MULTILINE.

To illustrate the difference, here is a program you can run in either MRI or JRuby. If running in JRuby you’ll see that the Java version needs the flag MULTILINE to behave as Ruby does by default.

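str = "one\nover\nyou"

# Ruby: ^ matches at the start of every line, so this finds the o at
# offset 0 and the o right after the first newline (offset 4)
puts "Match with ^"
str.gsub(/^o/) do |e|
  p $~.offset(0)
  e
end

# Ruby: \A matches only at the very start of the string (offset 0)
puts "Match with \\A"
str.gsub(/\Ao/) do |e|
  p $~.offset(0)
  e
end

if defined?(JRUBY_VERSION)
  require 'java'

  # Java: ^ only matches after newlines when MULTILINE is set
  regexp = java.util.regex.Pattern.compile("^o", java.util.regex.Pattern::MULTILINE)
  matcher = regexp.matcher(str)
  puts "Java match with ^"
  while matcher.find
    p [matcher.start, matcher.end]
  end

  # Java: \A ignores MULTILINE and still only matches at the start
  regexp = java.util.regex.Pattern.compile("\\Ao", java.util.regex.Pattern::MULTILINE)
  matcher = regexp.matcher(str)
  puts "Java match with \\A"
  while matcher.find
    p [matcher.start, matcher.end]
  end
end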

So, what’s the lesson here? Don’t use caret (^) and dollar ($) if you actually want to match the beginning or the end of the string. Instead, use \A and \z (or \Z, which also allows a single trailing newline before the end). That’s what they’re there for.
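Where this really hurts is input validation. Here is a minimal sketch in plain Ruby; the input string is made up for illustration. A “whole string” check written with ^ and $ happily accepts a string with an extra line appended, while \A and \z reject it. The last two lines show the difference between \Z and \z. (Regexp#match? needs Ruby 2.4 or later.)

```ruby
# Hypothetical user input with an extra line smuggled in after a newline.
input = "harmless\n<script>"

loose  = /^harmless$/    # ^ and $ anchor to any line in the string
strict = /\Aharmless\z/  # \A and \z anchor to the whole string

p loose.match?(input)   # => true
p strict.match?(input)  # => false

# \Z also allows one trailing newline before the end; \z does not.
p(/\Aharmless\Z/.match?("harmless\n"))  # => true
p(/\Aharmless\z/.match?("harmless\n"))  # => false
```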



The first fully functional Ruby compiler


It’s a glorious day. The JRuby compiler is now complete and functional, without sacrificing interpreted mode. Read more in Charles’ blog, here.



SQLServer is also stupid


I can’t understand the problems database drivers have with whitespace where there shouldn’t be any. It’s clearly wrong…

So say that you create a table in SQLServer with a nullable column. If you use DatabaseMetaData, call getColumns for that table and fetch the value at index 18, IS_NULLABLE, for that column, what you get back is “YES”, which is all according to the API. But if you by chance do the same thing for a column that is non-nullable, what value would you get then?
“NO “. Notice the space. Lovely. I *heart* databases.
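Until drivers agree on whitespace, the defensive fix is to normalize the value before comparing it. A small sketch, assuming you already have the raw IS_NULLABLE string in hand (from JDBC via JRuby, say); nullable_flag is a hypothetical helper name, not part of any API. The JDBC documentation lists the legal values as “YES”, “NO”, or an empty string when nullability is unknown.

```ruby
# Hypothetical helper: interpret the raw IS_NULLABLE text defensively.
# Stray blanks (like "NO ") and letter case are normalized away first.
def nullable_flag(raw)
  case raw.to_s.strip.upcase
  when "YES" then true
  when "NO"  then false
  else nil   # empty string: the driver doesn't know
  end
end

p nullable_flag("YES")  # => true
p nullable_flag("NO ")  # => false
p nullable_flag("")     # => nil
```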

… Someone asked me in response to my last blog post which database I would choose if I could. Hard question. I’d rather do without databases. But if I have to, I’d go with Mimer SQL which is probably the most SQL-compliant database ever, and really doesn’t have WTF moments at all. It’s small, it’s from Sweden, and it’s very nice.



Oracle is stupid


I’ve just spent two days debugging and fixing AR-JDBC issues with Oracle. And let me tell you, those days haven’t been fun. I am really not fond of Oracle at the moment. You probably saw my last post. Now, let me add a new point of insanity to the proceedings…

Say that you define a table like this:

create table companies (
  id integer primary key,
  firm_id integer default null references companies
);

Now, try to get the default value of the column firm_id in some manner. For example, you could use JDBC: call DatabaseMetaData.getColumns and then getString(13) (COLUMN_DEF) on the result. You could also use the OCI8 C interface; you would get the same result. Any guesses? What is the default value of the column firm_id? Some might say that it should be the String “null”. Nope. It’s the String “null “. Notice the space.

Now, if you instead defined your table like this:

create table companies (
firm_id integer default null
);

In this case, what is the default value of the column firm_id? It’s “null”. Without a space. Yes, it varies. Yes, it actually varies based on the formatting of the SQL used to create the table. You could potentially use the Whitespace language to embed arbitrary programs in the null default value… Because if there are two blanks between the word null and the next token, then that’s what you will get back in the default value. Notice that we used two totally different interfaces to get this information, so it’s obviously something that is saved in the database engine. Wow.
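The practical workaround in an adapter is to treat whatever comes back as untrusted text and normalize it. A minimal sketch, with normalize_oracle_default as a hypothetical helper name. It assumes the raw COLUMN_DEF value is either NULL in some spelling and spacing, a quoted SQL string literal, or some other literal that can be returned as trimmed text; it does not try to evaluate expressions like SYSDATE.

```ruby
# Hypothetical helper: normalize the raw default text returned for a
# column (e.g. getString(13), COLUMN_DEF) into a Ruby value.
def normalize_oracle_default(raw)
  return nil if raw.nil?

  text = raw.strip                      # drop the stray trailing blanks
  return nil if text.casecmp?("null")   # "null", "null ", "NULL  " ...

  if text.length >= 2 && text.start_with?("'") && text.end_with?("'")
    text[1..-2].gsub("''", "'")         # unquote a SQL string literal
  else
    text                                # numbers etc. stay as trimmed text
  end
end

p normalize_oracle_default("null ")     # => nil
p normalize_oracle_default("'it''s' ")  # => "it's"
p normalize_oracle_default("42  ")      # => "42"
```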

Is this insane? Is it crazy? Am I thoroughly disgusted by now?

Yes. And also, AR-JDBC finally runs all ActiveRecord tests with Oracle.



Lovely intermittent Oracle XE problem


I’m working on AR-JDBC support for Oracle. The easiest way to do this for me is to run Oracle XE inside Windows XP in Parallels. That works fine. Except for this dreaded message that started showing up and killing all my efforts: “ORA-12519, TNS:no appropriate service handler found”. I love Oracle error messages. They are so incredibly helpful!

At least in this case I found a solution to the problem on da Internet. I’m writing this blog post to make the information more obvious to Google.

It seems there is a bug in Oracle XE’s process monitoring that manifests this way. Nice. But you can fix it! Just open up your SQL console, write “connect system”, submit your password, and then run:

ALTER SYSTEM SET PROCESSES=150 SCOPE=SPFILE;

PROCESSES is a static parameter, and SCOPE=SPFILE only records the new value in the server parameter file, so it takes effect after you restart the database. Restart the listener as well, and everything should be joy again.



Can bytecodes perform well?


I really need to write an answer to a comment by Kragen Sitaker that appeared on my post about Rubinius. It has two points in it, and I really want to address one of them at some length. So here goes. The comment read like this:

“It is byte code based. This means it’s easier to handle performance.”

If by “handle” you mean “not have”.

“garbage collection/object memory can be switched out to use another algorithm”

In your dreams. I suspect you’ll find out that the architecture isn’t as clean as you think it is when you try to plug in some other existing garbage collector or object memory.

It sounds like a good project, but I don’t think its advantages are going to be so enormous as you think they are.

Whoa. That’s hard to argue with, isn’t it? Or maybe not. Let’s begin with the second point. Is Rubinius’ architecture clean and decoupled enough to allow swapping of GC and object memory? My answer is a resounding yes, but of course you can go and look for yourself. It’s certainly possible to build a system where the GC is decoupled; Sun’s standard JVM does this, for example. But the interesting point isn’t whether Rubinius’ hooks for this are clean enough, but whether they are cleaner than MRI’s. If you have ever tried to switch out the GC in MRI, you know that Rubinius beats MRI hands down in this regard. If not, you can ask Laurent Sansonetti at Apple, who has actually done most of the work of switching out the GC in MRI, whether that was a fun experience.

Let’s see, what’s next? Oh yeah. Bytecodes are always slow. No argument here. C++ will always beat a bytecode based engine. And C will almost always beat C++. And assembler (in the right hands) will usually beat C. But wait… Isn’t Java bytecode based? And doesn’t Java 6 perform on the same level as C in many cases, and in some cases perform better than C? Well, yes, it does… And wasn’t Smalltalk always based on bytecodes? Most Smalltalk engines performed very well. Why is MRI switching to bytecodes for 1.9? And why has Python always been bytecode based? And why is the CLR bytecode based? Why did even Pascal use bytecodes back in the day? (OK, that is cheating… Pascal used bytecodes for portability, not performance, but it still worked well in that respect too.) And Erlang is bytecode based.

Basically, there are some cases where a static language will benefit from straight compilation down to native machine code. OCaml is a typical example of such a language. Thanks to the extremely stringent type requirements of the language, the emitted code is often competitive with C. But that is the exception, and it only works for bondage-tightly typed languages. When talking about dynamic languages, bytecodes are the real path to good performance. Granted, a naive implementation of a bytecode engine will not perform well. But that is true for a compiler too. The difference is that the part interpreting bytecodes is usually quite a small part of the whole runtime system, and it can be switched out for better performance, piecemeal or altogether.
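To make the “small, replaceable part” point concrete, here is a toy stack-based bytecode interpreter in Ruby. The instruction set (:push, :add, :mul, :print) and the ToyVM class are hypothetical, made up for illustration; they have nothing to do with Rubinius’ or YARV’s real instruction sets. Note how the whole dispatch loop lives in one method: swapping it for a faster implementation would not touch whatever compiler produced the bytecode.

```ruby
# A toy stack-based bytecode interpreter with a hypothetical instruction set.
# Each instruction is an array: [opcode] or [opcode, argument].
class ToyVM
  attr_reader :output

  def initialize
    @stack = []
    @output = []
  end

  # The dispatch loop: the only part that "interprets" bytecode.
  def run(code)
    pc = 0
    while pc < code.length
      op, arg = code[pc]
      case op
      when :push  then @stack.push(arg)
      when :add   then binary_op { |a, b| a + b }
      when :mul   then binary_op { |a, b| a * b }
      when :print then @output << @stack.pop
      else raise ArgumentError, "unknown opcode #{op.inspect}"
      end
      pc += 1
    end
    self
  end

  private

  # Pop two operands, push the result of combining them.
  def binary_op
    b = @stack.pop
    a = @stack.pop
    @stack.push(yield(a, b))
  end
end

# Bytecode for (2 + 3) * 4
program = [[:push, 2], [:push, 3], [:add], [:push, 4], [:mul], [:print]]
p ToyVM.new.run(program).output  # => [20]
```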

There are other reasons. For example, statically typed languages like Java and the CLR family of languages use bytecodes because it gives the runtime system the opportunity to change the running machine code dynamically, based on statistics gathered at runtime. This means that your application actually gets better performance in the parts where it counts, while the parts that are rarely used are never optimized. (That’s what HotSpot does, for example.) This is not possible with a straight ahead-of-time compilation to machine code. Java would never have had the performance it has now if it weren’t for the bytecodes.
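The profile-then-optimize idea can be sketched in miniature. This is only an analogy, assuming a hypothetical AdaptiveMethod wrapper and a made-up threshold: it counts calls and, once a method is hot, switches to a specialized version. A real JIT like HotSpot does this with machine code, deoptimization and much richer statistics; the sketch just shows why the decision can only be made at runtime.

```ruby
# Hypothetical wrapper: run the generic version until the method has been
# called `threshold` times, then switch to the specialized version.
class AdaptiveMethod
  attr_reader :calls, :optimized

  def initialize(threshold, slow, fast)
    @threshold = threshold
    @slow = slow
    @fast = fast
    @calls = 0
    @optimized = false
  end

  def call(*args)
    @calls += 1
    # The "statistics": a call counter deciding when the method is hot.
    @optimized = true if !@optimized && @calls >= @threshold
    (@optimized ? @fast : @slow).call(*args)
  end
end

slow_sum = ->(n) { (1..n).reduce(0, :+) }  # generic path
fast_sum = ->(n) { n * (n + 1) / 2 }       # specialized path
sum = AdaptiveMethod.new(3, slow_sum, fast_sum)

results = 5.times.map { sum.call(100) }
p results        # => [5050, 5050, 5050, 5050, 5050]
p sum.optimized  # => true
```

Methods that never reach the threshold simply keep running the generic path, which mirrors the point above that rarely used code is never optimized.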

So please, stop spreading this myth. It is NOT true and it has NEVER been true.



Rubinius is important


I predict that parts of this blog post will make certain people uncomfortable, annoyed and possibly foamy mouthed. If you feel that you’re of this disposition, please don’t read any further.

As I’m working on JRuby, I obviously think that JRuby is the best solution to many of the problems I perceive in the current MRI implementation. I have been quite careful never to say anything along the lines of JRuby being better than anything else, though. I will continue with that stance. However, I won’t restrict myself in the same way regarding Rubinius.

In fact, I’m getting more and more convinced that for the people who don’t need the things Java infrastructure can give you, Rubinius is the most important project around in Ruby-land. More than that, Rubinius is MRI done right. If nothing substantial changes in the current timeline and plans for Ruby 1.9.1, I predict that Rubinius will be the CRuby implementation of choice within 6 months. Rubinius is an implementation done the way MRI should have been. Of course, Matz has always focused on the language, not the implementation. I’m very happy about that, since it means that we have an outstanding language.

But still. Rubinius will win over MRI and YARV. I’ve had this thought for a while, and I’m finally more or less convinced that it’s true. Of course, there are a few preconditions. The first and most important one is that Rubinius delivers 1.0 as planned, by the end of the year, and that it doesn’t have abysmal performance. Or, if YARV happens to be totally finished and perfectly usable in the same time frame, things might take a different turn.

Why is Rubinius so good, compared to the existing C implementations? There are a number of good reasons for this:

  • It is byte code based. This means it’s easier to handle performance.
  • It has a pluggable, very clean architecture, meaning that for example garbage collection/object memory can be switched out to use another algorithm.
  • It is designed to be thread safe (though this is not really true yet), and Multi-VM capable.
  • It works with existing MRI extensions.
  • Most of the code is written in Ruby.
  • It gives you access to all the innards, directly from your Ruby code (stuff like MethodContexts/BlockContexts, etc).
  • The project uses Valgrind to help make sure that the C code written is bulletproof.

Anyway. I put my money on Rubinius. Of course, that doesn’t mean I don’t think JRuby has a place to fill in the ecosystem. In fact, the really interesting question is what will happen when both Rubinius and JRuby have become more mature. I’d personally love to see more cooperation and sharing between the projects. Not a merging, since the goals are too separate, but it would be wonderful if JRuby could use the same Ruby code for all the primitive operations as Rubinius does.

Right now we have a simple Rubinius engine in JRuby, that can interpret and run some simpler byte codes from Rubinius.

JRuby and Rubinius are both extremely important. Right now I believe JRuby is more important, since it opens up a totally different market for Ruby, and gives the benefits of Java to Ruby. Rubinius has another place to fill.

Of course, being who I am, I have also looked into what would be required to port Rubinius to Java, using the same approach directly instead of going through JRuby. If you decide to use Java’s Garbage Collector, Java Threads, and reuse the JRuby parser, you would end up with about 40 files of C code to port. Most of these are extremely easy, and none of them is really that hard. And what you would end up with is something that runs the same things Rubinius does, but with the possibility of invoking Java code at the same time. (Of course, I hope that Evan reserves a block of about 8-16 bytecodes that can be implementation dependent, which Jubinius could use to interop with Java code.)



Ruby+Erlang concurrency?


I keep reading from lots of people that you can’t bolt Erlang’s concurrency model onto Ruby. But is this really true? MRI already has green threads. Adding a higher level of concurrency with the basic primitives !, recv and spawn doesn’t seem like a gigantic project. The main problem would be to prohibit access to shared memory between the spawned green threads, and to avoid the GIL. But that doesn’t seem to be that large of a problem. The main question is rather whether this model would fit well with the Ruby language… Since Erlang was designed from the ground up with these primitives in mind, many of its libraries and functions work well with them. For example, pattern matching works exactly the same in recv as in function dispatch or case expressions.
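To show that the primitives themselves are not much code, here is a rough sketch of spawn, ! and receive on top of Ruby threads. The Actor class and its method names are hypothetical; send_message stands in for Erlang’s ! (so as not to clobber Object#send), and the block passed to receive stands in for pattern matching. Each actor owns a mailbox, a thread-safe Queue. Nothing here enforces the no-shared-memory rule, and every actor is a full thread, so this glosses over exactly the hard parts discussed above.

```ruby
# Hypothetical Erlang-style actor: a thread plus a private mailbox.
class Actor
  # Erlang's spawn: create the mailbox, then start the body in a thread.
  def self.spawn(&body)
    actor = new
    actor.start(&body)
    actor
  end

  def initialize
    @mailbox = Queue.new
  end

  def start(&body)
    @thread = Thread.new { body.call(self) }
  end

  # Erlang's Pid ! Message: enqueue and return self for chaining.
  def send_message(msg)
    @mailbox.push(msg)
    self
  end

  # Blocking receive; the block plays the role of pattern matching.
  def receive
    yield @mailbox.pop
  end

  def join
    @thread.join
  end
end

# An echo actor: answers [:ping, n, reply_to] with [:pong, n], stops on :stop.
echo = Actor.spawn do |me|
  running = true
  while running
    me.receive do |msg|
      if msg == :stop
        running = false
      else
        tag, n, reply_to = msg
        reply_to.push([:pong, n]) if tag == :ping
      end
    end
  end
end

inbox = Queue.new  # the caller's own mailbox, passed along like a Pid
echo.send_message([:ping, 1, inbox]).send_message([:ping, 2, inbox]).send_message(:stop)
echo.join
replies = [inbox.pop, inbox.pop]
p replies  # => [[:pong, 1], [:pong, 2]]
```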

On the other hand, the send, recv and spawn primitives in Gambit Scheme seem to work out really well even though the language is a Lisp at its root (or maybe that’s the reason?)

In fact, this is one of the few places where it would be harder to add something to JRuby than to MRI. Since we can’t control the full stack, it would be very hard to implement anything resembling Erlang processes in Java. And Java threads would almost certainly be too heavyweight for this to work. Hmm.



Practical JRuby on Rails released!


Today my book Practical JRuby on Rails has been released by Apress. It has two forewords, one by Martin Fowler and one by Pat Eyler. And you can order it right now from Amazon here. Hopefully it will also be available in your closest computer book store!



What about Sun’s Ruby strategy?


Wow. Today was a strange day for blog reading. I’ve already had several WTF moments. Or what do you say about 7 reasons I switched back to PHP after two years on Rails, where the first and most important reason seemed to be

IS THERE ANYTHING RAILS/RUBY CAN DO THAT PHP CAN’T DO? … (thinking)… NO.

Bloody hell. All programming languages in use today are Turing complete. Of course you can do exactly the same things in all of them. But I still don’t program in Intercal that often.

Example number two: About JRuby and CPython performance. This blog post uses the Alioth language benchmark to show us that Python’s C implementation is faster than JRuby. Of course, the JRuby version used in this comparison is 1.0, which is now almost 4 months old. Comparing the C implementation of one language with a Java implementation of another language seems kind of suspect anyway.

But these two examples are nothing compared to a post by Krishna Kotecha from yesterday. It’s called Sun’s Ruby strategy – Engage and Contain?. You should go read it. It’s actually quite amusing. I usually don’t respond to other blog posts, and I don’t usually quote in my blog. I’ll quote some excerpts here, though, because there are several points in that post I want to elaborate on.

Compromise definitely seemed to be an underlying theme at RailsConf Europe. DHH’s keynote downplayed the need for evangelism – something I strongly disagree with. Rails has certainly made a lot of progress towards wider acceptance, but we’ve got a really long way to go before more companies start to adopt it, and I certainly don’t think turning down the evangelism and doing stealth deployments via JRuby is the answer.

There are really two interesting points in this paragraph. First of all, the question of evangelism. I would say it’s about time to turn it down. Actually, I’ve gotten the very real impression that the wild-eyed Rails evangelism is now turning people away from Rails rather than winning more “converts”. Telling people about the advantages of Rails is still something that needs to be done, but the full-fledged Rails marketing machine has already done its work and should be turned down a notch.

The second point Krishna sneaked in there is about JRuby “stealth deployments”. I’m pretty sure no one will ever do a real stealth deployment, and I find that concept totally wrong.

At RailsConf Europe 2007 however, Dave didn’t even specifically discuss Rails – and this seems to have been at the behest of the conference organizers. If this is the case, then the Rails community is already in trouble. Is this the price of Sun’s ’support’: that the community is no longer able to freely discuss the platform and what work needs to be done to get it accepted in the enterprise on its own terms?

Where does this particular conspiracy theory come from? Is there any evidence whatsoever that the organizers of RailsConf wanted Dave to not speak about the shortcomings of Rails? And even if that were the case, what’s there to say that Sun is the reason for this? (Couldn’t it have been IBM or ThoughtWorks, who were also Diamond sponsors?)

I have real problems with this attitude and approach. Selling Rails and Ruby, as “just a Java library” is a massive disservice to the technology, and simply means enterprise customers and decision makers won’t evaluate Ruby on its own merits.

What is important to realize is that the “just a Java library” argument will only ever be used in organizations where there are good technology arguments for using JRuby on Rails, but non-technical management makes decisions based on what is safest at the moment (see the Blub Paradox by Paul Graham). In most cases the “just a Java library” argument is useful only when talking about conservative environments that have standardized on a certain platform. And believe me, Krishna, there are many places where Java is the only technology allowed to be deployed. But in most cases JRuby will work fine for those IS departments. Is it really a disservice to the technology to make Rails and Ruby into something that can be used in even wider domains, removing cruft and bloatware wherever possible? Is it a disservice for the technology to be used in places where it would never enter without the help of JRuby?

But JRuby is not the best answer for Rails and Ruby developers.

I don’t really understand this quote. Obviously Krishna has strong opinions about the subject, but stating something as a fact without giving the reasons for it isn’t that interesting.

Serious Rails deployments (Mongrel not some Java Application Server) within enterprise environments may be difficult to achieve, but with the right political backing and developer persistence it can be done.

I must say I find it interesting that Mongrel is viewed as a serious deployment option when compared with a standard Java application server. The question here isn’t how we should get enterprise environments to use Mongrel; rather, we should first decide if Mongrel really is a serious enough deployment environment for enterprises.

And this will benefit the whole Rails community – not just those who tie themselves to Sun’s technology platform.

I heard this rumor about Java being Open Source… Does that still mean tying yourself to “Sun’s technology platform”?

And also, the vendor with the most to lose if Rails really does fulfill its potential in enterprise environments.

Why? What exactly does Sun lose if Rails wins? Sun is using Rails and benefiting from it. And be careful about this: Rails is Rails, no matter if it runs on JRuby, MRI, YARV, Rubinius, IronRuby, Ruby.NET, XRuby, Cardinal or any other Ruby implementation now or ever. Rails is Rails.

Maybe I’ll start to believe when they start promoting Ruby on Rails at JavaOne, as opposed to promoting JRuby on Rails at RailsConf.

I guess you didn’t attend JavaOne 2007, where both JRuby on Rails and Ruby on Rails had sessions, and were even promoted in one of the major keynotes. Sun is serious about Java being a multilingual platform. Of course they’re spending money on getting these languages working on Java, but Sun is also giving support to both Rubinius and MRI. Would they really do that if this conspiracy theory were correct? For more information about that particular data point, take a look at Tim Bray’s blog here: Rubinius Sprint.

Much more likely I think, is that we’ll see a Java based Rails alternative that ships with some new version of Java which has been designed to incorporate features from dynamic languages like Ruby and Python.

Java will almost certainly never ship with a web framework. That said, Phobos is one of the Sun projects for web development that uses JavaScript and incorporates features from Rails and Ruby, and also Python and other languages and frameworks.

And still, Sun doesn’t seem to have a problem with Rails and Phobos living side by side. GlassFish includes support for both. And the Rails support doesn’t make any changes to Rails, it doesn’t require you to do anything extra, except that your application should run in JRuby. The latest version basically allows you to say “glassfish_rails start” while standing in your application directory.

…what compromises are we making for Sun’s involvement…

Yeah. What compromises are we making for Sun’s involvement in the community? Except handling the fact that we get more commercial backing, more money in the ecosystem, more help from Sun engineers creating high quality Ruby code, a server that happens to host the SVN server for Ruby itself, and so on? Are these contributions? Sure. Are they commitments? Yeah. Are they something that will require compromises from the community? No, not really.

Sometimes I think that many in the Open Source world still panic as soon as a big company starts to make inroads. And yes, in many cases this hasn’t worked out well. But we’ve gotta look at the facts too. Some companies we will never be able to trust, but Sun has definitely been on the right side of the Open Source fence for a long time. Come on, people.