A personal blog

Due to lots and lots and lots of stuff happening in my life right now, I haven’t had the time to write anything substantial here in a while. Hopefully, this will change.

I have also decided to start a new blog, for more… personal musings. Books, music, movies. Stuff like that. Nothing that would interest the regular readers of this blog, probably. But anyway, here’s the address: http://olabini.blogspot.com.

Ah, right: have a Merry Christmas and a Happy New Year, everyone!

The difference between Kernel#` and Kernel#system

Today I had a fun learning experience. It cost me several hours of work, so I will post a small notice about it here so Google can make other developers’ lives easier. Or maybe I’m the only one who made this mistake.

Anyway. What I was trying to do was to start an external Ruby script from another Ruby script. The external script daemonized itself, but since I didn’t want to install the daemonize package (another bad decision, probably), I just wrote it to fork and detach. I have condensed the problem a little, to this Ruby snippet:

 `ruby -e'if pid=fork; Process.detach(pid); else; sleep(5); end'`

Everyone please raise your hand if it is obvious that this script will sleep for 5 seconds before giving me back my prompt. It wasn’t obvious to me, since it has been a long time since I did UNIX systems programming.

For those who still want to know: the problem is that backtick captures the started process’s STDOUT and keeps reading it until the stream is closed. As long as STDOUT is open, backtick will wait. And since forking and detaching doesn’t redirect any of the STD* streams, the forked child inherits that STDOUT, so backtick waits until both processes have finished.

There are two ways to fix this: one right way and one fast way. The right way is to rebind the standard streams in the child right after forking, so that it no longer holds on to the output stream backtick is reading.
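A minimal sketch of the right way, assuming a UNIX-like system where fork is available. The daemonizing script reopens STDIN, STDOUT and STDERR on the null device (File::NULL) in the child before doing its work; the calling side then times the backtick call. The DAEMON_SCRIPT text and the timing harness are illustrative assumptions.

```ruby
require 'rbconfig'
require 'shellwords'

# The daemonizing script with the fix applied: after fork, the child
# reopens its standard streams on the null device, so it no longer shares
# the pipe that backtick in the calling script is reading from.
DAEMON_SCRIPT = <<~'RUBY'
  if pid = fork
    Process.detach(pid)
  else
    STDIN.reopen(File::NULL)
    STDOUT.reopen(File::NULL, 'w')
    STDERR.reopen(File::NULL, 'w')
    sleep(5)
  end
RUBY

# Time the backtick call: it should return as soon as the parent exits,
# not after the child's 5-second sleep.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
`#{Shellwords.escape(RbConfig.ruby)} -e #{Shellwords.escape(DAEMON_SCRIPT)}`
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "backtick returned after #{elapsed.round(2)} seconds"
```

The sleeping child keeps running in the background for its 5 seconds; only the caller is released early.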


The faster way is to replace backtick with a call to system. Since system isn’t interested in the output from the process, it doesn’t capture those streams; it only waits for the command itself to exit. So just running this instead works:

 system "ruby -e'if pid=fork; Process.detach(pid); else; sleep(5); end'"

I have learned my lesson, and I have bought a copy of the Stevens book (UNIX Network Programming, which details the interaction between fork and ports, which is what my original problem was about).

Dynamic Ruby power and static balance

Update: This post has been updated to explain, clarify and remove certain things that sounded like an attack on people who didn’t agree with me, especially Austin. That was certainly not my intent when writing it. Added explanations are highlighted in italic text.

Sir Bedevere: And what do you burn, apart from witches?
Peasant 1: More witches.
Peasant 2: Wood.
Sir Bedevere: Good. Now, why do witches burn?
Peasant 3: …because they’re made of… wood?
Sir Bedevere: Good. So how do you tell whether she is made of wood?
Peasant 1: Build a bridge out of her.
Sir Bedevere: But can you not also build bridges out of stone?
Peasant 1: Oh yeah.
Sir Bedevere: Does wood sink in water?
Peasant 1: No, no, it floats!… It floats! Throw her into the pond!
Sir Bedevere: No, no. What else floats in water?
Peasant 1: Bread.
Peasant 2: Apples.
Peasant 3: Very small rocks.
Peasant 1: Cider.
Peasant 2: Gravy.
Peasant 3: Cherries.
Peasant 1: Mud.
Peasant 2: Churches.
Peasant 3: Lead! Lead!
King Arthur: A Duck.
Sir Bedevere: …Exactly. So, logically…
Peasant 1: If she weighed the same as a duck… she’s made of wood.
Sir Bedevere: And therefore…
Peasant 2: …A witch!
(quotes from Monty Python and the Holy Grail, courtesy of IMDB)

My post announcing Ducktator seems to have stirred up a few emotions on Ruby-talk. Of course, most of this is my fault, for naming the library so frivolously and not explaining its intended uses clearly. On the other hand, there seems to be general confusion about the concepts of duck typing, dynamic versus static typing, validation and other issues. Actually, I caught a whiff of religion when my mention of duck typing engendered such a diverse set of responses.

Of course, my own reaction about duck typing was just as religious. I see this as a general trap when discussing programming languages. The Ruby community is altogether very good at avoiding religion, which is why I was quite startled to find hints of it. Duck typing as a concept seems to be very loaded right now. I’m merely pointing this out as something we should watch out for. Just as I will do from now on, I suggest people in the Ruby community try to be as objective as possible when discussing this.

And everyone and their aunt seem to have different opinions on what duck typing really is. It’s all quite fun, actually, except that it misses the point. I should have avoided mentioning ducks. I should have avoided saying anything at all about typing, since that isn’t the point. And I bloody well shouldn’t have used the class validator in my example. Well, what’s done is done. This post won’t be about that, except for the next paragraph.

The Ducktator disclaimer

I won’t mention the words duck typing from here on. I would change the name of the project if it weren’t so damn hard on RubyForge. But what I want to explain is this. Ducktator is about validating things. But not everywhere. You shouldn’t use Ducktator in places where you have one or two checks on an object. You should really only use it at the borders of your code, the borders where you will receive complex objects. Really complex objects, where a method_missing won’t tell anyone anything useful at all. The use case I had in mind when writing the library was RubyGems: after the YAML spec for a gem has been loaded, check that the important parts actually have what it takes to get into the source index. Since I managed to break RubyGems this way, I feel that this kind of validation can be really important. Once again, this is validation of live Ruby objects. Nothing else. You can check practically anything you want, but the easiest examples have been about each, class and respond_to. I hope this clarifies things a bit.

I removed the entire paragraph about typing. But my recommendation still stands; if you find formal types in programming languages interesting and/or confusing, read Programming Language Pragmatics, and you will be enlightened.

The main point

My possibly improper use of the term duck typing engendered a very strange response, which I hadn’t expected. Of course, I realize that this is a very obvious community effect. Since duck typing is one of the trademarks of the Ruby community, everyone has opinions on it and, more importantly, feels the need to defend it as soon as some threat is perceived. Steve Yegge has written lots and lots about what language religion is really about, and I feel that this is an extension of that issue, so I won’t write more about it here either. You can find more in many of his excellent Drunken Blog Rants.

Finally. Balance is what I’m after. One person (Austin) said that the d**k t****g philosophy (I had written the word ‘issue’ here. That seems to have been misinterpreted. I blame that on my poor grasp of English, since my mother tongue is Swedish. =) is about TRUST: that you should trust the caller of your library to read your documentation (which, obviously, is perfect) and supply the correct objects. This isn’t too much to ask if your docs are up to scratch, and if the caller is the same one who will suffer if he mishandles your library. But trust isn’t enough when you’re at the borders. When talking to other languages through shaky serialization systems. When talking with clients that could be hostile. (Yes, in this case setting $SAFE helps, but it doesn’t go all the way.) (Sandbox is, or will be, a good alternative here, but I still see places where object validation is a better solution.)

Further, Austin responded in his blog post that he thinks I have ‘set up a false dichotomy here: people who are for duck typing as trusting your caller are against validation’. This wasn’t my intention; if anything, it’s more the other way around. I am for duck typing, in most places. What I’m saying is that no solution is perfect at all points in your code, and duck typing is a good fit in many places, but not all. The next paragraph clarifies my wish for balance.

What I’m saying is, most of the time you won’t need it, but in some cases, some kind of interface validation really helps a lot. I know the so-called dynamic community doesn’t like to hear this. But what is so dynamic about failing without control? (The arguments I heard about letting code fail when the method isn’t there sounded very much to me like failing without control. That was my interpretation of the argument that you don’t need to use respond_to? for duck typing.) I know that I, as a developer, am not infallible. I make mistakes. Most of the time I am in control of all my objects, but there are times when I’m not. For example, there are situations where I develop smaller applications for other (non-programmer) people. I like to create configurations and rules in YAML for these projects and leave the client in charge of configuring the application. But what if he/she/it makes a mistake? Using the ‘other’ way, I would fail when trying to call protocol on something that should have been a URI, but wasn’t, because the person made a typo and put an illegal character inside the URL. Will that error message help the person doing the configuration? Should you wrap your calls in rescues all over the place and give the same explanation everywhere? Should you trust that the (non-programming) client will be able to read your RDoc and figure out that a method (which I bet you didn’t name get_uri_from_yaml_configuration) failed because of something they did in the configuration? I believe not.
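A minimal sketch of the alternative: validate the configuration once, at the border, so the person editing the YAML gets an error in their own terms. The load_config helper, the ConfigError class and the 'url' key are hypothetical; YAML.safe_load and URI.parse are Ruby’s standard library.

```ruby
require 'yaml'
require 'uri'

# Hypothetical error type for problems the configuring person can fix.
class ConfigError < StandardError; end

# Hypothetical loader: check everything once, when the file is read,
# with messages aimed at the person editing the YAML.
def load_config(yaml_text)
  config = YAML.safe_load(yaml_text)
  unless config.is_a?(Hash)
    raise ConfigError, "The configuration must be a list of 'key: value' settings."
  end

  url = config['url']
  unless url.is_a?(String)
    raise ConfigError, "The 'url' setting is missing (it should look like http://example.com/)."
  end

  begin
    config['url'] = URI.parse(url)
  rescue URI::InvalidURIError
    raise ConfigError, "The 'url' setting #{url.inspect} is not a valid address; check it for typos."
  end
  config
end
```

Everything after load_config can then assume config['url'] is a real URI object, instead of each caller rescuing its own NoMethodError.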

What I’m really ranting about is balance. There needs to be a balance between checking and laissez-faire. In most places, just calling the method is fine. In other places it’s appropriate to check with respond_to?, and in some cases you need to check the class. We’re programmers. We are supposed to be good at judging which technique to use where. Yes, Ruby is a dynamic language. Yes, Ruby is very easy to learn. Yes, Ruby makes most stuff very easy on you. That doesn’t mean you should stop thinking. It doesn’t mean you should be lazy. We are programmers, and we should be able to adapt.
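The three levels of checking can be sketched like this. The methods total_length, total_length_checked and parse_port are hypothetical examples, chosen only to show how differently each one fails.

```ruby
# 1. Just call the method: a wrong argument fails with NoMethodError,
#    wherever the missing method happens to be called.
def total_length(items)
  items.map(&:length).sum
end

# 2. Check the capability with respond_to? and fail early with a clear message.
def total_length_checked(items)
  unless items.respond_to?(:map)
    raise ArgumentError, "expected a collection of strings, got #{items.class}"
  end
  items.map(&:length).sum
end

# 3. Check the class when only one concrete type makes sense.
def parse_port(value)
  raise TypeError, "port must be an Integer, got #{value.class}" unless value.is_a?(Integer)
  value
end
```

Most code can live at level 1; levels 2 and 3 earn their keep where the error message has to mean something to whoever caused it.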

One more time. Balance. Balance. Everywhere. And I do love the Ruby community. It is the best. Even though people get mad at each other, we can solve our differences. I’m proud of being a part of it.

MySQL, some concrete suggestions!

After my post Rails, Databases, ActiveRecord and the path towards damnation, I got an e-mail from Mårten Mickos, the CEO of MySQL. He asked me to provide concrete suggestions on how to improve MySQL (since the other post just contained some unspecified negative vibes), so that’s the rationale for this post. I’m going to point out a few things I see as problems for using MySQL as a production database right now. Standard disclaimer: these are my opinions, my own only, and my employer doesn’t necessarily agree or disagree with them on any level.

Let us jump into the fray:

  • Sequences. I would like real, nice and sweet sequences. I really don’t like having no control over my primary key generation, and I especially don’t like that I can’t have sequences for anything else. The recommended solution according to the manual is to create a table with one auto-increment column in it, and use this as a sequence. That’s not acceptable, especially since I cannot tie this so-called sequence to the generation of ids on other tables with subselects and other fun things.
  • OK, I really don’t like the auto-increment feature. Why not provide an IDENTITY keyword, as the non-core features T174 and T175 specify?
  • Real, honest-to-god boolean types. Real ones. Not tinyint(1)s. Not enums. Not tinyints hidden behind the word boolean (like JDBC). Real boolean types.
  • I would like table1 and Table1 to be different (as per the spec). Oh yes, we seem to live in an insensitive world (case and otherwise) with Windows all over the place. But in my database I want that kind of control.
  • Limiting the number of rows returned in result sets. Now, I have no problem with LIMIT and friends, but since there is a spec, and that spec has a feature for this functionality too (T611), why can’t that be in MySQL?
  • Time-types should be able to store fractional seconds and time zones.
  • And what’s the matter with the TIMESTAMP type? That doesn’t really do what the standard says it should do. Please give it a name not in the standard.
  • And for Pete’s sake, double bars are for concatenation in SQL. || means ‘or’ in many programming languages, but SQL is a DSL. This screams leaky abstraction and is very annoying.
  • Stability of 5.0 features. I know triggers, foreign keys and stored procedures are all there now. But frankly, I don’t trust my referential integrity to them yet. Not from a database vendor that a few years ago wrote in its manual that the only reason for foreign keys was to let GUIs diagram relationships between database objects. Not from a vendor that said you don’t need transactions to ensure data integrity. All in all, I want these features to be around for a while, get the bugs hashed out, and be pounded on. But that’s not going to happen if people move to Rails, since Rails doesn’t believe in data integrity or foreign keys.

Well, that’s that. Only my opinions, remember? Anyway, for small and fast development, MySQL is really useful. I’m just arguing that a big production system should choose something else.

Announcing Ducktator – A Duck Type Validator

As I hinted in my last post, I feel with all my heart that there should be some way to actively validate my static expectations on certain kinds of objects. Now, respond_to? and friends are fine, but they do not scale. Not at all. So, I have built Ducktator – a duck type validator. It uses a very recursive, extensible rule syntax. Rules can be specified in either YAML or simple Ruby, with hashes and arrays and all that stuff.

First, though, where would this be useful? Not everywhere, of course, but these are the places that just drop into my head while writing this: validating objects that have been serialized or marshalled; validating what you get when loading YAML files, so that the object graph matches what your code expects; writing test cases that expect a complicated object back. The possibilities are many.

Ducktator is very easy to extend. Basically, you just create a method on the validator whose name begins with “check_”, and it will automatically be called for all objects. The base library is divided into modules that are mixed in to the central Validator. I won’t detail exact usage here, just show an example. First, the rule file, which resides in rules.yml:

each_key: {class: String}
- - 0
- class: Symbol
- - 1
- class: Integer
- max: 256

Then, our code to create a Validator from this:

require 'ducktator'
v = Ducktator::from_file('rules.yml')

And lastly, to use it to validate the objects foo and bar:

foo = {'baz' => 13}
bar = {'b1' => [:try1, 130],
       'q16' => [:foobaz, 255]}
v.valid?(foo)      # => false
v.valid?(foo, bar) # => false
v.valid?(bar)      # => true

Now, you’ll certainly be wondering where to get this interesting code. As always, it can be found on RubyForge here, and the first release is available through gems, so just run gem install ducktator and you should be set to go.

It is released under the MIT license, and I am the project creator, maintainer et al.
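To illustrate the check_ convention described above, here is a tiny hypothetical validator that routes each rule key to a private method named check_<key>. It is a sketch of the idea only, not Ducktator’s actual code or API, and it supports just three rules (class, max and each_key).

```ruby
# Hypothetical sketch of check_ dispatch; NOT Ducktator's implementation.
class MiniValidator
  def initialize(rules)
    @rules = rules
  end

  # True only if every object passes every rule.
  def valid?(*objects)
    objects.all? { |obj| validate(obj, @rules) }
  end

  private

  # Route each rule key to the matching check_<key> method.
  def validate(obj, rules)
    rules.all? do |key, arg|
      handler = "check_#{key}"
      raise ArgumentError, "unknown rule: #{key}" unless respond_to?(handler, true)
      send(handler, obj, arg)
    end
  end

  # class: the object must be an instance of the named class.
  def check_class(obj, class_name)
    obj.is_a?(Object.const_get(class_name))
  end

  # max: the object must be a number no larger than the limit.
  def check_max(obj, limit)
    obj.is_a?(Numeric) && obj <= limit
  end

  # each_key: every key of a hash must satisfy the nested rules.
  def check_each_key(obj, rules)
    obj.respond_to?(:each_key) && obj.each_key.all? { |k| validate(k, rules) }
  end
end
```

Adding a new rule is then just a matter of defining another check_ method; the rules themselves can come straight out of a YAML file as plain hashes.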

YAML needs schema

It has been said before and it needs to be said again. YAML really needs schema. Now, before all your enterprisey warning bells start ringing, I want to add that I’m only proposing this for specific applications. Most uses of YAML can continue gladly without any need for schema. But for some cases the security and validation capabilities of a good YAML schema would be invaluable. One example is RubyGems. It shouldn’t be possible to crash RubyGems with bad YAML. Also, in all cases where Ruby emits objects as YAML it should be possible to automatically generate a schema specification from the object structure. This means that in many cases you may not need to create your schema by hand. You could just serialize your domain objects to YAML, take the generated schema and modify it as needed.
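As a rough sketch of generating a schema from an object structure, the hypothetical infer_schema method below walks a Ruby object graph and records the type of each part. The output format (type/keys/items) is invented for illustration and is not an existing schema language; the sample spec hash is made up too.

```ruby
require 'yaml'

# Hypothetical: describe the shape of an object graph as nested hashes.
def infer_schema(obj)
  case obj
  when Hash
    { 'type' => 'map',
      'keys' => obj.each_with_object({}) { |(k, v), h| h[k.to_s] = infer_schema(v) } }
  when Array
    # Collapse identical element descriptions into one entry each.
    { 'type' => 'seq', 'items' => obj.map { |v| infer_schema(v) }.uniq }
  else
    { 'type' => obj.class.name }
  end
end

spec = { 'name' => 'example', 'version' => '1.0', 'files' => ['lib/a.rb', 'lib/b.rb'] }
puts infer_schema(spec).to_yaml
```

The emitted description could then serve as a starting point to edit by hand, which is the workflow suggested above.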

What would the advantages of YAML schema be? Numerous:

  • Validation: Validate that a YAML file conforms to your expectations before loading it
  • Default values: The possibility to provide default values for missing parts of the YAML, making convention over configuration even more powerful. With reasonable defaults most YAML documents could shrink dramatically in size.
  • Tool help: GUI builders and other tools would be able to help you construct your YAML-file from scratch. I like being able to auto-complete XML with nXML in Emacs. Very neat. I just wish I had that capability with yaml-mode too.
  • Loading hints and instructions: A schema could specify that the key named ‘foo’ always has a value with the tag !ruby/object:Gem::Specification, or that all integer values should be decimal, regardless of leading zeroes. Many behaviors that today require customizing your YAML system would become automatic.
  • Remove clutter from YAML-file: If the schema defines the tags for values, it means that this information doesn’t need to appear in the YAML file itself, reducing clutter and noise. This would make it even easier to edit YAML files by hand.
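The validation and default-value points above can be sketched with a hypothetical schema expressed as a plain Ruby hash. The SCHEMA constant and the apply_schema method are illustrations only, not a proposed format. Note how an unquoted version: 1.0 loads as a Float and gets caught.

```ruby
require 'yaml'

# Hypothetical schema: expected class per key, plus an optional default.
SCHEMA = {
  'name'    => { 'class' => String },
  'version' => { 'class' => String },
  'summary' => { 'class' => String, 'default' => '' },
  'files'   => { 'class' => Array,  'default' => [] }
}

# Returns [document with defaults filled in, list of error messages].
def apply_schema(doc, schema)
  errors = []
  result = {}
  schema.each do |key, rule|
    if doc.key?(key)
      value = doc[key]
      unless value.is_a?(rule['class'])
        errors << "#{key}: expected #{rule['class']}, got #{value.class}"
      end
      result[key] = value
    elsif rule.key?('default')
      result[key] = rule['default'].dup   # copy so defaults are never shared
    else
      errors << "#{key}: missing"
    end
  end
  [result, errors]
end

doc, errors = apply_schema(YAML.safe_load("name: example\nversion: 1.0\n"), SCHEMA)
puts errors
```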

A YAML schema format should be specified in YAML, and it should be self-hosting (meaning its format language should be definable in itself). For the most part it seems we can use ideas from XML Schema. The only part I’m not really sure about for YAML schema is how to bind a document to a schema. Maybe the best way would just be to add a new directive that specifies the schema for that document. I don’t believe that YAML needs different schemas for different parts of documents right now, though. I don’t think we need the proliferation of schema metadata inside the YAML document that XML experiences. (Anyone tried to manually work with a WSDL file that includes all the required namespaces and such? Nightmare!)

There are a few different parts needed for this to work. I believe it could be done with the current YAML spec (and retrofitted onto YAML 1.0 too), since the only real change to the document would be a new directive in the stream header. The next step is for someone to start defining a format for schema. Then, a tool would be needed that could validate against a schema. This wouldn’t give us all the benefits of schema, but it’s a start. The final step would be to integrate schema support into existing YAML libraries, to allow validation and the use of schema for metadata.

Actually, this solves exactly half the problem, the part I call external validation. The other part is not YAML specific, and it’s something I’ve been thinking about for Ruby: validation of object hierarchies in the running language. Expect some more info on this in a day or a few. I want to have something usable to release. But I believe Ducktator will be really useful for certain use cases.

MetaProgramming Refactoring

Reflexive metaprogramming has been part of programmer consciousness for a long time. It’s been possible in many languages in one way or another. Some have embraced it more than others, and among these the most prominent are Lisp, Smalltalk, Python and Ruby. But it’s only since Ruby entered the common programmer’s mind that metaprogramming has actually started to become commonplace. The discussion on DSLs is also relevant to metaprogramming, since implementing a DSL (in the same language, of course) is very hard without reflexive metaprogramming.

I recently reread my copy of Refactoring, and as usual I was amazed by how on-topic it was, and how easy and useful the tips in it were. But I also started thinking that something is missing. Refactoring is specifically about object-oriented programming, but I’m heading more and more towards language-oriented programming, with DSLs, reflexive metaprogramming, introspection and metaclass extensions. These approaches make the base OOP system much more powerful. This is especially prominent when prototyping smaller systems. I find that I start by writing methods that do very much by hand, and in the next stage I fold my code together as much as possible, both for readability and out of laziness.

What is this post about, then? Well, I propose that the time is right and nigh for a catalog of metaprogramming refactorings. I’m not saying I should write it, not at all, but it would be something very nice to have. Maybe as a wiki somewhere, where some of our metaprogramming luminaries could write about their experiences. DHH? _why? Weirich? Dave Thomas?

Anyway, to make it very clear what kind of refactorings I’m talking about, I will provide a somewhat contrived example. This is more or less what a mock implementation of something could look like: some log calls, and quite a lot of repetition.

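def startup
  @log.info { "-startup()" }
  startup_foo
  startup_bar
end

def init
  @log.info { "-init()" }
  init_vars
  init_constants
  init_other
end

def main
  @log.info { "-main()" }
  run_main
  puts "hello from main"
end

def close
  @log.info { "-close()" }
  close_bar
  close_foo
end

def shutdown
  @log.info { "-shutdown()" }
  all_shutdown
  run_shutdown
end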

I present the Extract Code Template metarefactoring. The first step is to take all the method names that should be handled and put these in a list, like this:

[:startup, :init, :main, :close, :shutdown]

Then we iterate over these names and define an empty method for each, like this:

[:startup, :init, :main, :close, :shutdown].each do |name|
  define_method(name) do
  end
end

We then change the list to a hash, mapping each method name to the methods that method should call, like this:

{ :startup => [:startup_foo, :startup_bar],
:init => [:init_vars, :init_constants, :init_other],
:main => [:run_main],
:close => [:close_bar, :close_foo],
:shutdown => [:all_shutdown, :run_shutdown] }

When this has been done, we have to add the part of the main method which can’t be extracted in the same way, which we do with a proc:

{ :startup => [:startup_foo, :startup_bar],
:init => [:init_vars, :init_constants, :init_other],
:main => [:run_main, lambda { puts "hello from main" }],
:close => [:close_bar, :close_foo],
:shutdown => [:all_shutdown, :run_shutdown] }

The next step is to walk through the method names and values, and define the method contents. We then remove the original methods. Finally, our code may look like this:

{ :startup => [:startup_foo, :startup_bar],
:init => [:init_vars, :init_constants, :init_other],
:main => [:run_main, lambda { puts "hello from main" }],
:close => [:close_bar, :close_foo],
:shutdown => [:all_shutdown, :run_shutdown] }.each do |name, methods|
  define_method(name) do
    @log.info { "-#{name}()" }
    methods.each do |m|
      if m.is_a? Proc
        m.call    # inline code, such as the puts in main
      else
        send m    # a named step method
      end
    end
  end
end
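To check that the refactored template behaves like the original methods, here is a self-contained, runnable version. RecordingLog is a hypothetical stand-in for the logger, and the step methods (startup_foo, init_vars and so on) are stubs that only record their own names.

```ruby
# Minimal stand-in for a logger: stores the result of each info block.
class RecordingLog
  attr_reader :messages

  def initialize
    @messages = []
  end

  def info
    @messages << yield
  end
end

class Lifecycle
  attr_reader :calls, :log

  STEPS = {
    :startup  => [:startup_foo, :startup_bar],
    :init     => [:init_vars, :init_constants, :init_other],
    :main     => [:run_main, lambda { puts "hello from main" }],
    :close    => [:close_bar, :close_foo],
    :shutdown => [:all_shutdown, :run_shutdown]
  }

  def initialize
    @log = RecordingLog.new
    @calls = []
  end

  # Hypothetical stubs: each step method just records that it ran.
  STEPS.values.flatten.grep(Symbol).each do |step|
    define_method(step) { @calls << step }
  end

  # The extracted template: log the call, then run each step in order.
  STEPS.each do |name, steps|
    define_method(name) do
      @log.info { "-#{name}()" }
      steps.each { |m| m.is_a?(Proc) ? m.call : send(m) }
    end
  end
end

app = Lifecycle.new
[:startup, :init, :main, :close, :shutdown].each { |phase| app.send(phase) }
```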

Now, in this case I’m not sure I would do this refactoring at all. This serves more as an example of the kinds of refactorings I would like to see in a catalog like this. Refactorings like Extract DSL, Create Class Dynamically, Extend From Anonymous Class and others. This is something I really feel would be useful in today’s programming environment.

Comments and tips are very welcome.

JvYAML and RbYAML – what’s to come?

I know I’ve posted about the next release of JvYAML a few times, but the going has gotten tough, since there are so many interesting projects to work on. Anyway, I thought I would blog a bit again, since a significant new feature has reared its head. It’s about YAML 1.0 compatibility. This needs to be added to both JvYAML and RbYAML, in fact. The reason is a recent incident where YAML communication between JRuby and Ruby broke in a slightly embarrassing way. Most of you probably know which incident I’m talking about.

The problem is very simple. YAML 1.1 is _almost_ backwards compatible with 1.0, with a few exceptions. The point that broke is the shorthand tags from the YAML type repository. In YAML 1.0 you could prefix a value with !str to mean that it shouldn’t be interpreted as another type of value. A typical example (and actually the example that triggered the incident) is this:

version: !str 0.2

Now, in YAML 1.1, it doesn’t look the same. It was decided that a single exclamation point is shorthand for the user namespace, while a double exclamation point means the tag:yaml.org,2002: namespace. So, the above example in YAML 1.1 is

version: !!str 0.2

This is a tiny change, but it breaks, since YAML 1.0 handles !!str as a private type, and YAML 1.1 handles !str as a private type. Not a really nice situation.
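A small illustration using Psych, Ruby’s current standard YAML library, which follows YAML 1.1: a plain 0.2 loads as a Float, while !!str keeps it a String. The upgrade_yaml10_tags method afterwards is a hypothetical, deliberately naive compatibility shim; it is not how JvYAML or RbYAML implement the flag.

```ruby
require 'yaml'

# YAML 1.1 resolution: an untagged 0.2 is a Float, !!str forces a String.
plain  = YAML.safe_load("version: 0.2")
tagged = YAML.safe_load("version: !!str 0.2")
p plain
p tagged

# Hypothetical, naive shim: rewrite a few YAML 1.0 type-repository
# shorthands (!str, !int, !float) to the YAML 1.1 !! form before parsing.
# A regexp on raw text would also touch quoted scalars, so a real
# implementation belongs inside the parser, behind a flag.
YAML10_TYPES = %w[str int float]

def upgrade_yaml10_tags(text)
  # (?<![!\w]) skips tags that already use !! and words like foo!str.
  text.gsub(/(?<![!\w])!(#{YAML10_TYPES.join('|')})\b/) { "!!#{$1}" }
end

p YAML.safe_load(upgrade_yaml10_tags("version: !str 0.2"))
```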

The solution in my case will be to add a flag to RbYAML and JvYAML that specifies that you want 1.0 compatibility. When that flag is turned on, some of these issues will be handled correctly by the parser, and output will be emitted in a way a 1.0 parser can read. This will be the only change in RbYAML. But JvYAML will contain (as detailed before) an emitter, JavaBean materialization and many bug fixes.

Rails, Databases, ActiveRecord and the path towards damnation

For the last two days I’ve spent some thought on databases and Rails. I haven’t gotten far, but I do know that Rails has serious trouble with regard to databases. Some of this thinking comes from David Black’s talk, other parts from DHH’s rant after the panel discussion, and some from discussions with other Rubyists and Railites.

So. What are these problems? The first one is MySQL. Now, I don’t want to bash MySQL. Not really. But it is not a good database. Until recently it’s been very bad on SQL compliance. It’s slow. It’s cumbersome. The foreign keys are annoyingly incomplete. And some MySQL extensions have a tendency to be preached as gospel by people who don’t understand databases. (But I guess this isn’t really the fault of MySQL.) Actually, the worst part with MySQL is that ActiveRecord has been designed around it. Now, I really do understand 37signals’ point of view on this. Of course, if MySQL is good for them, I understand that they have built that support in deep. But this puts the rest of the world using Rails in a tight spot. It is incredibly hard to get other databases working with Rails, and even if you do get them working, they will be slow. Really slow. Take Oracle. Oracle lives and dies by prepared statements. But there is no sane way to use them in Rails. Instead, SQL is generated dynamically and Ruby code is used to quote values, instead of binding them as part of a prepared statement. This is obviously very painful when writing the JDBC adapter, but it is really important for all serious databases. Having prepared statements would also cut down on much of the database-specific code in ActiveRecord, since quoting would be up to the database driver, as it should be.
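A database-free illustration of the prepared-statement point. The quote method mimics the kind of Ruby-side quoting described above (a simplified hypothetical, not ActiveRecord’s code), and the table name and ids are made up. With dynamic SQL every value yields a different statement text; with a prepared statement there is one text plus bind values.

```ruby
# Hypothetical Ruby-side quoting, in the spirit of dynamic SQL generation.
def quote(value)
  case value
  when String then "'#{value.gsub("'", "''")}'"
  when nil    then 'NULL'
  else value.to_s
  end
end

ids = [17, 42, 99]

# Dynamic SQL: each distinct value produces distinct statement text,
# which a database such as Oracle typically has to parse separately.
dynamic = ids.map { |id| "SELECT * FROM people WHERE id = #{quote(id)}" }

# Prepared statement: one statement text (JDBC-style ? placeholder),
# with values passed separately and quoted by the driver.
prepared_sql = "SELECT * FROM people WHERE id = ?"
bind_sets = ids.map { |id| [id] }

puts "distinct statement texts, dynamic:  #{dynamic.uniq.size}"
puts "distinct statement texts, prepared: #{[prepared_sql].uniq.size}"
```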

OK, problem number two: real-world database design. As Black has noticed, this isn’t talked about in the Rails community. At all. Of course, as one person in the audience noted, this is partly because using ActiveRecord and designing your objects with care results in third normal form without effort, but this is not the whole story. There are much more important issues in database engineering than normalization, and let’s face it, most Rails developers produce pretty crappy databases. This needs to be investigated, talked about, discussed. It needs to come out into the open. This discussion is needed for many reasons. I want to be able to use Rails for all applications where it makes sense. But most of those places won’t be possible until the database support stops killing Oracle when you try to use it, or stops requiring a database with no good management tools.

The limits of power: What Lisp can do but Ruby can’t

For a long time I’ve been thinking about where Ruby’s limits are, compared to Lisp. As I see Lisp as the ultimate power, this is mostly about trying to gauge the power of Ruby, in a very unscientific, highly opinionated and very subjective way. I talked with Jim Weirich, _why, Charles and a few others about this in London, which started me thinking again. Actually, I’ve only found one kind of macro that really isn’t possible in Ruby, and it’s only impossible if you require that the syntax doesn’t change.

First, required reading for this post is Why Ruby is an acceptable Lisp by Eric Kidd. I happen to agree with it in almost all cases, but there are a few corner cases where Lisp is just more convenient. As a first example, let’s take a typical AOP task: I want to define a method to execute before the method foo. The definition of foo is like this:

def foo(arg1, arg2, arg3)
  do_something {}
end

And my before-advice, which I would really like to be able to write like this:

defbefore foo(arg1, *args)
  puts "before foo with first arg #{arg1}"
end

This isn’t possible. The closest I really can come up with is this:

defbefore :foo do |arg1, *args|
  puts "before foo with first arg #{arg1}"
end
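The block form can be built today. Here is a minimal sketch, assuming Ruby 2.0 or later for Module#prepend; the BeforeAdvice module and the Service class are hypothetical.

```ruby
# Hypothetical helper (not from any library): defbefore prepends an
# anonymous module whose method runs the advice, then calls the original.
module BeforeAdvice
  def defbefore(name, &advice)
    wrapper = Module.new do
      define_method(name) do |*args, &blk|
        instance_exec(*args, &advice)   # run the advice with the same arguments
        super(*args, &blk)              # then the original method
      end
    end
    prepend wrapper
  end
end

class Service
  extend BeforeAdvice

  def foo(arg1, arg2, arg3)
    arg1 + arg2 + arg3
  end

  defbefore :foo do |arg1, *args|
    puts "before foo with first arg #{arg1}"
  end
end

Service.new.foo(1, 2, 3)   # prints "before foo with first arg 1", returns 6
```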

This isn’t so bad, of course. But it puts a disconnect between the language and your extensions. The most powerful macro facility is invisible to the programmer: there should be no division between how keywords work and how macros function. To take another example of this, consider the classical pattern for logging:

logger.debug { "baz: #{expensive_formatting_operation(baz)}" }

where the block is used to avoid calling expensive_formatting_operation if debug-logging isn’t turned on. This is neat. But it isn’t neat enough. I would like to be able to write

logger.debug "baz: #{expensive_formatting_operation(baz)}"

and avoid having expensive_formatting_operation run if debug logging is off. In the general case this isn’t possible in Ruby: the argument is evaluated before the method is even called, and there are parts of the execution process that have no hooks.
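The difference can be seen with Ruby’s standard Logger. In this sketch, expensive_formatting_operation is a hypothetical stand-in that counts how often it runs: with debug logging off, the block form never calls it, while the plain-argument form calls it anyway, because the string is built before debug is invoked.

```ruby
require 'logger'
require 'stringio'

$format_calls = 0

# Hypothetical stand-in for an expensive formatter; counts its calls.
def expensive_formatting_operation(value)
  $format_calls += 1
  value.inspect
end

logger = Logger.new(StringIO.new)
logger.level = Logger::INFO   # debug logging is off
baz = [1, 2, 3]

# Block form: Logger skips the block when DEBUG is disabled.
logger.debug { "baz: #{expensive_formatting_operation(baz)}" }

# Argument form: the string (and the formatter call) is built first,
# then thrown away by debug.
logger.debug "baz: #{expensive_formatting_operation(baz)}"

puts $format_calls   # prints 1
```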

Of course, as I said in the beginning, this doesn’t really matter in most cases. In almost all cases, the convenience of Ruby’s powerful syntax, emerging libraries and great frameworks is what you get in exchange for that small power trade-off. But even so, I would like to be able to go that extra distance in power. Could something like this ever be possible in a language that has syntax? I’m not sure. Maybe if there were a well-defined way to translate Ruby into something that resembles S-expressions. In that case you could have macros that work on these internal structures instead of on pure Ruby syntax. Of course, this means a division into two languages, but it would give that extra power.
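For what it’s worth, Ruby 1.9 and later ship Ripper in the standard library, which turns Ruby source into nested arrays resembling S-expressions. It only parses; it offers no way to rewrite code before it runs, so it is not the macro facility wished for above, but it hints at what such a layer could work on. A minimal sketch:

```ruby
require 'ripper'
require 'pp'

# Parse, without running, the logging call from the example above.
# Single quotes keep #{...} as literal source text.
source = 'logger.debug "baz: #{expensive_formatting_operation(baz)}"'
tree = Ripper.sexp(source)

# The tree starts with :program; the call to expensive_formatting_operation
# appears as plain data that a hypothetical macro layer could inspect
# before anything is evaluated.
pp tree
```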