The magic it variable in if, or solving regular expressions in Ioke


I’ve spent some time trying to figure out how to handle regular expression matching in Ioke. I really like how Ruby allows you to use literal regexps and an infix operator for matching. That’s really nice and I think it reads well. The problem is that as soon as you want access to the actual match result, not just a yes or no, you have two choices. Either you use the ‘match’ method instead of ‘=~’, or you use the semi-globals, like $1 or $&. I’ve never liked the globals, so I try to avoid them, and I happen to think it’s good style to avoid them.

The problem is that once you use ‘match’ you lose the infix operator, and the code doesn’t read as well. I’ve tried to figure out how to solve this problem in Ioke, and I think I know what to do.
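For reference, the two Ruby options look roughly like this. It is only a minimal sketch: the variable names are made up for illustration, and the pattern uses (\w+) groups simply to keep the captures unambiguous.

```ruby
str = "foo bar"

# Option 1: call match explicitly and keep the MatchData in a named local.
m = /(\w+) (\w+)/.match(str)
first_via_match = m[1] if m

# Option 2: use the infix operator, then read the special variables it sets.
index = (str =~ /(\w+) (\w+)/)   # index where the match starts, or nil
if index
  first_via_global = $1          # first capture group
  whole_via_global = $&          # the entire matched text
end

puts first_via_match, first_via_global, whole_via_global
```

The first form gives you a real object to work with but loses the operator; the second keeps the operator but leans on the special variables.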

The solution is to introduce a magic variable, but one that is distinctly different from the Ruby globals. For one, it’s not a global variable. It’s only available inside the lexical context of an ‘if’ or ‘unless’ method. It’s also a lexical variable, meaning it can be captured by a closure. And finally, it’s a general solution to more things than the regular expression problem. The Lisp community has known about this for a long time. In Lisp the macro is generally called aif. But I decided to just integrate it with the if and unless methods.

What does it look like? Well, for matching something and extracting two values from it, you can do this:

str = "foo bar"
if(#/(.*?) (.*?)/ =~ str,
  "first  element: #{it[1]}" println
  "second element: #{it[2]}" println)

The interpolation syntax is the same as in Ruby.

The solution is simple. An if or unless method will always create a new lexical scope that includes the variable ‘it’, bound to the result of the condition. That means that you can do a really complex operation in the condition part of an if, and then use the result inside the body. In the case of regular expressions, the =~ invocation will return a MatchData-like object if the match succeeds. If it fails, it will return nil. The MatchData object is something that can be indexed with the [] method to get the groups.
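For comparison, the same idea can be approximated in Ruby. This is only a sketch under assumptions: the helper name aif is borrowed from the Lisp macro and is hypothetical here, it says nothing about how Ioke implements if, and unlike Ioke the name has to be introduced explicitly as a block parameter.

```ruby
# Hypothetical helper: runs the block only when the condition is truthy,
# and passes the condition's value in so the block can name it.
def aif(condition)
  yield condition if condition
end

str = "foo bar"

# The match succeeds, so the block runs with the MatchData bound to `it`.
hit = aif(/(\w+) (\w+)/.match(str)) do |it|
  "first: #{it[1]}, second: #{it[2]}"
end

# The match fails (no digits), so the block is skipped and aif returns nil.
miss = aif(/(\d+)/.match(str)) { |it| it[1] }

puts hit
puts miss.inspect
```

Index 0 of the MatchData holds the whole match, which is why the groups start at 1.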

The end result is that the it variable will be available where you want it, but not otherwise. Of course, this will incur a cost on every if/unless invocation. But following my hard line of doing things without regard for optimization, and only with regard for expressiveness, this seems like the right way to do it.

It’s still not totally good, because it’s magic. But it’s magic which solves a specific problem and makes some things much more natural to express. I’m not 100% comfortable with it, but I’m pretty close. Your thoughts?


24 Comments

  1. Hi Ola,

    it’s an interesting approach. I particularly like how the variable is lexically scoped thus not making it a global variable.

    However, are we to assume that the token ‘it’ has keyword-like properties? Your emacs major mode file suggests so:

    (defconst ioke-standout-names '(
    "it"
    )

    If this is the case, I was wondering what the difference between a keyword and a standout-name is. i.e. what is the motivation for not making ‘it’ a keyword?

    November 17th, 2008

  2. Marcelo Gomes

    I prefer something like:

    #/(.*?) (.*?)/.ifMatches(str, matchData,

    "first element: #{matchData[1]}" println
    "second element: #{matchData[2]}" println,

    "no matches found for #{str}" println)

    rather than the magic variable, which increases Ioke’s complexity by adding one more concept to it.

    November 17th, 2008

  3. Sam:

    It’s a standout name in the Emacs file because you should generally notice when it’s used. But there are no real keywords in Ioke. Everything follows the same rules. The names I gave the groups in the Ioke mode are more about showing how they are used. So the keywords in the Ioke mode are not regular language keywords. Instead they are builtin methods which do things that you won’t see from most methods in Ioke (although you could do them). So they have a keyword-like role, but are not keywords.

    November 17th, 2008

  4. Marcelo:

    Yeah, I get that, and I thought about doing that too. But it ended up being very verbose and didn’t feel natural to program in. And it still needs to have different semantics inside the ifMatches method, so there are new concepts anyway.

    What I like about the ‘it’ approach is that it solves a pattern I’ve seen in other cases too. It’s a more general solution.

    November 17th, 2008

  5. Had a semi-long thread about introducing an “it” keyword into Ruby for something similar. Might be worth reading:

    http://markmail.org/message/qhu54y3sxhpgdwog?q=introducing+it+keyword+ruby+fodor&page=1&refer=ub45ni5t4t2p4tjf

    November 17th, 2008

  6. stefano

    If the =~ operator in Ioke returns a MatchData object (in Ruby it returns a Fixnum), you could always write:

    str = "foo bar"
    if(it = #/(.*?) (.*?)/ =~ str,
    "first element: #{it[1]}" println
    "second element: #{it[2]}" println)

    Making the assignment implicit might be a cleaner (but more complex) solution, though.

    November 17th, 2008

  7. vsevolod

    It’s the anaphoric construct that’s widespread in Lisp, isn’t it? ;)

    November 17th, 2008

  8. Actually, the Ruby ‘globals’ related to regular expression match results aren’t as different from this as you might think. They really are frame-local variables and they can be captured by a closure.

    http://talklikeaduck.denhaven2.com/articles/2008/11/17/in-ruby-globals-arent-always-global

    November 17th, 2008

  9. I know it seems to be a kind of regex blasphemy, but in my opinion

    #/(.*?)=>(first) (.*?)=>(second)/

    looks and reads nice too :)

    of course the “special case” of matching exactly foo=>bar would need escaping then…
    and of course this needs to be hacked in the regex parser
    and … just playing around …

    November 18th, 2008

  10. Stefano:

    Yeah, that’s ugly, but it works. Except that it will set the variable in the outer scope, of course.

    Paul:

    At this point I won’t introduce new regexp constructs. =)

    Rick:

    Yeah, I know that they’re not strictly global, except that they are always available. And the implementation needs to care about them in loads of different places. I didn’t know you can close over them, though. The problem with them is that they are a very specific solution to just regexps. The it-variable can be used for anything.

    November 18th, 2008

  11. Ok, this took me a while to get my head around, but given the very liberal definition of truth that floats around I think this could have wide ranging applications.

    I’m still having a hard time with the new keyword. ‘it’ seems like an attractive nuisance in DSLs. I don’t know Ioke syntax well, but is there a way it could become a block parameter or an assignment? In Ruby something like this would be cool.

    if (foo =~ /(.*?) (.*?)/) {|match|
    puts "first element: #{match[1]}"
    }

    November 18th, 2008

  12. In functional languages you probably would use the option type.

    In scala:

    scala> import java.util.regex._
    import java.util.regex._

    scala> def matches(regex:String,str:String) = {
    | val matcher = Pattern.compile(regex).matcher(str)
    | if (matcher.matches) Some(matcher) else None
    |}
    matches: (String,String)Option[java.util.regex.Matcher]

    scala> def printBothElements(input:String) =
    | matches("(.*?) (.*?)",input).foreach(it =>
    | System.out.println("first: '"+it.group(1)+"' second: '"+it.group(2)+"'"))
    printBothElements: (String)Unit

    scala> printBothElements("what else")
    first: 'what' second: 'else'

    scala> printBothElements("blub")

    November 18th, 2008

  13. matthias

    Why does it have to be ‘it’? You could just give it a name yourself. You only have to be able to introduce new variables in the if condition that are local to that particular if branch. Then Stefano’s example would work as expected:

    str = "foo bar"
    if(it = #/(.*?) (.*?)/ =~ str,
    "first element: #{it[1]}" println
    "second element: #{it[2]}" println)

    As you said, this could be used for other things as well:

    str = "123"
    if (asNumber = str asNumber,
    "#{str} as number is #{asNumber}" println,
    "Seems like #{str} is not a number" println)

    ‘asNumber’ is only visible in the if branch.

    November 18th, 2008

  14. ste

    A really primitive (and ugly) implementation in Ruby:
    http://gist.github.com/26120

    November 18th, 2008

  15. Hi Ola,

    Have you thought about providing access to an enclosing it-variable?
    So that if you have (for example) an if nested inside an if you will be able to access the outer it-variable from the scope of inner if?

    This could be implemented as a method attached to the object when it is assigned into the it-variable so that the expression it.outer (excuse me for sticking with the dot-notation…) will provide this access?

    Otherwise, maybe this kind of idiom is too rare to justify such a mechanism?

    November 18th, 2008

  16. matthias

    My proposal would solve the problem of nested ifs:

    if (outerIt = outerCondition,
    if (innerIt = innerCondition, …, …),
    …)

    November 18th, 2008

  17. ab5tract

    Is there some reason you don’t like named matching? Ruby 1.9 seems to have thought this through pretty well. And in the case of not wanting to explicitly name different elements of the match I think Antares’ suggestion looks really good.

    Also, Perl 6 grammars may be of interest. Perl “defined” modern regexes in many ways, and grammars are going to do it again.

    November 18th, 2008

  18. ab5tract

    I should clarify that I realize that what we are discussing here is a slightly different situation than named matching. It’s just that every time I see the global var pattern in Ruby I say to myself “jeez I can’t wait to be doing everything in 1.9”.

    I really like the

    if (foo =~ /(.*?) (.*?)/) {|match| puts "first element: #{match[1]}"}

    suggestion, but I’m not familiar enough with Ioke to say whether this is even possible. (Aside: Do arrays in Ioke start at 1 instead of 0?)

    November 18th, 2008

  19. stefano

    @ab5tract: MatchData objects store the matched string at index 0, and each subgroup starting at index 1

    November 18th, 2008

  20. stefano

    A simple implementation in Io:

    aif_context := method(value,
    ctx := Object clone
    ctx it := value
    ctx
    )
    aif := method(cond,
    if(cond, call argAt(1) doInContext(aif_context(cond)))
    )

    Regex
    aif("one two" findRegex("(.+?) (.+)"), "first: #{it at(1)}" interpolate println)

    I’m not entirely sure how it works, but it works (or at least I think so) :-)

    November 18th, 2008

  21. ab5tract

    @stefano – Ah, I did not realize Ioke would do it this way but in hindsight I should have recognized that pattern. I don’t particularly like it, maybe that’s what colored my vision.

    To me it makes more sense to put the argument string into the last element of the MatchObject. This way it’s one less edge case index offset to remember.

    November 20th, 2008

  22. I’ve always felt that Lisp style aif macros were an admission that lambdas were syntactically too heavy in most Lisp variants. As STE’s Ruby version shows, with easy lambdas it’s easy to create your own form of aif without using macros.

    If you really want aif, I’d definitely go with naming the variable as Matthias suggested. Magic names like “it” will just lead you down the path to Perl. :-)

    To follow up on what Johannes wrote, here’s some Scala that leads to something more nearly like what you’re writing (though it still uses Option types; it could be changed to have some notion of “truthy” vs “falsey” values like Lisp and Ruby and such, but that kind of punning doesn’t seem to fit statically typed languages very well).

    // first some setup
    import java.util.regex.Pattern
    implicit def stringToMatchString(str : String) = new {
    def =~(pattern : String) = {
    val matcher = Pattern.compile(pattern).matcher(str)
    if (!matcher.matches) None else {
    def results(n: Int) : List[String] =
    if (n > matcher.groupCount) Nil else matcher.group(n) :: results(n + 1)
    Some(results(0))
    }
    }
    }

    def aif[In, Out](cond : Option[In])(f : In => Out) = cond map f

    // and now we can use all that machinery to get something pretty nice

    aif("foo bar" =~ "(.+) (.+)") {it =>
    println("first: '" + it(1)+"' second: '" + it(2) + "'")
    }

    The Scala version suffers from two minor issues as compared with many scripting languages: 1) no regex literals (they’re just strings) and 2) no variable expansion in string literals, hence a bunch of concatenation in the println. On the plus side, I’ve made aif return something useful, so I don’t have to print a result; I can use it in a larger expression:
    val result = aif("foo bar" =~ "(.+) (.+)"){it =>
    "first: '" + it(1) + "' second: '" + it(2) + "'"
    } getOrElse ""

    So advice #2: make if an expression, whether it’s anaphoric or not.

    November 20th, 2008

  23. Occam

    Not quite what you mean with local scopes and all, but why not just allow inlined variables (a la Paul’s comment above), but using the syntax of Oniguruma (the regexp package that Ruby is switching to for 1.9)?
    http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt (part 7)
    (?<name>subexp), (?'name'subexp): define named group

    December 10th, 2008
