How many s-expression formats are there for Ruby?
Posted by matijs 04/11/2012 at 13h34
Once upon a time, there was only UnifiedRuby, a cleaned-up representation of the Ruby AST.
Now, what do we have?
-
RubyParser before version 3; this is the UnifiedRuby format:
RubyParser.new.parse "foobar(1, 2, 3)" # => s(:call, nil, :foobar, s(:arglist, s(:lit, 1), s(:lit, 2), s(:lit, 3)))
-
RubyParser version 3:
Ruby18Parser.new.parse "foobar(1, 2, 3)" # => s(:call, nil, :foobar, s(:lit, 1), s(:lit, 2), s(:lit, 3))
Ruby19Parser.new.parse "foobar(1, 2, 3)" # => s(:call, nil, :foobar, s(:lit, 1), s(:lit, 2), s(:lit, 3))
-
Rubinius; this is basically the UnifiedRuby format, but using Arrays.
"foobar(1,2,3)".to_sexp # => [:call, nil, :foobar, [:arglist, [:lit, 1], [:lit, 2], [:lit, 3]]]
-
RipperRubyParser; a wrapper around Ripper producing UnifiedRuby:
RipperRubyParser::Parser.new.parse "foobar(1,2,3)" # => s(:call, nil, :foobar, s(:arglist, s(:lit, 1), s(:lit, 2), s(:lit, 3)))
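To make the "same format, but using Arrays" point concrete, here is a minimal sketch that turns the Rubinius-style nested Arrays into something that prints like the s(...) output above. MiniSexp and to_mini_sexp are hypothetical names made up for this illustration; MiniSexp is only a stand-in for the real Sexp class (which is also an Array subclass), so that the snippet runs without any gems.

```ruby
# Hypothetical stand-in for the Sexp class from the sexp_processor gem.
# It only changes how the node prints; it is not the real implementation.
class MiniSexp < Array
  def inspect
    "s(#{map(&:inspect).join(', ')})"
  end
end

# Recursively convert nested plain Arrays into MiniSexp nodes.
# Non-Array leaves (symbols, numbers, nil) are returned unchanged.
def to_mini_sexp(node)
  return node unless node.is_a?(Array)

  MiniSexp.new(node.map { |child| to_mini_sexp(child) })
end

# The Rubinius result from the list above, copied as a plain Array.
rubinius = [:call, nil, :foobar, [:arglist, [:lit, 1], [:lit, 2], [:lit, 3]]]

puts to_mini_sexp(rubinius).inspect
# prints: s(:call, nil, :foobar, s(:arglist, s(:lit, 1), s(:lit, 2), s(:lit, 3)))
```

The printed tree is the same as the UnifiedRuby result from RubyParser before version 3, which is why the Rubinius output counts as the same format in a different container.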
How do these fare with new Ruby 1.9 syntax? Let’s try the new hash syntax. RubyParser before version 3 and Rubinius (even in 1.9 mode) can’t handle this.
-
RubyParser 3:
Ruby19Parser.new.parse "{a: 1}" # => s(:hash, s(:lit, :a), s(:lit, 1))
-
RipperRubyParser:
RipperRubyParser::Parser.new.parse "{a: 1}" # => s(:hash, s(:lit, :a), s(:lit, 1))
And what about stabby lambdas?
-
RubyParser 3:
Ruby19Parser.new.parse "->{}" # => s(:iter, s(:call, nil, :lambda), 0, nil)
-
RipperRubyParser:
RipperRubyParser::Parser.new.parse "->{}"
# => s(:iter, s(:call, nil, :lambda, s(:arglist)),
#      s(:masgn, s(:array)), s(:void_stmt))
That looks like a big difference, but this is just the degenerate case. When the lambda has some arguments and a body, the difference is minor:
-
RubyParser 3:
Ruby19Parser.new.parse "->(a){foo}"
# => s(:iter, s(:call, nil, :lambda),
#      s(:lasgn, :a), s(:call, nil, :foo))
-
RipperRubyParser:
RipperRubyParser::Parser.new.parse "->(a){foo}"
# => s(:iter, s(:call, nil, :lambda, s(:arglist)),
#      s(:lasgn, :a), s(:call, nil, :foo, s(:arglist)))
So, what’s the conclusion? For parsing Ruby 1.9 syntax, there are really only two options: RubyParser and RipperRubyParser. The latter stays closer to the UnifiedRuby format, but the difference is small.
RubyParser’s results are a little neater, so RipperRubyParser should probably conform to the same format. Reek can then be updated to use the cleaner format, and use either library for parsing.
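As a rough idea of what conforming could involve, here is a minimal sketch that removes the :arglist wrappers from a UnifiedRuby-style tree. It works on plain nested Arrays rather than real Sexp objects so that it runs without any gems, and strip_arglist is a hypothetical helper name, not part of either library. It also splices every :arglist it finds into its parent, whatever the parent node type is, which is an assumption that may be too blunt for a real converter.

```ruby
# Remove :arglist wrappers, splicing their children into the parent node.
# Nodes are plain nested Arrays here, standing in for s(...) expressions.
def strip_arglist(node)
  return node unless node.is_a?(Array)

  node.flat_map do |child|
    if child.is_a?(Array) && child.first == :arglist
      # Drop the :arglist tag and hand the arguments to the parent.
      child.drop(1).map { |arg| strip_arglist(arg) }
    else
      # Wrap in an Array so flat_map keeps this child as one element.
      [strip_arglist(child)]
    end
  end
end

# RipperRubyParser result for "->(a){foo}", copied from above as Arrays.
ripper = [:iter,
          [:call, nil, :lambda, [:arglist]],
          [:lasgn, :a],
          [:call, nil, :foo, [:arglist]]]

# RubyParser 3 result for the same code.
ruby_parser3 = [:iter,
                [:call, nil, :lambda],
                [:lasgn, :a],
                [:call, nil, :foo]]

puts strip_arglist(ripper) == ruby_parser3
# prints: true
```

This only covers the :arglist difference. The degenerate "->{}" case would still differ in its argument and body slots, so a real conversion would need to handle that as well.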