JSON decoder example #23

keleshev · 2013-04-07T20:07:04Z

This is an implementation of a JSON decoder using parsimonious.

It's based on the JSON spec, with one exception: escape sequences are not supported. I will most likely add them later.

erikrose · 2013-04-08T16:59:09Z

Nice job. It's impressive how short it is. I may have to borrow this as a canonical benchmark!

keleshev · 2013-04-08T17:23:09Z

The grammar ended up very similar to the spec (unlike parsley). But at the same time it was a bit tricky to write the decoder, since I had to keep switching back to the grammar. I actually ended up developing it with the rules inside each method's docstring, and moving them out when it was finished :-)

Also recently I presented parsimonious at a local Python meetup, showing how to write a simple interpreter. You might be interested to see it:

https://gist.github.com/halst/4531a03bcddab550992a

Too bad the camera's battery died while we were recording my talk, but I plan to make a screencast out of it.

erikrose · 2013-04-08T19:35:48Z

A couple ideas:

1. It would be almost trivial to create a sort of NodeVisitor class where a Grammar is built out of the concatenation of [parts of] the methods' docstrings. You'd lose the decoupling between grammar and visitor, but sometimes you don't need it.
2. I'm probably going to add a few select tree transforms (based on http://doc.pypy.org/en/latest/rlib.html#tree-transformations) to the grammar so at least you can extract a child or ignore a child. For instance…
```
pair = first >", "< second  # The comma gets ignored, and the visitor sees just first and second.

subexpression = "(" <important_bit> ")"  # The visitor sees only important_bit in place of subexpression.
```
Spelling is still extremely up for grabs.

Would the tree transforms help make your need to look back and forth between the visitor and grammar go away? I definitely want to solve this problem—without killing the option of decoupling the two.
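
To make the two proposed transforms concrete, here is a minimal sketch in plain Python. It does not use parsimonious at all: the `Node` class and its `ignore` and `extract` flags are hypothetical stand-ins for whatever the `>...<` and `<...>` markers would produce, and the real spelling and semantics were still undecided at this point in the thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical parse-tree node; not parsimonious's own Node class."""
    name: str
    children: List["Node"] = field(default_factory=list)
    ignore: bool = False   # assumed meaning of >child<: drop it from the tree
    extract: bool = False  # assumed meaning of <child>: it replaces its parent


def transform(node):
    """Return a new tree with ignored children dropped and extracted children lifted."""
    kept = []
    for child in node.children:
        if child.ignore:
            continue
        new_child = transform(child)
        if child.extract:
            # The whole parent is replaced by this one child.
            return new_child
        kept.append(new_child)
    return Node(node.name, kept)


# pair = first >", "< second
pair = Node("pair", [Node("first"), Node(", ", ignore=True), Node("second")])
print([c.name for c in transform(pair).children])  # ['first', 'second']

# subexpression = "(" <important_bit> ")"
sub = Node("subexpression", [Node("("), Node("important_bit", extract=True), Node(")")])
print(transform(sub).name)  # important_bit
```

With a transform like this applied before visiting, a visitor method would only ever receive the children it cares about, which is the back-and-forth the question above is trying to remove.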

erikrose · 2013-04-08T19:41:31Z

Say…it occurs to me that we can perfectly well extract a grammar from visitor docstrings without having to actually use that visitor to visit. :-) So there's no real disadvantage to doing that, except that you can't see the whole grammar at once without a little work. Hmm!
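
A rough sketch of that extraction step, assuming each visitor method carries exactly one complete rule in its docstring. The `MiniVisitor` class, the `visit_` prefix, and the `grammar_from_docstrings` helper are made up for illustration; only the standard library is used, and the resulting string is the kind of text one would then hand to parsimonious's `Grammar`.

```python
import inspect


class MiniVisitor:
    """Hypothetical visitor: every visit_* method documents one grammar rule."""

    def visit_ifelse(self, node, children):
        """ifelse = "if" ws expr ws "then" ws expr ws "else" ws expr"""

    def visit_expr(self, node, children):
        """expr = ~"[a-z0-9]+" """

    def visit_ws(self, node, children):
        """ws = ~"\\s+" """

    def helper(self):
        # Not a visit_* method, so it contributes nothing to the grammar.
        pass


def grammar_from_docstrings(visitor_class, prefix="visit_"):
    """Join the docstrings of the class's own prefix* methods, in definition order."""
    rules = []
    for name, attr in vars(visitor_class).items():
        if name.startswith(prefix) and callable(attr) and attr.__doc__:
            rules.append(inspect.cleandoc(attr.__doc__).strip())
    return "\n".join(rules)


grammar_text = grammar_from_docstrings(MiniVisitor)
print(grammar_text)
```

Because `vars()` preserves definition order, the first method's rule stays first in the output. Nothing here instantiates or runs the visitor, so the grammar can be built and inspected on its own.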

erikrose · 2013-04-08T19:49:25Z

If you get around to making a screencast, I'd love to see it or even help publicize it.

keleshev · 2013-04-08T20:08:31Z

As far as I can see, this problem is usually handled by assigning names to children, as parsley does:

object = ws '{' members:m ws '}' ws  # do something with `m`
pair = string:k ':' value:v  # do something with `k` and `v`

With grabbing syntax this would look like:

object = ws '{' <members> ws '}' ws  # do something with `members`
pair = string >':'< value  # do something with `string` and `value`

In these cases I like grabbing better, because you don't need to come up with silly short names.

But I can imagine a problem that "naming" could handle that "grabbing" couldn't (probably?):

members = (pair:first (ws ',' pair)*:rest -> [first] + rest) | -> []

Although I'm not sure how parsley gets rest without such garbage as ws and ','.
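
For comparison, this is roughly what the same `[first] + rest` logic looks like when the visitor has to discard `ws` and `','` by hand. It is only a sketch: the shape of `children` (one pair, followed by a list of three-item repetitions) is an assumption made for the example, not a description of what parsley or parsimonious actually passes to a visitor.

```python
def visit_members(children):
    """Build the list of pairs from the children of a hypothetical `members` node."""
    if not children:
        # The empty alternative, `| -> []` in the parsley rule.
        return []
    first, rest = children
    # Each repetition of (ws ',' pair) is assumed to arrive as a 3-item
    # sequence; only the pair is kept.
    return [first] + [pair for _ws, _comma, pair in rest]


children = [("a", 1), [(" ", ",", ("b", 2)), ("", ",", ("c", 3))]]
print(visit_members(children))  # [('a', 1), ('b', 2), ('c', 3)]
print(visit_members([]))        # []
```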

keleshev · 2013-04-08T20:45:14Z

Well, in an ideal parallel universe where Python has real lambdas, I would love to change this code:

class Mini(object):
    ...
    def ifelse(self, node):
        """ ifelse = ~"if\s*" expr ~"\s*then\s*" expr ~"\s*else\s*" expr """
        _, cond, _, cons, _, alt = node
        return self.eval(cons) if self.eval(cond) else self.eval(alt)

    def infix(self, node, children):
        """ infix = ~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*" """
        _, left, _, operator, _, right, _ = children
        operators = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.div}
        return operators[operator](left, right)

into something like this (in CoffeeScript syntax):

mini = Grammar({
    ...
    'ifelse': rule '~"if\s*" expr ~"\s*then\s*" expr ~"\s*else\s*" expr', -> 
        mini.eval(@expr3) if mini.eval(@expr1) else mini.eval(@expr2)

    'infix': rule '~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*"', ->
        operators = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.div}
        operators[@operator](@expr1, @expr2)
})

keleshev · 2013-04-08T20:46:18Z

I.e., somehow avoid method signatures and tuple unpacking.

erikrose · 2013-04-08T20:46:51Z

> But I can imagine a problem that "naming" could handle that "grabbing" couldn't (probably?):

This is, I suspect, where the third type of tree transformation comes in: http://doc.pypy.org/en/latest/rlib.html#nonterminal-1-nonterminal-2-nonterminal-n. (Yes, I changed the syntax in my example; don't let it confuse you.)

erikrose · 2013-04-08T21:13:32Z

I would nudge it in this direction:

class Mini(object):
    ...
    def ifelse(self, (_, cond, _, cons, _, alt)):
        """ ~"if\s*" expr ~"\s*then\s*" expr ~"\s*else\s*" expr """
        return self.eval(cons) if self.eval(cond) else self.eval(alt)

    def infix(self, node, (_, left, _, operator, _, right, _)):
        """ ~"\(\s*" expr ~"\s*" operator ~"\s*" expr ~"\s*\)\s*" """
        operators = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.div}
        return operators[operator](left, right)

That is, removing the duplicated rule names and doing tuple unpacking in the formal parameter list (though that's going away in Python 3, so it will need rethinking).

This takes us pretty close to a PEG version of PLY, which is not entirely a bad thing, since (1) it's optional and (2) it doesn't strictly hurt our decoupling.
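
Since tuple parameters are removed in Python 3, one possible way to keep the unpacking out of the method body is a small decorator that spreads `children` across positional parameters. This is only a sketch: the `unpacked` decorator and the parameter names are invented here, and `op.truediv` stands in for `op.div`, which no longer exists in Python 3.

```python
import functools
import operator as op


def unpacked(method):
    """Hypothetical helper: call method(self, node, *children)."""
    @functools.wraps(method)
    def wrapper(self, node, children):
        return method(self, node, *children)
    return wrapper


class Mini:
    @unpacked
    def infix(self, node, _open, left, _ws1, operator, _ws2, right, _close):
        """ ~"\\(\\s*" expr ~"\\s*" operator ~"\\s*" expr ~"\\s*\\)\\s*" """
        operators = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv}
        return operators[operator](left, right)


# Children as a visitor might receive them after visiting the sub-nodes.
print(Mini().infix(None, ['(', 6, ' ', '*', ' ', 7, ')']))  # 42
```

One cost of this approach is that every ignored child needs its own parameter name, because `_` cannot be repeated in a signature the way it can in a tuple-unpacking assignment.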

keleshev · 2013-04-08T21:26:36Z

Yeah, I will miss tuples in signatures. That's the complete opposite of the direction I want Python to go :-). (E.g. CoffeeScript even allows unpacking of objects in signatures, like func = (arg1, {foo: [arg2, arg3], arg4}) -> ...).

I guess what I did with Mini is the way to go in this case 😟

erikrose · 2013-04-27T04:47:43Z

I just finished multi-line support (#19) and am now turning my attention to benchmarking and optimizing, using your JSON decoder as a starting point. I got a real kick out of you naming the entrypoint loads. :-)

keleshev · 2013-04-27T08:40:40Z

😀

keleshev · 2013-04-30T20:40:47Z

I just published the screencast I was talking about: http://www.youtube.com/watch?v=1h1mM7VwNGo

The code is here: https://github.com/halst/mini

On reddit: http://www.reddit.com/r/programming/comments/1dfn16/how_to_write_an_interpreter/

Add JSON decoder example (see erikrose#18)

ada3c1f

erikrose mentioned this pull request Apr 27, 2013

How do I attach callbacks to terminal nodes? #27

Closed

erikrose mentioned this pull request May 3, 2013

Tree transforms #29

Open

This was referenced Jul 10, 2014

Add NodeVisitor subclass which constructs default grammar from docstrings #46

Closed

Write real documentation, add Sphinx #48

Open

Could you post an example JSON decoder? #18

Closed
