not a compliant parser #4

BurntSushi · 2013-03-04T03:31:33Z

I've created an interface for your parser to work with toml-test, which automatically tests your parser with a whole bunch of test cases. (I'm at 50 right now.)

Your parser fails quite a few tests.

(One of the tests I know is bogus because of floating point stuff. I'm trying to figure out how to fix that.)

uiri · 2013-03-04T04:52:27Z

Thank you, I intended to do this at some point. Some of the errors seem to have to do with errors in your code translating from my lib to a format which your tester understands (e.g. arrays without opening brackets, the UnicodeEncodeError). Some of them have to do with being too liberal with the input that my parser accepts. Others have to do with changes I made and weren't adequately tested (if they were adequately tested they wouldn't cause test failures, obviously). I'm not sure how much I ought to enforce homogeneity in arrays even though it is in the spec because toml is primarily for configuration rather than data interchange.

BurntSushi · 2013-03-04T05:03:40Z

Thank you, I intended to do this at some point.

No problem. I'm trying to get people moving and getting their parsers tested :-)

Some of the errors seem to have to do with errors in your code translating from my lib to a format which your tester understands (e.g. arrays without opening brackets, the UnicodeEncodeError)

The only such test is string-byte-escapes. I've double-checked the other tests, and your parser still fails them. A gist that doesn't use the automated program.

I'm not sure how much I ought to enforce homogeneity in arrays even though it is in the spec because toml is primarily for configuration rather than data interchange.

Why would you release a "parser for TOML" and purposefully not be compliant with the spec in such a big way? Homogeneity isn't just for data exchange, it's to ensure well-typed structured data. This is especially useful for static languages. Python isn't a static language, but that doesn't mean dynamic languages get its own version of the TOML spec. Moreover, true homogeneity is hopefully coming soon.

If you won't implement the spec, I suggest that you make that clear somehow.

uiri · 2013-03-04T05:53:28Z

The only such test is string-byte-escapes. I've double-checked the other tests, and your parser still fails them. A gist that doesn't use the automated program.

Yes, I realize now that I've been chopping off the opening brackets in arrays. How embarassing. I fixed that, will push a commit to github in a second.

Why would you release a "parser for TOML" and purposefully not be compliant with the spec in such a big way? Homogeneity isn't just for data exchange, it's to ensure well-typed structured data. This is especially useful for static languages. Python isn't a static language, but that doesn't mean dynamic languages get its own version of the TOML spec. Moreover, true homogeneity is hopefully coming soon.

If you won't implement the spec, I suggest that you make that clear somehow.

This isn't some sticking point for me where I disagree with the spec and will refuse to enforce type homogeneity in arrays but rather, due to the way I threw together the parser, adding in enforcement for homogeneous arrays is something which I need to be careful about implementing. If someone is going to beat me to the punch and send in a pull request, I'll gladly take it :). I'll see if I can get around to implementing it this week or next regardless of other people's contributions. Especially if tuples make their way into the spec.

No longer chop off opening brackets for nested arrays Ignore trailing backslashes

BurntSushi · 2013-03-04T06:04:10Z

@uiri Ah, fair enough. I misinterpreted your intentions! I've already written an article that I intend to publish if and when true homogeneous arrays/tuples lands that talks about type checking everything (including the elusive empty list). Hopefully that will help.

Also, I've updated the original gist with results from the latest pull. I've also fixed the floating point bug (temporarily). Things are looking better :-)

Treat keynames as keygroups with respect to # Fixed bug with regards to escaping backslashes

uiri · 2013-03-04T06:25:51Z

I'm fairly certain that duplicate keys and empty keygroups are allowed. For the former, the latter key takes precedence and for the latter, the result is supposed to be an empty hash. The implicit-and-explicit-after seems to me to be a nontrivial fix like keeping track of the type of arrays. I'm not sure why string no close isn't raising an error and key-no-whitespace looks like something which should be allowed but I guess the spec doesn't agree with me on that one. Maybe some clarification is in order… I'll put in some more fixes tomorrow if I find the time. I think I touched on or fixed all the tests I was failing?

BurntSushi · 2013-03-04T06:33:19Z

From the spec:

Be careful not to overwrite previous keys. That's dumb. And should produce an error.

Duplicate keys aren't allowed.

empty keygroups are allowed

This is possible. The spec doesn't clarify it. I believe there are some open issues regarding it. I might remove the test until it's clarified. But since duplicate keys are allowed, there could only be one empty keygroup.

The implicit-and-explicit-after seems to me to be a nontrivial fix

Not sure. You just need to keep track of whether a hash was created implicitly or explicitly. Sounds like an extra dictionary (or even better, a set) to me.

I think I touched on or fixed all the tests I was failing?

Mostly. There are a few others: string-escapes (something weird is going on there), key-with-pound, key-special-chars, float-no-leading-zero and float-no-trailing-digits.

Thanks for your patience and helping me get the kinks worked out of toml-test!

uiri · 2013-03-04T20:40:19Z

From the spec:

Be careful not to overwrite previous keys. That's dumb. And should produce an error.
Duplicate keys aren't allowed.

I was under the impression that what is forbidden by that is strictly that example - overwriting a key with a keygroup. I can't find it now for some reason, but I could have sworn I saw it clarified that in a badly done config file, the latter version of a key overrides the previous version.

empty keygroups are allowed
This is possible. The spec doesn't clarify it. I believe there are some open issues regarding it. I might remove the test until it's clarified. But since duplicate keys are allowed, there could only be one empty keygroup.

Maybe the spec should be editted to clarify this, but see toml-lang/toml#30 (comment)

The implicit-and-explicit-after seems to me to be a nontrivial fix

Not sure. You just need to keep track of whether a hash was created implicitly or explicitly. Sounds like an extra dictionary (or even better, a set) to me.

Yes, another thing to track in a parser which is relatively stateless…

I think I touched on or fixed all the tests I was failing?

Mostly. There are a few others: string-escapes (something weird is going on there), key-with-pound, key-special-chars, float-no-leading-zero and float-no-trailing-digits.

I fixed those except the floats. I need to add some checks in for them, I think, but it doesn't seem to be a big deal.

Thanks for your patience and helping me get the kinks worked out of toml-test!

Thank you for writing it; it is a lot more thorough than my own simple testing.

BurntSushi · 2013-03-04T20:48:36Z

See the end of issue #81. Part of the problem with file inclusion was one of keys being overwritten. mojombo seemed to implicitly reinforce this notion.

Maybe the spec should be editted to clarify this, but see [mojombo's comment here.]

An empty key group can only be represented as []. That particular issue is talking about a named keygroup without any keys.

Yes, another thing to track in a parser which is relatively stateless…

Part of the problem might be the combination of lexing and parsing into one process. A finite state machine (a lexer) works great for picking out syntactic categories, while a parser can use that information to enforce global constraints like duplicate keys, type restrictions, etc.

uiri · 2013-03-04T21:44:44Z

See the end of issue #81. Part of the problem with file inclusion was one of keys being overwritten. mojombo seemed to implicitly reinforce this notion.

Not sure. I think it was that the included file would go under a keygroup which would introduce another possibility of overwriting a key with a keygroup.

An empty key group can only be represented as [] . That particular issue is talking about a named keygroup without any keys.

I think you mean {} . Is my parser representing empty keygroups as something else…?

All the tests should now be passing except for the implicit-and-explicit, the mixed arrays and the empty keygroup ones.

BurntSushi · 2013-03-04T21:50:35Z

I think you mean {} . Is my parser representing empty keygroups as something else…?

I don't know. My tests check that [] is rejected but that [a] with no keys defined is valid, where a maps to an empty hash. I assumed that you were referring to [] initially.

The latter is absolutely correct according to the current spec. The only point of debate here is whether [] is allowed as a valid keygroup. Namely, are keygroups with zero-length names allowed? I don't know. My guess is no. (Why? Because it's stupid.)

uiri · 2013-03-05T03:55:18Z

All of the tests except the implicit and explicit after should be passing now...

BurntSushi · 2013-03-05T03:58:20Z

Nice work :-)

Although this is what I get after a pull:

[andrew@Liger uiri-toml] toml-test ./toml-test.py 
Test: empty-implicit-keygroup (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: overflow-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: overflow-neg-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: implicit-and-explicit-after (valid)

Traceback (most recent call last):
  File "./toml-test.py", line 36, in <module>
    tdata = toml.loads(sys.stdin.read())
  File "/home/andrew/clones/uiri-toml/toml.py", line 68, in loads
    raise Exception("What? "+group+" already exists?"+str(currentlevel))
Exception: What? a already exists?{'a': {'b': {'c': {'answer': 42}}}}

-------------------------------------------------------------------------------
Test: key-special-chars (valid)

Traceback (most recent call last):
  File "./toml-test.py", line 36, in <module>
    tdata = toml.loads(sys.stdin.read())
  File "/home/andrew/clones/uiri-toml/toml.py", line 75, in loads
    raise Exception("Missing whitespace between key name and =")
Exception: Missing whitespace between key name and =


48 passed, 5 failed

uiri · 2013-03-05T04:34:48Z

[andrew@Liger uiri-toml] toml-test ./toml-test.py Test: empty-implicit-keygroup (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------

Fixed this just now :)

Test: overflow-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------
Test: overflow-neg-int (invalid)

Expected an error, but no error was reported.
-------------------------------------------------------------------------------

These are bogus tests. The spec only specifies integer size as 64-bit minimum. In fact, I think these should be in the valid tests to make sure people aren't using 32-bit integers.

Test: key-special-chars (valid)

Traceback (most recent call last):
  File "./toml-test.py", line 36, in <module>
    tdata = toml.loads(sys.stdin.read())
  File "/home/andrew/clones/uiri-toml/toml.py", line 75, in loads
    raise Exception("Missing whitespace between key name and =")
Exception: Missing whitespace between key name and =

I'm just incredulous that equal signs are allowed in key names. It just never occured to me how wacky Keys start with the first non-whitespace character and end with the last non-whitespace character before the equals sign. could be…

BurntSushi · 2013-03-05T05:10:51Z

These are bogus tests. The spec only specifies integer size as 64-bit minimum. In fact, I think these should be in the valid tests to make sure people aren't using 32-bit integers.

!!! I missed the word "minimum" in the spec. Very nice catch. Thank you :-)

@BurntSushi

Does everything look good @BurntSushi?

BurntSushi · 2013-03-05T21:34:01Z

Wow nice work!

All looks well, except for #3 in toml-test. I've been convinced that

key= 1

is allowed. So your parser only fails one last test (which I added today): key-two-equals.

Keep in mind that the implicit/explicit group stuff is still open to interpretation. You don't have to listen to me. It's just my understanding and it could be wrong. We're still waiting for a clarification. (What we really need is an EBNF, and we can all stop trying to interpret the subtleties of English.)

uiri · 2013-03-05T22:55:42Z

All looks well, except for #3 in toml-test. I've been convinced that
key= 1
is allowed. So your parser only fails one last test (which I added today): key-two-equals.

Keys start with the first non-whitespace character and end with the last non-whitespace character before the equals sign.

This sentence from the spec is starting to look like one of those pictures with two interpretations to me. The answer is, of course, to support both!

Keep in mind that the implicit/explicit group stuff is still open to interpretation. You don't have to listen to me. It's just my understanding and it could be wrong. We're still waiting for a clarification. (What we really need is an EBNF, and we can all stop trying to interpret the subtleties of English.)

I'm not sure… it would seem to make sense that you can define a parent keygroup after it has children. Covering more cases can't be bad, right?

BurntSushi · 2013-03-05T22:58:02Z

Covering more cases can't be bad, right?

Emphatically yes. One of the hardest parts of maintaining a project like TOML—and I don't envy @mojombo for this—is refraining from adding too much complexity.

Simplicity is the ultimate sophistication. :-)

The only whitespace needed on a key value pair is between the equal sign and the value

uiri · 2013-03-06T00:11:57Z

Can you send a pull request with your test script? Or is it OK if I include it? The license is MIT, you should probably be added to the copyright notice if your code is included.

BurntSushi · 2013-03-06T00:20:27Z

Nice! You pass all tests :-)

Or is it OK if I include it? The license is MIT, you should probably be added to the copyright notice if your code is included.

While I didn't explicitly specify a license with this test inferface, all of my hobby code is released under the WTFPL.

And it's just a 30 line test interface. It was in the gist (although my pull request might have a few tweaks since then). I used a virtually identical interface to test the rest of the Python TOML parsers :-)

In short: do whatever you want. My name in the commit logs is good enough for me.

uiri · 2013-03-06T01:13:29Z

Closing this since it passes all the tests :)

uiri added a commit that referenced this issue Mar 4, 2013

Addresses some issues raised in #4

3abea2f

No longer chop off opening brackets for nested arrays Ignore trailing backslashes

uiri added a commit that referenced this issue Mar 4, 2013

The last of the simple fixes for issues in #4

d279639

Treat keynames as keygroups with respect to # Fixed bug with regards to escaping backslashes

uiri added a commit that referenced this issue Mar 4, 2013

Fix issue with [] as keygroup name mentioned in #4

3423ed9

uiri added a commit that referenced this issue Mar 5, 2013

Add checks for homogeneous arrays as per #4

14bf202

uiri added a commit that referenced this issue Mar 5, 2013

Fix issue with detecting unbalanced quotes. Identified by issue #4

ffa0d09

uiri added a commit that referenced this issue Mar 5, 2013

Move test for empty keygroup to address issue in #4

e9aae49

uiri added a commit that referenced this issue Mar 5, 2013

Allow [ and # and = in key names as per #4

8f2d857

uiri added a commit that referenced this issue Mar 5, 2013

Fix implicit and explicit keygroups issue in #4.

fc5b03d

Does everything look good @BurntSushi?

uiri added a commit that referenced this issue Mar 5, 2013

Fully comply with latest version of toml-test per #4

869f330

The only whitespace needed on a key value pair is between the equal sign and the value

uiri closed this as completed Mar 6, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

not a compliant parser #4

not a compliant parser #4

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

uiri commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

uiri commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

uiri commented Mar 6, 2013

BurntSushi commented Mar 6, 2013

uiri commented Mar 6, 2013

not a compliant parser #4

not a compliant parser #4

Comments

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 4, 2013

BurntSushi commented Mar 4, 2013

uiri commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

uiri commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

uiri commented Mar 5, 2013

BurntSushi commented Mar 5, 2013

uiri commented Mar 6, 2013

BurntSushi commented Mar 6, 2013

uiri commented Mar 6, 2013