Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBNF Monocle Party #34

Closed
wants to merge 7 commits into from
Closed

EBNF Monocle Party #34

wants to merge 7 commits into from

Conversation

skull-squadron
Copy link

Top hats included at no extra charge.

@postmodern
Copy link
Contributor

👍 helps stop parser bugs.

@skull-squadron
Copy link
Author

I looked at the ISO scanf format before writing the EBNF... the spec can become bloated quickly.

Update: if anyone has a copy of the C99 spec, check section 7.19.6.1

@skull-squadron
Copy link
Author

Morr tests (and specificity) needed:

.0        # float
-.0       # float
0.        # float
-0.       # float
0.0       # float
-0.0      # float
-.0e0     # float
-0.0e1    # float
-0.0e01   # float
-infinity # float
INF       # float
nan       # float
-qNaN     # float
snan      # float
NAN       # float

-0        # integer
0         # integer
10000     # integer
-100000   # integer

-01       # invalid
-.e0      # invalid
-0.0e     # invalid
-.e.      # invalid
-0e.      # invalid
-e.       # invalid
-e        # invalid

Example design decision dilemmas: Is + allowable if going down the strtod() / scanf() route? 1.#INF formats? (rabbit hole-edge case-itis)

@aaronblohowiak
Copy link
Contributor

This is the (yertl-formatted) one I came up with:

https://github.com/aaronblohowiak/toml/blob/master/toml/toml.yrt

I don't catch a few of the character escapes in the grammer, those will be considered errors by the parser (\r and \0, namely).

I believe that +, INF, nan and scientific notation are explicitly disallowed.

Also, your EBNF does not catch spaces between KEY and '=', if I am reading it correctly.

@aaronblohowiak
Copy link
Contributor

Your EBNF also would not allow empty arrays, if i am reading it correctly.

@C0mkid
Copy link
Contributor

C0mkid commented Feb 24, 2013

There shouldn't be empty arrays as per #30

Barry Allard added 2 commits February 23, 2013 23:47
Whitespace is not included in productions because it makes a mess.
// is a comment.
@skull-squadron
Copy link
Author

@aaronblohowiak Look at the pseudo terminals named WHITESPACE and ARRAY_WHITESPACE.

As a general observation: a sensibly-robust parser will have a 'pedantic parse for explicit conformity' mode and an 'try hard to parse liberally and ignore all garbage' (default) mode. During scanner/parser design, T/NT transitions should make sensible choices.

Post Script: Let's try not introduce obscure parser/generators semantics no one's familiar with or too far removed from realizability... they tend to create unnecessary hurdles to understanding and complicate implementations respectively. I've heard Bison with Flex, JavaCC may be good places to start if using codegen'ed scanner tables LL(k) / LALR(n<5)

@aaronblohowiak
Copy link
Contributor

@C0mkid #30 only says no explicit values, it does not say that empty arrays are disallowed. For instance a = [] is different (in some languages) from a = nil.

@aaronblohowiak
Copy link
Contributor

@steakknife ah, was not familiar with the pseudo-terminals (my brain just skipped the comment-looking part)

I dont think anyone was introducing obscure parser/generator semantics.. I did not mean to suggest the file I linked to should be taken as authoritative, just as an alternative.

@C0mkid
Copy link
Contributor

C0mkid commented Feb 24, 2013

@aaronblohowiak As this is a config language, I guess the parser can just make the assumption that the key not existing is the same as empty array/null/nil.

@aaronblohowiak
Copy link
Contributor

@C0mkid then how would you express a = [[1,2,3], [1,4], [], [4,5,6]

@C0mkid
Copy link
Contributor

C0mkid commented Feb 24, 2013

@aaronblohowiak ah, I was more thinking on the lines of top level stuff like a = [] which wouldn't make that much of a difference, but now I see what you mean.

@skull-squadron
Copy link
Author

@aaronblohowiak Ah no worries. It's good to have a second clean-room implementation as a CRC.

@skull-squadron
Copy link
Author

@C0mkid, @aaronblohowiak I'm thinking per no nulls that empty sets would be also disallowed to avoid edge-cases similar to null.... Might need @mojombo to weigh in.


KEY = [^\.]+
KEYGROUPNAME = KEY ( '.' KEY )*
STRING = '"' ([^\"\\]|'\\'[0tnr"\\])* '"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a ' quote mark missing at end of the STRING line?

@kelvinst
Copy link

This seems to enter on stagnancy, but anyway, I'll express my opinions.

When I read the TOML specification I take attention first to one thing that @mojombo says: "Because we need a decent human-readable format that maps to a hash and the YAML spec is like 80 pages long and gives me rage". So, if the intention of @mojombo is keep the specification the more simple to read that we could, maybe this PR breaks this intention a little.

I understand that this can be veeery helpfull to stop parser errors between the many implementations of TOML, but put it in the general specification may not be the best way.

I suggest to think about in put this in other file and link it in a section of the README file (maybe a "Development" section)

@parkr
Copy link

parkr commented Aug 23, 2013

I agree that this suggestion breaks the simplicity and human-readability of both the spec and of TOML files. I'm 👎 on this PR.

@postmodern
Copy link
Contributor

I understand that this can be veeery helpfull to stop parser errors between the many implementations of TOML, but put it in the general specification may not be the best way.

TOML was created in the wake of the YAML vulnerabilities. It would seem that TOML should also ensure that implementations are correct and secure. English text is a horrible medium to describe subtle details, which leads to developers coming up with their own interpretations which leads to bugs, incompatibilities and sometimes vulnerabilities (eg: BlueTooth). If the EBNF specification has gotten to bit, then this might indicate that TOML has gotten to complex.

We gain nothing by rejecting a formal specification.

@BurntSushi
Copy link
Member

@postmodern Absolutely. The only debate here should whether the EBNF and the English version of the spec are in agreement.

@skull-squadron
Copy link
Author

@BurntSushi @postmodern Yup.

TL;DR: Formal and informal are useful for different audiences.

Formal spec is necessary for implementers and neck-beard users. Informal is great for new users and to make sure the formal is in keeping with reality; divergence of the two as mentioned is bad news. Put another way, it's another "model" language to describe the same behavior in a different way.

For the formal part, one alternative to EBNF is to have a reference implementation in something like clear, simplistic C that accounts for all the corner cases that have been raised and decided upon. For any questions, whatever the reference implementation does may be viewed as the SSOT (single source of truth). Either way, it will change over time.

English is the language of business because of ambiguity. An English spec is helpful for new users and those unfamiliar with more formal specifications since not everyone can think in formal methods.

As an example of another codec that is easy, fast to understand (not necessarily to implement or unambiguity):

  1. json website - English and semi-formal together
  2. jsmn parser in 250 lines that does most things right https://bitbucket.org/zserge/jsmn/src/09ad3d4559ea3e5b1b93cd90350e1909534635dd/jsmn.c?at=default

Postscript: If you were not Musk and wanted to design Hyperloop's control systems, here's some tips from NASA:

@kelvinst
Copy link

@postmodern, as I say, I agree that english is not the best languages for the develop the implementarions of TOML, but one of the intentions of @mojombo with TOML is a simple and direct specification as I commented before

The question is: this is the best place to the EBNF? I thought in another file to this more specific specification and a link to it in the README

This only, I'm pretty sure that this EBNF is very important and need to be accessible by this specification, but not sure about this is the best place.

English is the language of business because of ambiguity.


Reply to this email directly or view it on GitHub.

@postmodern
Copy link
Contributor

@kelvinst RFCs usually put the EBNF in a separate section. Perhaps this would be a good time to break down the README into separate files?

@skull-squadron
Copy link
Author

ZOMGCOPTER @ bikeshedding. Organization is orthogonal to whether a formal specification is useful.

If this PR were useful, it would've been merged. Since it has not to this point, all debates as such are moot anyhow.

Time to kick people out by stopping the music and cutting off the booze. Handing 'em trash bags usually gets 'em moving out pretty fast. : )

@postmodern
Copy link
Contributor

Time to kick people out by stopping the music and cutting off the booze. Handing 'em trash bags usually gets 'em moving out pretty fast. : )

@steakknife Why didn't you merge the EBNF specification as it is? Having a EBNF specification would legitimize TOML. Without a formal specification that can be verified, TOML is a toy format.

@BurntSushi
Copy link
Member

@postmodern - @steakknife is the one who opened the pull request, so he can close it. I have no idea why he closed it. The reasons he gave were pretty mysterious, particularly given that @mojombo has been allowing TOML to simmer for several months on its own anyway.

I'd recommend that you re-open a pull request of your own :-)

@kelvinst
Copy link

Heelp! A man fainted on his keyboard!

@parkr
Copy link

parkr commented Sep 1, 2013

@kelvinst 😆 👍

@skull-squadron
Copy link
Author

TL;DR - Please stop.

The updated spec was committed but  this pull was clearly never in an acceptable state and there was zero feedback to rectify it.   So yes, it's closed because I'm done wasting my time.






  Like the Olympics certifying the rules after it were over, feel free to stay and argue, but everyone else has already gone home.








Fork and pull if you want, I'm unwatching and ignoring this now.


Sent from Mailbox for iPhone

On Sun, Sep 1, 2013 at 10:36 AM, Parker Moore <[email protected]="mailto:[email protected]">> wrote:

@kelvinst 


Reply to this email directly or view it on GitHub.

@BurntSushi
Copy link
Member

@steakknife Yes yes. You've made your lack of interest clear. No need to repeat yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants