EBNF Monocle Party #34
👍 helps stop parser bugs. |
I looked at the ISO scanf format before writing the EBNF... the spec can become bloated quickly. Update: if anyone has a copy of the C99 spec, check section 7.19.6.1 |
More tests (and specificity) needed:
Example design decision dilemmas: Is |
This is the (yertl-formatted) one I came up with: https://github.com/aaronblohowiak/toml/blob/master/toml/toml.yrt I don't catch a few of the character escapes in the grammar; those will be considered errors by the parser (\r and \0, namely). I believe that +, INF, nan and scientific notation are explicitly disallowed. Also, your EBNF does not catch spaces between KEY and '=', if I am reading it correctly. |
Your EBNF also would not allow empty arrays, if I am reading it correctly. |
There shouldn't be empty arrays as per #30 |
Whitespace is not included in productions because it makes a mess.
// is a comment.
@aaronblohowiak Look at the pseudo terminals named As a general observation: a sensibly robust parser will have a 'pedantic parse for explicit conformity' mode and a 'try hard to parse liberally and ignore all garbage' (default) mode. During scanner/parser design, T/NT transitions should make sensible choices. Post Script: Let's try not to introduce obscure parser/generator semantics no one's familiar with or that are too far removed from realizability... they tend to create unnecessary hurdles to understanding and complicate implementations, respectively. I've heard Bison with Flex, JavaCC may be good places to start if using codegen'ed scanner tables LL(k) / LALR(n<5) |
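To make the two-mode idea concrete, here is a minimal sketch of a line-oriented reader with a pedantic switch. It is only an illustration: `parse_pairs` and its `strict` flag are hypothetical names, it handles bare `key = value` lines and `#` comments only, and it is not a TOML parser or anything taken from this PR.

```python
def parse_pairs(text: str, strict: bool = False) -> dict:
    """Read 'key = value' lines (hypothetical helper, not a full TOML parser).

    strict=True  -> pedantic mode: any malformed line raises ValueError.
    strict=False -> liberal mode (the default): malformed lines are skipped.
    """
    result = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine in both modes
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            if strict:
                raise ValueError(
                    f"line {lineno}: expected 'key = value', got {raw!r}"
                )
            continue  # liberal mode: ignore the garbage and keep going
        result[key.strip()] = value.strip()
    return result
```

With the default, `parse_pairs("a = 1\ngarbage\nb = 2")` keeps the two well-formed lines and drops the middle one; with `strict=True` the same input raises `ValueError`. Keeping both behaviours behind one flag means the conformance tests and the forgiving default share a single code path.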
@steakknife ah, was not familiar with the pseudo-terminals (my brain just skipped the comment-looking part). I don't think anyone was introducing obscure parser/generator semantics... I did not mean to suggest the file I linked to should be taken as authoritative, just as an alternative. |
@aaronblohowiak As this is a config language, I guess the parser can just make the assumption that the key not existing is the same as empty array/null/nil. |
@C0mkid then how would you express |
@aaronblohowiak ah, I was more thinking on the lines of top level stuff like |
@aaronblohowiak Ah no worries. It's good to have a second clean-room implementation as a CRC. |
@C0mkid, @aaronblohowiak I'm thinking per no nulls that empty sets would be also disallowed to avoid edge-cases similar to null.... Might need @mojombo to weigh in. |
KEY = [^\.]+
KEYGROUPNAME = KEY ( '.' KEY )*
STRING = '"' ([^\"\\]|'\\'[0tnr"\\])* '"
Is there a ' quote mark missing at end of the STRING line?
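One way to check productions like these is to transliterate them into regular expressions and try a few inputs. The sketch below does that in Python, assuming the STRING production is meant to end with a closing `'"'` terminal (the point raised in the question above). The `matches` helper is a hypothetical name introduced only for this illustration.

```python
import re

# Transliteration of the three quoted productions into Python regexes.
# Assumption: STRING ends with a closing '"' terminal.
KEY = r'[^.]+'                           # one or more non-dot characters
KEYGROUPNAME = rf'{KEY}(?:\.{KEY})*'     # KEY ( '.' KEY )*
STRING = r'"(?:[^"\\]|\\[0tnr"\\])*"'    # plain chars or \0 \t \n \r \" \\


def matches(production: str, text: str) -> bool:
    """Return True when the whole text is derivable from the production."""
    return re.fullmatch(production, text) is not None


if __name__ == "__main__":
    samples = [
        (STRING, '"hello"'),
        (STRING, r'"bad \q escape"'),
        (STRING, '"unterminated'),
        (KEYGROUPNAME, 'servers.alpha'),
        (KEYGROUPNAME, 'servers..alpha'),
    ]
    for production, text in samples:
        print(f"{text!r}: {matches(production, text)}")
```

Note that `KEY` as written accepts any character other than a dot, including spaces and `=`, so whitespace handling has to live outside these productions, as mentioned earlier in the thread.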
This seems to have stagnated, but I'll give my opinion anyway. When I read the TOML specification, the first thing I noticed was what @mojombo says: "Because we need a decent human-readable format that maps to a hash and the YAML spec is like 80 pages long and gives me rage". So if @mojombo's intention is to keep the specification as simple to read as possible, this PR may break that intention a little. I understand that this can be very helpful for preventing parser discrepancies between the many TOML implementations, but putting it in the general specification may not be the best way. I suggest putting it in another file and linking to it from a section of the README (maybe a "Development" section). |
I agree that this suggestion breaks the simplicity and human-readability of both the spec and of TOML files. I'm 👎 on this PR. |
TOML was created in the wake of the YAML vulnerabilities. It would seem that TOML should also ensure that implementations are correct and secure. English text is a horrible medium to describe subtle details, which leads to developers coming up with their own interpretations, which leads to bugs, incompatibilities and sometimes vulnerabilities (eg: BlueTooth). If the EBNF specification has gotten too big, then this might indicate that TOML has gotten too complex. We gain nothing by rejecting a formal specification. |
@postmodern Absolutely. The only debate here should be whether the EBNF and the English version of the spec are in agreement. |
@BurntSushi @postmodern Yup. TL;DR: Formal and informal are useful for different audiences. Formal spec is necessary for implementers and neck-beard users. Informal is great for new users and to make sure the formal is in keeping with reality; divergence of the two as mentioned is bad news. Put another way, it's another "model" language to describe the same behavior in a different way. For the formal part, one alternative to EBNF is to have a reference implementation in something like clear, simplistic C that accounts for all the corner cases that have been raised and decided upon. For any questions, whatever the reference implementation does may be viewed as the SSOT (single source of truth). Either way, it will change over time. English is the language of business because of ambiguity. An English spec is helpful for new users and those unfamiliar with more formal specifications since not everyone can think in formal methods. As an example of another codec that is easy and fast to understand (though not necessarily easy to implement, or unambiguous):
Postscript: If you were not Musk and wanted to design Hyperloop's control systems, here's some tips from NASA: |
@postmodern, as I said, I agree that English is not the best language for developing implementations of TOML, but one of @mojombo's intentions with TOML is a simple and direct specification, as I commented before. The question is: is this the best place for the EBNF? I was thinking of another file for this more detailed specification, with a link to it in the README. That's all. I'm pretty sure this EBNF is very important and needs to be accessible from the specification, but I'm not sure this is the best place.
|
@kelvinst RFCs usually put the EBNF in a separate section. Perhaps this would be a good time to break down the README into separate files? |
ZOMGCOPTER @ bikeshedding. Organization is orthogonal to whether a formal specification is useful. If this PR were useful, it would've been merged. Since it has not to this point, all debates as such are moot anyhow. Time to kick people out by stopping the music and cutting off the booze. Handing 'em trash bags usually gets 'em moving out pretty fast. : ) |
@steakknife Why didn't you merge the EBNF specification as it is? Having an EBNF specification would legitimize TOML. Without a formal specification that can be verified, TOML is a toy format. |
@postmodern - @steakknife is the one who opened the pull request, so he can close it. I have no idea why he closed it. The reasons he gave were pretty mysterious, particularly given that @mojombo has been allowing TOML to simmer for several months on its own anyway. I'd recommend that you re-open a pull request of your own :-) |
Heelp! A man fainted on his keyboard! |
@kelvinst 😆 👍 |
TL;DR - Please stop.
On Sun, Sep 1, 2013 at 10:36 AM, Parker Moore wrote: |
@steakknife Yes yes. You've made your lack of interest clear. No need to repeat yourself. |
Top hats included at no extra charge.