Construct a Lark grammar from ABNF format (RFC 5234) #318
Comments
I think it's a nice idea. I think your best bet is to create a Lark parser that reproduces the output of the GrammarLoader parser: https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py#L663 You might have to do some post-processing (for example, in a Transformer) to make the two a perfect fit. Once you have that working, I'll add an interface to plug it in there when Lark is called with an ABNF grammar, using something like parser = Lark("... grammar ... ", syntax='abnf'). That opens the door to adding other formats in the future.
The output of GrammarLoader is an instance of the Grammar class with
It will be necessary to convert an ABNF file to a Lark file, because there are inevitably features of Lark that are not supported in ABNF. The workflow would be (1) convert ABNF to a Lark file, then (2) tweak the Lark file to achieve the desired results. The GrammarSave function is needed both to develop an ABNF loader in the first place and to do grammar optimization once it is available.
I don't understand what the difficulty is, and what you're trying to do to solve it.
Why does it have to go through files?
So give Lark a default (like an empty list, or whatever is appropriate).
If there is no two-way lossless conversion ABNF <-> Lark, then something is lost in translation. Some features of Lark, e.g., tree shaping, simply cannot be expressed in ABNF at all. If a developer wishes to use such a feature, then supporting ABNF in GrammarLoader is not a complete solution. Instead, the developer will take ABNF as a starting point. Reading an ABNF grammar in GrammarLoader, then saving the grammar to a Lark file, allows the developer to edit that file to add features to the grammar. If the updated grammar is saved in ABNF format, those features will be lost. Conformance to an ABNF specification is validated based solely on the ABNF. But there can be multiple implementations of an ABNF specification, some cleaner than others. Any feature that changes the AST to make it easier to use, but does not change the data on the wire, is a reason to be able to save an ABNF grammar in Lark format.
In that situation, it's common to add new language features that don't break the old ones. For example, a new operator that works in Lark but not in ABNF, and that you don't have to use.
That seems a bit cumbersome. Why not just have a translator from Extended-ABNF into ABNF? It should be fairly easy: just remove and canonicalize some nodes, and then write the result back. It's simple enough that Lark's reconstructor might even be able to handle it.
RFC 5234 describes the standard grammar format for internet standards, such as the notoriously-hard-to-validate email addresses.
Because this standard is a dialect of EBNF and does not allow for embedded code, it should be relatively easy to construct a Lark object for a given ABNF grammar - at least easier than converting from Nearley! Hopefully it's easy enough that runtime conversion in a new Lark.from_abnf method (or group of methods) would be practical.
This feature request is based on HypothesisWorks/hypothesis#170, where I eventually realized that parsing ABNF was going to be easier as well as more widely useful upstream. I'd be happy to work on this with some guidance about where to start, and have already translated the grammar of ABNF from ABNF to Lark's format.
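To give a feel for how mechanical the ABNF-to-Lark mapping is, here is a minimal sketch that translates a small ABNF subset into Lark grammar text using only the standard library. It is not Lark's API and not the approach the maintainers settled on: `abnf_to_lark` and its helpers are hypothetical names, and the sketch assumes one rule per line, no `=/` incremental alternatives, no `%d`/`%b` values, no `%x0D.0A` concatenations, code points up to U+FFFF, and that core rules such as DIGIT are defined in the input. It relies on Lark's `"..."i` case-insensitive strings, `~n..m` repetition, and `"a".."z"` ranges; the generated text has not been run through Lark here.

```python
import re

# Tokens for a small ABNF subset (see the assumptions stated above).
TOKEN = re.compile(r"""
    (?P<skip>\s+|;[^\n]*)                          # whitespace and comments
  | (?P<repeat>\d*\*\d*|\d+)                       # *x, 1*x, 2*5x, 3x
  | (?P<name>[A-Za-z][A-Za-z0-9-]*)                # rule name
  | (?P<string>"[^"]*")                            # case-insensitive literal
  | (?P<hexval>%x[0-9A-Fa-f]+(?:-[0-9A-Fa-f]+)?)   # %x41 or %x30-39
  | (?P<punct>[/()\[\]])                           # alternation and grouping
""", re.VERBOSE)

CLOSERS = {"(": ")", "[": "]"}


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported ABNF syntax: {text[pos:]!r}")
        if m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def char(code):
    # Assumes code points up to U+FFFF.
    value = int(code, 16)
    return f'"\\x{value:02x}"' if value <= 0xFF else f'"\\u{value:04x}"'


def atom(kind, text):
    if kind == "name":  # ABNF names are case-insensitive and may use hyphens
        return text.lower().replace("-", "_")
    if kind == "string":  # ABNF strings are case-insensitive; backslash is literal
        flag = "i" if any(c.isalpha() for c in text) else ""
        return text.replace("\\", "\\\\") + flag
    if kind == "hexval":
        low, _, high = text[2:].partition("-")
        return f"{char(low)}..{char(high)}" if high else char(low)
    raise ValueError(f"unexpected token {text!r}")


def apply_repeat(spec, item):
    # ABNF puts repetition before the element; Lark puts it after.
    if "*" not in spec:
        return f"{item}~{int(spec)}"
    low, high = spec.split("*")
    low = int(low or 0)
    if high:
        return f"{item}~{low}..{int(high)}"
    if low <= 1:
        return f"{item}{'+' if low else '*'}"
    return f"{item}~{low} {item}*"


def convert(tokens, i=0, closer=None):
    """Convert tokens[i:] up to `closer`; return (lark_text, next_index)."""
    parts = []
    while i < len(tokens):
        kind, text = tokens[i]
        i += 1
        if kind == "punct" and text == closer:
            return " ".join(parts), i
        if kind == "punct" and text == "/":
            parts.append("|")
            continue
        repeat = None
        if kind == "repeat":
            if i == len(tokens):
                raise ValueError("repetition without an element")
            repeat = text
            kind, text = tokens[i]
            i += 1
        if kind == "punct" and text in CLOSERS:
            inner, i = convert(tokens, i, CLOSERS[text])
            item = f"{text}{inner}{CLOSERS[text]}"
        else:
            item = atom(kind, text)
        parts.append(apply_repeat(repeat, item) if repeat else item)
    if closer is not None:
        raise ValueError(f"missing {closer!r}")
    return " ".join(parts), i


def abnf_to_lark(abnf):
    rules = []
    for line in abnf.splitlines():
        name, sep, body = line.partition("=")
        head = tokenize(name)
        if not head:
            continue  # blank line or comment-only line
        if len(head) != 1 or head[0][0] != "name" or not sep or body.startswith("/"):
            raise ValueError(f"unsupported rule line: {line!r}")
        rules.append(f"{atom(*head[0])}: {convert(tokenize(body))[0]}")
    return "\n".join(rules)


EXAMPLE = """
; a dotted version number such as 1.20.3
version = number 2("." number)   ; major.minor.patch
number  = 1*3digit
digit   = %x30-39
sign    = [ "+" / "-" ]
"""

print(abnf_to_lark(EXAMPLE))
```

For the sample grammar this prints rules such as `version: number ("." number)~2` and `number: digit~1..3`. The resulting text is what a developer could save as a Lark file and then edit by hand, or what a loader could hand on to Lark directly.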