Python Grammar Fails: Cannot read field "grammarType" because "this.atn" is null #25

Open
aeonik opened this issue Nov 29, 2022 · 4 comments

aeonik commented Nov 29, 2022

Hello,
I am attempting to run the following code snippet:

(def python-file (slurp "convert_logseq_md_to_org.py"))

(let [g (antlr/parser
         "./external/parsers/grammars-v4/python/python3/Python3Lexer.g4"
         "./external/parsers/grammars-v4/python/python3/Python3Parser.g4"
         {:root "file_input"})]
  (antlr/parse g python-file))

Upon running this code snippet, I get the following error message:
Cannot read field "grammarType" because "this.atn" is null

As far as I can tell, an ATN (Augmented Transition Network) is never set on the parser grammar, only on the lexer grammar.

I have debugged the code, and the lexer and parser seem to be set properly within the Clojure code. However, the failure occurs once the grammar is passed into the ANTLR4 Grammar code, specifically in the following section:

Excerpt from: https://github.com/antlr/antlr4/blob/4.11.1/tool/src/org/antlr/v4/tool/Grammar.java#L1345

/** @since 4.5.1 */
public GrammarParserInterpreter createGrammarParserInterpreter(TokenStream tokenStream) {
	if (this.isLexer()) {
		throw new IllegalStateException("A parser interpreter can only be created for a parser or combined grammar.");
	}
	// must run ATN through serializer to set some state flags
	IntegerList serialized = ATNSerializer.getSerialized(atn);
	ATN deserializedATN = new ATNDeserializer().deserialize(serialized.toArray());

	return new GrammarParserInterpreter(this, deserializedATN, tokenStream);
}

It seems to be expecting an ATN to be bundled with the Parser Grammar.

The referenced ANTLR code above is called from clj-antlr in the singlethreaded-parser function in the clj-antlr/interpreted namespace.

(defn singlethreaded-parser
  "Creates a new single-threaded parser for a grammar."
  [^LexerGrammar lexer-grammar ^Grammar grammar]
  (let [^Lexer lexer (.createLexerInterpreter
                       lexer-grammar (common/char-stream ""))
        parser       (.createGrammarParserInterpreter          ;; Exception is thrown to us somewhere around here
                       grammar (common/tokens lexer))]         
    (SinglethreadedParser. grammar lexer parser)))

I have attached several screenshots that have more detailed stack traces and debug views from Cursive IDE.

I will continue investigating potential solutions, but I am curious if I am simply doing something wrong, or if maybe ANTLR has made a breaking change on their end.

Appreciate any insights, and thank you for your wonderful project!

Regards,
Aeonik

[Screenshots: clj-antlr-bug, clj-antlr-bug2, clj-antlr-lexer-atn]

aphyr (Owner) commented Nov 29, 2022 via email

ghost commented Dec 7, 2022

I dug into this for a bit. I ran into the same (or a similar-looking) issue when trying to parse Kotlin source. I think it boils down to Tool.processNonCombinedGrammar aborting early if there are errors and never setting atn on the input Grammar. In my case, the logs contained a complaint about a missing "tokens" file. After I generated that file manually, the parse went as expected.
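For anyone who wants to check for this condition before parsing, here is a rough sketch. It assumes the parser grammar names its lexer vocabulary through a tokenVocab option and that the matching .tokens file is expected next to the grammar files, as in the setup described above. The function name check_token_vocab is made up for illustration and is not part of ANTLR or clj-antlr.

```shell
#!/bin/sh
# Sketch: report whether the .tokens file named by a parser grammar's
# tokenVocab option exists next to that grammar.
# check_token_vocab is a hypothetical helper, not an ANTLR or clj-antlr command.
check_token_vocab() {
  parser_grammar=$1
  dir=$(dirname "$parser_grammar")
  # Pull the first tokenVocab=<Name> out of the grammar's options block.
  vocab=$(sed -n 's/.*tokenVocab[[:space:]]*=[[:space:]]*\([A-Za-z0-9_]*\).*/\1/p' "$parser_grammar" | head -n 1)
  if [ -z "$vocab" ]; then
    echo "no tokenVocab option found"
    return 0
  fi
  if [ -f "$dir/$vocab.tokens" ]; then
    echo "found $vocab.tokens"
  else
    echo "missing $vocab.tokens"
    return 1
  fi
}
```

Calling check_token_vocab with the path to the parser grammar prints which .tokens file it looked for and returns a non-zero status when that file is absent.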

@fdhenard

@mpevnev, how did you generate the tokens file? I'm having the same problem trying to parse C#.

ghost commented Dec 24, 2022

I downloaded antlr4-4.9.3-complete.jar and ran java -jar antlr4-4.9.3-complete.jar KotlinLexer.g4 in the directory with the grammars. It produces some other files alongside the tokens file; it may be possible to disable that, but I'm not sure.
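One possible way to avoid the extra generated files is to point the tool's -o (output directory) option at a scratch directory and copy back only the .tokens file. The sketch below assumes the jar path is absolute and that the .tokens file belongs next to the grammar. The function name generate_tokens is hypothetical, and I have not verified this against clj-antlr.

```shell
#!/bin/sh
# Sketch: generate only the .tokens file for a lexer grammar.
# generate_tokens is a hypothetical helper, not part of ANTLR.
generate_tokens() {
  jar=$1        # absolute path to the ANTLR "complete" jar
  grammar=$2    # path to the lexer grammar (.g4)
  dir=$(dirname "$grammar")
  name=$(basename "$grammar" .g4)
  scratch=$(mktemp -d) || return 1

  # Run the tool from the grammar directory, as in the comment above,
  # but send every generated file to the scratch directory via -o.
  if (cd "$dir" && java -jar "$jar" -o "$scratch" "$name.g4") &&
     [ -f "$scratch/$name.tokens" ]; then
    # Keep only the .tokens file; the other generated files are discarded.
    cp "$scratch/$name.tokens" "$dir/$name.tokens"
    status=0
  else
    echo "failed to generate $name.tokens" >&2
    status=1
  fi
  rm -rf "$scratch"
  return $status
}
```

For the Kotlin case above, this would be called with the absolute path to antlr4-4.9.3-complete.jar and the path to KotlinLexer.g4.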
