#81 made some parsing functions non-public (those that weren't useful as part of the public API), but it left the `tdl.lex()` and `tdl.tokenize()` functions alone, even though they probably have limited utility outside of the `tdl` module. These functions can be deprecated if we adopt API changes that allow comments to be retrieved. Here is one proposal:
- drop `tokenize()` and its regex
- make `lex()` non-public (`_lex()`) and have it lex every token, not just the top-level constructions
- let `parse()` specify which entities it will yield (by default, for example, `('typedef', 'typeaddendum', 'instance')`), such that `'comment'` can be included and thus yielded

The `parse()` function thus behaves something like Python's `ElementTree.iterparse()`.
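The entity-filtering idea above can be sketched as follows. This is a minimal, hypothetical illustration: the `Event` tuple, the `_lex_events()` stand-in lexer, and the line-based token classification are assumptions for demonstration, not PyDelphin's actual implementation.

```python
from collections import namedtuple

# Illustrative event record; not the real tdl internals.
Event = namedtuple('Event', 'entity value')

def _lex_events(lines):
    """Stand-in lexer: classify each line as a comment or a typedef."""
    for line in lines:
        if line.startswith(';'):
            yield Event('comment', line.lstrip('; ').rstrip())
        elif ':=' in line:
            yield Event('typedef', line.split(':=')[0].strip())

DEFAULT_ENTITIES = ('typedef', 'typeaddendum', 'instance')

def parse(lines, entities=DEFAULT_ENTITIES):
    """Yield (entity, value) pairs, skipping entities not requested."""
    for event in _lex_events(lines):
        if event.entity in entities:
            yield event.entity, event.value

tdl_lines = ['; top of hierarchy', 'sign := *top*.']
default_events = list(parse(tdl_lines))
with_comments = list(parse(tdl_lines, entities=('comment', 'typedef')))
```

With the default entity tuple, the comment is silently skipped; passing `entities=('comment', 'typedef')` makes it appear in the yielded stream.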
For actual TDL parsing (with unification, etc.) and not just inspection, a `load()` function could take all the parsed entities (ignoring comments) and construct the type hierarchy and return some kind of compiled namespace object. There might be separate `load_types()` and `load_instances()` functions, like in the LKB, that restrict what kinds of entities can be parsed in a file.
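The proposed layering might look like the sketch below. Everything here is an assumption for illustration: the toy `TypeHierarchy` class, the `(entity, value)` event shape, and the entity names restricted by `load_types()` are not part of any existing API.

```python
class TypeHierarchy(dict):
    """Toy compiled namespace: maps type names to their parents."""

def load(events):
    """Build a hierarchy from parsed entities, ignoring comments."""
    hierarchy = TypeHierarchy()
    for entity, value in events:
        if entity == 'comment':
            continue                  # comments are inspection-only
        name, parents = value         # e.g. ('sign', ['*top*'])
        hierarchy[name] = parents
    return hierarchy

def load_types(events):
    """LKB-style restriction: only type definitions are allowed."""
    return load((e, v) for e, v in events
                if e in ('typedef', 'typeaddendum'))

ns = load([('comment', 'ignored'), ('typedef', ('sign', ['*top*']))])
```

The point of the sketch is the separation of concerns: `parse()` yields raw entities for inspection, while `load()` and its restricted variants consume those entities to produce a compiled object.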
The underlying parsing logic will change dramatically, so maybe leave `parse()` as-is, deprecate it, and add a new `iterparse()` function for the new logic. The `load()` functions can then make use of the new `iterparse()` function.
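The deprecate-and-delegate pattern described here can be sketched with the standard `warnings` module. The `iterparse()` body below is a placeholder assumption; only the wrapper shape matters.

```python
import warnings

def iterparse(lines):
    """New-style parser (placeholder body for illustration)."""
    for lineno, line in enumerate(lines, 1):
        yield lineno, line.strip()

def parse(lines):
    """Deprecated wrapper: warn, then delegate to iterparse()."""
    warnings.warn('parse() is deprecated; use iterparse()',
                  DeprecationWarning, stacklevel=2)
    return list(iterparse(lines))
```

Using `stacklevel=2` attributes the warning to the caller's line rather than to `parse()` itself, which is the conventional way to surface a deprecation.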
This lexer relies on the group identifiers of a regular expression
and supports multiline patterns (comments and docstrings),
which are parsed separately from the regex (the regex then picks up
where the special parser stops). Yielded tokens include the group
identifiers of the current and next tokens (helping with lookahead in
parsing), the token text, and its line number.
Addresses #167 and #168
This adds a lot of code to tdl.py, although much of the old stuff will
be removed in a future release. The new-style parsing is ~36% slower
at reading the ERG's lexicon, but it is better able to deal with
malformed TDL, and it handles docstrings and comments in all valid
places.
Addresses #153, #167, #168, and #170