Is there a sanctioned way to reference a code point with its official name? #91
I have a fix inbound for 5- and 6-digit hexadecimal codes, as well as a fix to correctly disallow … Currently the …
The …
👍
Thanks for the clarification (which probably belongs in the README as well). That kind of generality can certainly be convenient, but it comes with costs (such as there being no reliable way for machines to consume …
I can probably make …
I have a fix for this in #92, although it does not include the more stringent parsing for …
The fix in #92 now includes some validation for …
#92 now includes all of the rules you suggested.
The README mentions two ways to reference a Unicode code point, but fails to adequately specify them:
`grammarkdown.grammar` doesn't mention the latter at all, and implicitly defines the former as one or more non-`<`, non-`>`, non-|LineTerminator| code points between `<` and `>`. As for the implementation, `scanner.ts` uses `scanString(CharacterCodes.GreaterThan, …)`, which pays special attention only to line terminators and `>`; in particular, it allows `<` when it is represented as a character reference. `scanner.ts` also handles the second form upon encountering "U+" or "u+" followed by four hexadecimal digits (which notably does not work for supplementary-plane characters such as U+1D306 TETRAGRAM FOR CENTRE "𝌆").
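To make the supplementary-plane gap concrete, here is a minimal TypeScript sketch of a scanning routine that accepts four to six hexadecimal digits after "U+" or "u+" and rejects values outside the Unicode codespace. The function name `scanCodePointLiteral` and its return shape are hypothetical illustrations, not code taken from `scanner.ts`.

```typescript
// Hypothetical helper for illustration only; this is NOT grammarkdown's scanner code.
// Scans "U+" or "u+" at `pos` followed by 4 to 6 hexadecimal digits and returns
// the code point plus the index just past the last digit.
function scanCodePointLiteral(
    text: string,
    pos: number,
): { codePoint: number; end: number } | undefined {
    // Indexing past the end of a string yields undefined at runtime.
    const isHexDigit = (ch: string | undefined): boolean =>
        ch !== undefined && /^[0-9A-Fa-f]$/.test(ch);

    if ((text[pos] !== "U" && text[pos] !== "u") || text[pos + 1] !== "+") {
        return undefined;
    }
    const start = pos + 2;
    let end = start;
    // Consume at most six digits: U+10FFFF, the largest code point, needs six.
    while (end - start < 6 && isHexDigit(text[end])) {
        end++;
    }
    // Reject fewer than four digits, or a seventh hex digit right after six.
    if (end - start < 4 || isHexDigit(text[end])) {
        return undefined;
    }
    const codePoint = parseInt(text.slice(start, end), 16);
    // Values above U+10FFFF are outside the Unicode codespace.
    if (codePoint > 0x10ffff) {
        return undefined;
    }
    return { codePoint, end };
}

const result = scanCodePointLiteral("U+1D306", 0);
if (result !== undefined) {
    // String.fromCodePoint builds a surrogate pair for supplementary-plane code points.
    console.log(String.fromCodePoint(result.codePoint)); // prints 𝌆
}
```

Capping the digit count at six (rather than scanning greedily) keeps a trailing hex-looking character from silently changing the value, and the explicit range check is what distinguishes "five or six digits" from "any hex run".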
This is relevant because I want to express a nonterminal like `<U+2212 MINUS SIGN>`, which is not clearly valid or invalid according to the documentation here, and which is accepted by ecma262 `build:spec` while being rejected by esmeta (cf. tc39/ecma262@cc5e203 and https://github.com/tc39/ecma262/actions/runs/5270397258/jobs/9529840136?pr=3098).

Ideally, we'd end up with alignment between documentation and implementation on a form that represents a single code point in any Unicode plane by its hexadecimal value plus descriptive explanatory text (generally its name in the Unicode Character Database), e.g.
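One way such a form might be validated is sketched below in TypeScript: it first applies the general angle-bracket rule implied by `grammarkdown.grammar`, then requires "U+", four to six hexadecimal digits, a space, and non-empty descriptive text. Several details are assumptions rather than anything the README or grammar states: `parseCodePointReference` and `CodePointReference` are hypothetical names, |LineTerminator| is taken to be the ECMAScript set (LF, CR, U+2028, U+2029), and only uppercase "U+" and hex digits are accepted, following conventional Unicode notation.

```typescript
// Hypothetical sketch for illustration; not an existing grammarkdown or esmeta API.
interface CodePointReference {
    codePoint: number;
    description: string;
}

// Assumption: |LineTerminator| is the ECMAScript set LF, CR, LS (U+2028), PS (U+2029).
const LINE_TERMINATOR = /[\n\r\u2028\u2029]/;

// Assumption: uppercase "U+", 4 to 6 uppercase hex digits, one space, then
// non-empty descriptive text (typically the character's UCD name).
const CODE_POINT_REFERENCE = /^U\+([0-9A-F]{4,6}) (\S.*)$/;

function parseCodePointReference(source: string): CodePointReference | undefined {
    // General rule: "<", one or more code points other than "<", ">",
    // or a line terminator, then ">".
    if (source.length < 3 || !source.startsWith("<") || !source.endsWith(">")) {
        return undefined;
    }
    const body = source.slice(1, -1);
    if (body.includes("<") || body.includes(">") || LINE_TERMINATOR.test(body)) {
        return undefined;
    }
    // Stricter rule: code point in hexadecimal followed by a description.
    const match = CODE_POINT_REFERENCE.exec(body);
    if (match === null) {
        return undefined;
    }
    const codePoint = parseInt(match[1], 16);
    // Reject values outside the Unicode codespace.
    if (codePoint > 0x10ffff) {
        return undefined;
    }
    return { codePoint, description: match[2] };
}

const ref = parseCodePointReference("<U+2212 MINUS SIGN>");
console.log(ref?.codePoint, ref?.description); // prints 8722 MINUS SIGN
```

In this sketch a bare name in angle brackets still satisfies the general rule but yields `undefined`, so a real implementation would need to decide whether that prose form is handled separately or deprecated; the sketch does not check that the description actually matches the UCD name.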