Refactor CharacterClass: use ICharacterClass where possible #26
The main rationale behind this refactoring is that the class `CharacterClassSymbol` (a subclass of `Symbol`) was being "abused". It was used, for instance, in the lookahead restriction of symbols and reduce actions, and in `State.doReduces`/`State.addReduceAction`. Those use sites do not need to know anything about a grammar symbol; they only need to know which characters are in the character class (CC). Therefore, it is better to use the `ICharacterClass` interface directly at these places.

In order to accomplish this, the following changes have been made:

- `union`, `intersection`, and `difference` used to be called on the `CharacterClassFactory`, and `isEmpty` could only be called on `CharacterClassSymbol`. Now, they can be called directly on `ICharacterClass`.
- `toAterm` used to be a method in the `CharacterClassSymbol`. It is now also a method in `ICharacterClass`, with a more specific implementation for the "single" and "range" classes (resolving the `FIXME` comment).
  - This adds a dependency from `tableinterfaces` to `org.metaborg.terms`.

Note that the `CharacterClass` symbol class has been renamed to `CharacterClassSymbol`, to reduce the confusion between the grammar symbol and the `ICharacterClass` data structure.

As a final remark: my autoformatter has sometimes reformatted spacing and import groups, even in classes where I only changed one line. Unfortunately, this does generate some noise in the diff.
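To illustrate the idea for reviewers, here is a minimal sketch of what calling the set operations directly on the interface could look like. It is not the code in this PR: the `contains` method, the `BitSetCharacterClass` implementation, the `range` helper, and the `EOF = 256` constant (inferred from the `// [256]` comment in the diff) are all assumptions made for the example.

```java
import java.util.BitSet;

// Hypothetical sketch: method names follow the PR description, bodies are illustrative only.
interface ICharacterClass {
    int EOF = 256; // assumption: EOF is encoded as character 256

    boolean contains(int character);
    boolean isEmpty();
    ICharacterClass union(ICharacterClass other);
    ICharacterClass intersection(ICharacterClass other);
    ICharacterClass difference(ICharacterClass other);
}

// Hypothetical BitSet-backed implementation, not one of the classes in this repository.
final class BitSetCharacterClass implements ICharacterClass {
    private final BitSet chars;

    private BitSetCharacterClass(BitSet chars) {
        this.chars = chars;
    }

    // Builds the class containing every character in [from, to], both inclusive.
    static ICharacterClass range(int from, int to) {
        BitSet bits = new BitSet(EOF + 1);
        bits.set(from, to + 1); // BitSet.set takes an exclusive upper bound
        return new BitSetCharacterClass(bits);
    }

    // Copies any ICharacterClass into a BitSet by probing each possible character.
    private static BitSet bitsOf(ICharacterClass cc) {
        BitSet bits = new BitSet(EOF + 1);
        for (int c = 0; c <= EOF; c++) {
            if (cc.contains(c)) bits.set(c);
        }
        return bits;
    }

    @Override public boolean contains(int character) {
        return character >= 0 && chars.get(character);
    }

    @Override public boolean isEmpty() {
        return chars.isEmpty();
    }

    @Override public ICharacterClass union(ICharacterClass other) {
        BitSet result = (BitSet) chars.clone();
        result.or(bitsOf(other));
        return new BitSetCharacterClass(result);
    }

    @Override public ICharacterClass intersection(ICharacterClass other) {
        BitSet result = (BitSet) chars.clone();
        result.and(bitsOf(other));
        return new BitSetCharacterClass(result);
    }

    @Override public ICharacterClass difference(ICharacterClass other) {
        BitSet result = (BitSet) chars.clone();
        result.andNot(bitsOf(other));
        return new BitSetCharacterClass(result);
    }
}

final class CharacterClassSketchDemo {
    public static void main(String[] args) {
        ICharacterClass lower = BitSetCharacterClass.range('a', 'z');
        ICharacterClass digits = BitSetCharacterClass.range('0', '9');

        // A use site such as a lookahead check only needs the interface, not a grammar symbol.
        System.out.println(lower.union(digits).contains('5'));
        System.out.println(lower.intersection(digits).isEmpty());
        System.out.println(lower.difference(BitSetCharacterClass.range('a', 'm')).contains('n'));
    }
}
```

The point of the sketch is that a consumer holding an `ICharacterClass` can combine and query character sets without going through a factory or a `Symbol` subclass.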
@@ -15,6 +17,8 @@
private boolean containsEOF; // [256]

private int min, max;
// This field is derived from the fields wordX and containsEOF, and is therefore not used in hashCode and equals
private boolean empty;
The `CharacterClassOptimized()` constructor rejects empty character classes, which I think makes sense. We don't need the field and `isEmpty` can just return `false`, right?
The `CharacterClassOptimized()` constructor is useless, now that I look at it, since it always throws an exception 😛 I can remove that one.
However, in the constructor `CharacterClassOptimized(long word0, long word1, long word2, long word3, boolean containsEOF, int min, int max)`, no exceptions are thrown if the character class is empty, so in that case the field is still necessary. Or do you think we should actually throw an exception if it is empty?
Yes. I think it makes sense to throw an exception when an empty optimized character class is instantiated. At the point of finalizing character classes, parser generation is done and empty character classes are useless. I'm not sure if it could happen at all, but actions for empty character classes could be left out of the parse table.
I've fixed this, and have run the entire build script again to make sure that nothing breaks because of this. I've also added some extra tests.
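For readers following the thread, a rough sketch of the behaviour discussed above: an optimized character class stored as four 64-bit words plus an EOF flag, whose constructor rejects the empty class so that `isEmpty` can simply return `false`. The class name `OptimizedCharacterClassSketch`, the reduced constructor signature (without `min` and `max`), the choice of `IllegalArgumentException`, and the bit layout are assumptions for the example and may differ from the actual fix.

```java
// Hypothetical sketch, not the CharacterClassOptimized class from this PR.
final class OptimizedCharacterClassSketch {
    static final int EOF = 256; // assumption, based on the "// [256]" comment in the diff

    // Assumed layout: word0 covers characters 0-63, word1 64-127, word2 128-191, word3 192-255.
    private final long word0, word1, word2, word3;
    private final boolean containsEOF;

    OptimizedCharacterClassSketch(long word0, long word1, long word2, long word3, boolean containsEOF) {
        // Reject the empty class up front; the exception type is an assumption.
        if ((word0 | word1 | word2 | word3) == 0L && !containsEOF) {
            throw new IllegalArgumentException("empty character classes cannot be optimized");
        }
        this.word0 = word0;
        this.word1 = word1;
        this.word2 = word2;
        this.word3 = word3;
        this.containsEOF = containsEOF;
    }

    boolean contains(int character) {
        if (character == EOF) return containsEOF;
        if (character < 0 || character > 255) return false;
        long word;
        switch (character >> 6) { // select the word holding this character
            case 0: word = word0; break;
            case 1: word = word1; break;
            case 2: word = word2; break;
            default: word = word3; break;
        }
        return (word & (1L << (character & 63))) != 0L; // test the bit within the word
    }

    // Always false, because the constructor guarantees at least one member.
    boolean isEmpty() {
        return false;
    }
}

final class OptimizedCharacterClassSketchDemo {
    public static void main(String[] args) {
        // 'a' is character 97, which is bit 33 of word1 under the assumed layout.
        OptimizedCharacterClassSketch onlyA =
            new OptimizedCharacterClassSketch(0L, 1L << 33, 0L, 0L, false);
        System.out.println(onlyA.contains('a'));

        try {
            new OptimizedCharacterClassSketch(0L, 0L, 0L, 0L, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the check in the constructor, no derived `empty` field is needed, which matches the direction the review settled on.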
Nice work! Besides the minor comment, this looks good to me. What do you think @udesou?
I think that it looks pretty good.
I can't remember exactly what the use case was, but if this seems relevant, maybe check with him what he wanted to achieve there, and whether your changes break such behaviour.
Yes, the problem is that currently the resulting parse table class (that implements the
I was not aware of this case. But if it is a problem, I would suspect the problem to be in normalization.
Great!
I've asked @hendrikvanantwerpen, and he says he does not remember this issue 🙂
See metaborg/sdf@291e625f (PR metaborg/sdf#26). Also implemented the placeholders using the new character class representation. Because of this, the `SymbolUtils` class (and its tests) could be removed.
Related PRs: metaborg/jsglr#48 and metaborg/spg#1