Refactor CharacterClass: use ICharacterClass where possible #26

mpsijm · 2019-07-05T10:58:16Z

Related PRs: metaborg/jsglr#48 and metaborg/spg#1

The main rationale behind this refactoring is that the class CharacterClassSymbol (a subclass of Symbol) was being "abused".
It was used for instance in the lookahead restriction of symbols and reduce actions and in State.doReduces/State.addReduceAction.
Those use sites do not need to know anything related to a grammar symbol, but really only need to know which characters are in the CC.
Therefore, it is better to directly use the ICharacterClass interface at these places. In order to accomplish this, the following changes have been made:

- union, intersection, and difference used to be called on the CharacterClassFactory, and isEmpty could only be called on CharacterClassSymbol. Now, they can be called directly on ICharacterClass.
- toAterm used to be a method in the CharacterClassSymbol. It is now also a method in ICharacterClass, with a more specific implementation for the "single" and "range" classes (resolving the FIXME comment).
  - This adds a dependency from tableinterfaces to org.metaborg.terms.
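
To illustrate the first change, here is a minimal sketch of what calling the set operations directly on the interface could look like. The names SketchCharacterClass and BitSetCharacterClass are hypothetical stand-ins, not the actual ICharacterClass interface or its implementations, and the BitSet representation is only an assumption chosen for brevity.

```java
import java.util.BitSet;

// Hypothetical, simplified stand-in for ICharacterClass (not the real interface).
interface SketchCharacterClass {
    boolean contains(int character);
    boolean isEmpty();
    SketchCharacterClass union(SketchCharacterClass other);
    SketchCharacterClass intersection(SketchCharacterClass other);
    SketchCharacterClass difference(SketchCharacterClass other);
}

// Illustrative BitSet-backed implementation covering characters 0..255 plus EOF (256).
final class BitSetCharacterClass implements SketchCharacterClass {
    static final int EOF = 256;
    private final BitSet chars;

    private BitSetCharacterClass(BitSet chars) {
        this.chars = chars;
    }

    // Builds the class containing every character in the inclusive range [from, to].
    static BitSetCharacterClass range(int from, int to) {
        BitSet bits = new BitSet(EOF + 1);
        bits.set(from, to + 1); // the upper bound of BitSet.set is exclusive
        return new BitSetCharacterClass(bits);
    }

    // Copies the members of any implementation into a fresh BitSet.
    private static BitSet bitsOf(SketchCharacterClass cc) {
        BitSet bits = new BitSet(EOF + 1);
        for (int c = 0; c <= EOF; c++)
            if (cc.contains(c))
                bits.set(c);
        return bits;
    }

    @Override public boolean contains(int character) {
        return character >= 0 && chars.get(character);
    }

    @Override public boolean isEmpty() {
        return chars.isEmpty();
    }

    @Override public SketchCharacterClass union(SketchCharacterClass other) {
        BitSet result = bitsOf(this);
        result.or(bitsOf(other));
        return new BitSetCharacterClass(result);
    }

    @Override public SketchCharacterClass intersection(SketchCharacterClass other) {
        BitSet result = bitsOf(this);
        result.and(bitsOf(other));
        return new BitSetCharacterClass(result);
    }

    @Override public SketchCharacterClass difference(SketchCharacterClass other) {
        BitSet result = bitsOf(this);
        result.andNot(bitsOf(other));
        return new BitSetCharacterClass(result);
    }
}
```

With such an interface, a use site like a lookahead check only needs the character class itself, for example `lower.intersection(digits).isEmpty()`, without going through a factory or a grammar symbol.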

Note that the CharacterClass symbol class has been renamed to CharacterClassSymbol, to reduce the confusion between the grammar symbol and the ICharacterClass data structure.

As a final remark: my autoformatter has sometimes reformatted spacing and import groups, even in classes where I only changed one line. Unfortunately, this does generate some noise in the diff.


jasperdenkers · 2019-07-05T14:59:06Z

...rg.characterclasses/src/main/java/org/metaborg/characterclasses/CharacterClassOptimized.java

@@ -15,6 +17,8 @@
    private boolean containsEOF; // [256]

    private int min, max;
+    // This field is derived from the fields wordX and containsEOF, and is therefore not used in hashCode and equals
+    private boolean empty;


The CharacterClassOptimized() constructor rejects empty character classes, which I think makes sense. We don't need the field and isEmpty can just return false, right?

The CharacterClassOptimized() constructor is useless, now that I look at it, since it always throws an exception 😛 I can remove that one.

However, in the constructor CharacterClassOptimized(long word0, long word1, long word2, long word3, boolean containsEOF, int min, int max), no exceptions are thrown if the character class is empty, so in that case the field is still necessary. Or do you think we should actually throw an exception if it is empty?

Yes. I think it makes sense to throw an exception in case an empty optimized character class is instantiated. At the point of finalizing character classes, parser generation is done and empty character classes are useless. I'm not sure if it could happen at all, but actions for empty character classes could be left out of the parse table.

I've fixed this, and have run the entire build script again to make sure that nothing breaks because of this. I've also added some extra tests.
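
The outcome of this thread can be sketched roughly as follows. This is a simplified illustration, not the actual CharacterClassOptimized: the class name NonEmptyOptimizedCharacterClass, the choice of IllegalArgumentException, and the omission of min and max are assumptions; only the wordX and containsEOF fields are taken from the diff above.

```java
// Hypothetical sketch of an optimized character class that rejects emptiness up front.
final class NonEmptyOptimizedCharacterClass {
    private final long word0, word1, word2, word3; // bitmaps for characters 0..255, 64 per word
    private final boolean containsEOF;             // [256]

    NonEmptyOptimizedCharacterClass(long word0, long word1, long word2, long word3, boolean containsEOF) {
        // Emptiness is fully determined by the words and the EOF flag,
        // so checking it once here makes a separate "empty" field unnecessary.
        if ((word0 | word1 | word2 | word3) == 0L && !containsEOF)
            throw new IllegalArgumentException("Empty character classes cannot be optimized");
        this.word0 = word0;
        this.word1 = word1;
        this.word2 = word2;
        this.word3 = word3;
        this.containsEOF = containsEOF;
    }

    boolean contains(int character) {
        if (character == 256)
            return containsEOF;
        if (character < 0 || character > 256)
            return false;
        long word;
        switch (character >> 6) { // select the word that holds this character's bit
            case 0: word = word0; break;
            case 1: word = word1; break;
            case 2: word = word2; break;
            default: word = word3; break;
        }
        return (word & (1L << (character & 63))) != 0;
    }

    // The constructor guarantees at least one member, so this is constant.
    boolean isEmpty() {
        return false;
    }
}
```

Because the invariant is enforced at construction time, isEmpty needs neither a derived field nor a computation.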

jasperdenkers · 2019-07-05T15:07:52Z

Nice work! Besides the minor comment, this looks good to me.

What do you think @udesou?

udesou · 2019-07-08T02:10:10Z

I think that it looks pretty good.
There is indeed a problem with classes that are used for parse table generation, and classes used for parsing. I think @jasperdenkers and I discussed something like that, where the parse table object was filled with noise from parse table generation.
One minor thing, and I don't even know whether it is relevant: I think Hendrik once found that the JSGLR1 behaviour for the two productions below, A.C1 and A.C2, is different:

A.C1 = 
A.C2 = []

I can't remember exactly what the use case was, but if this seems relevant, maybe check with him what he wanted to achieve there, and whether your changes break such behaviour.
Finally, I'm also changing a few things in the parse table generator to be able to remove some dependencies (from parsers to sdf2table, e.g.). The end goal is to improve the performance of the generator, but hopefully, I can also do some of this decoupling of classes and interfaces in the process.
Again, thanks @mpsijm!

jasperdenkers · 2019-07-08T12:47:05Z

> There is indeed a problem with classes that are used for parse table generation, and classes used for parsing. I think @jasperdenkers and I discussed something like that, where the parse table object was filled with noise from parse table generation.

Yes, the problem is that currently the resulting parse table class (that implements the IParseTable interface and that is an input to the parser) contains data structures from parser generation that are not used anymore after generation. These data structures are expensive to serialize and this slows down e.g. writing/reading parse tables to/from cache in the build system. Ideally, the algorithm to generate the parse table and the data structure to represent the generated parse table are separated.

> One minor thing, and I don't even know whether it is relevant: I think Hendrik once found that the JSGLR1 behaviour for the two productions below, A.C1 and A.C2, is different:
> A.C1 =
> A.C2 = []
> I can't remember exactly what the use case was, but if this seems relevant, maybe check with him what he wanted to achieve there, and whether your changes break such behaviour.

I was not aware of this case. But if it is a problem, I would suspect the problem to be in normalization.

> Finally, I'm also changing a few things in the parse table generator to be able to remove some dependencies (from parsers to sdf2table, e.g.). The end goal is to improve the performance of the generator, but hopefully, I can also do some of this decoupling of classes and interfaces in the process.

Great!

mpsijm · 2019-07-08T14:08:10Z

> One minor thing, and I don't even know whether it is relevant: I think Hendrik once found that the JSGLR1 behaviour for the two productions below, A.C1 and A.C2, is different:
> A.C1 =
> A.C2 = []
> I can't remember exactly what the use case was, but if this seems relevant, maybe check with him what he wanted to achieve there, and whether your changes break such behaviour.

I've asked @hendrikvanantwerpen, and he says he does not remember this issue 🙂

See metaborg/sdf@291e625f (PR metaborg/sdf#26). Also implemented the placeholders using the new character class representation. Because of this, the `SymbolUtils` class (and its tests) could be removed.

This was referenced Jul 5, 2019

Refactor CharacterClass: use ICharacterClass where possible metaborg/jsglr#48

Merged

Refactor CharacterClass: use ICharacterClass where possible metaborg/spg#1

Merged

jasperdenkers reviewed Jul 5, 2019

jasperdenkers requested a review from udesou July 5, 2019 15:08

mpsijm added 2 commits July 5, 2019 21:53

Throw exception if optimized character class is empty

285d8b6

Fix two Range bugs (see tests) and consistently use canonical

0e47571

mpsijm mentioned this pull request Jul 8, 2019

WIP: SLR parse table generation #27

Draft

jasperdenkers merged commit 0c7fe40 into metaborg:develop/jsglr2 Jul 8, 2019

jasperdenkers pushed a commit to metaborg/jsglr that referenced this pull request Jul 8, 2019

Refactor CharacterClass: use ICharacterClass where possible

6aade5b

See metaborg/sdf@291e625f (PR metaborg/sdf#26).

mpsijm deleted the refactor-characterclass branch July 8, 2019 15:14
