Incorrect derivations on "larger" grammars #62

akoehn · 2020-05-27T08:51:35Z

I have a curious case where I grew my grammar with 46 rules to 50 and derivations only needing the initial 46 rules suddenly changed (i.e., the derivation produced by viterbi is correct but not optimal)

The derivations I am looking at (both the correct one and the incorrect one) do not use any of the four new rules.

Commenting out any(!) two unrelated rules (two of the new ones or two of the old ones) fixes the derivation. Therefore, this seems to be an alto bug to me.

This is just an initial issue description, will update later with more info once I had some time to look into this myself.

alexanderkoller · 2020-05-27T08:54:41Z

Does the grammar have two rules with the same terminal symbol?

akoehn · 2020-05-27T08:56:55Z

No. And as I said, I can comment out two of the new rules and everything works or I keep all four of the new rules and comment out two of the old rules (which worked fine before) and everything works -- except derivations with the two commented-out rules of course.

akoehn · 2020-05-27T08:57:46Z

And we should really have a check for the same terminal symbol case while reading an irtg -- this is a bad trap people can fall into.

alexanderkoller · 2020-08-19T07:13:41Z

Is this problem still there? Could you post the grammar so I can reproduce the behavior?

akoehn · 2020-08-21T12:07:58Z

Yes, steps to reproduce:

git clone [email protected]:minecraft-saar/minecraft-nlg.git
cd minecraft-nlg
./gradlew test
# observe that all tests are successful
# now re-enable three rules I commented out. For the lazy:
sed -i '190,190 d; 200,200 d' src/main/resources/de/saar/coli/minecraft/minecraft.irtg
./gradlew test
# observe that the last test fails even though the new rules have nothing to do with the generated text

It seems that merely adding two rules that are not even part of the now resulting tree prohibits the correct (shortest) tree to be found:
AssertionFailedError: expected: <build a row in front of the previous row> but was: <build a row to the right of length four in front of the previous row>
Notice that the re-enabled rules are for describing height and neither the expected derivation nor the now incorrectly built derivation use rules abouth height.

Edit in case this cannot be reproduced later on: the minecraft-nlg commit I tested this with is 4da39a42cdfbf84f1c835e074fb0d99facbf67ec

akoehn · 2020-09-03T12:49:58Z

I checked the derivation and the original tree is still derivable with the extended irtg, it still has the correct interpretations and it has the better score. So it seems like something weird is going on in the viterbi decoding.

akoehn · 2020-09-04T09:09:13Z

This is the relevant snippet from my code:

var target = this.getTreeForInstruction(List.of("a", "row",  "in front of",  "the", "previous", "row"));
// ta is the tree automaton I use.  All rules have a weight of 0.9
bestTree = ta.viterbi();
// bestTree should be the tree with the highest possible score, but ...
System.out.print("accepts target: ");
System.out.println(ta.accepts(target)); // true
System.out.print("viterbi result tree weight: ");
System.out.println(ta.getWeight(bestTree)); // 0.313
System.out.print("target tree weight: ");
System.out.println(ta.getWeight(target)); // 0.38742

i.e. viterbi does not find the correct tree. I spoke with @jgroschwitz and he asked me whether a concrete ta works. And indeed, adding ta = ta.asConcreteTreeAutomaton(); at the top fixes the problem. There seems to be a problem with lazy evaluation somewhere.

jgroschwitz · 2020-09-04T09:32:03Z

I think there's something funky with the way rules are stored in lazy automata. On the one hand, the RuleStore class doesn't always work as I expect it to (I have more a general feeling than a concrete example though, unfortunately). On the other hand, whenever one implements a new TreeAutomaton class, I think one has to make sure that getRulesBottomUp and getRulesTopDown store the rules properly; but I am not aware of this being documented or really what the best practice should be.

alexanderkoller · 2020-09-09T12:03:59Z

Hi, I can't reproduce this; the test succeeds for me even after removing the comment lines, with both the current Git revision and with 4da3.

Could you write a self-contained unit test for Alto that exhibits the problem? (The grammar can live in its own file, we'll put it in the resources directory.)

akoehn · 2020-09-14T15:37:40Z

@alexanderkoller test is now in a new branch: viterbi-derivation-test

alexanderkoller · 2020-09-14T15:45:11Z

Great, thanks. Now I can reproduce it.

alexanderkoller · 2020-09-26T14:40:06Z

Some preliminary observations:

One core problem is that the decomp automaton of SubsetAlgebra does not support top-down queries. This caused an error when InvHomAutomaton# called its makeAllRulesExplicit. Because it was apparently too easy to just ignore this error message, I changed it into an exception (on the branch).
The best tree of the concrete automaton is accepted by the original automaton, with the correct weight. Thus the problem is somewhere in the way viterbi accesses the rule store.
The issue uses a lot of badly maintained classes; not just InverseHomAutomaton (see Clean up InverseHomAutomaton #70), but also IntersectionAutomaton. If the original grammar could use nondeleting homomorphisms, everything would become neater and much faster.
The rule stores of both automata contain the same rules for bottom-up and top-down, so that can't really be the source of the discrepancy.
Calling makeAllRulesExplicit on the original automaton before calling viterbi does not help.

akoehn added a commit that referenced this issue Sep 14, 2020

add viterbi derivation test (see #62)

41619b2

alexanderkoller self-assigned this Sep 14, 2020

alexanderkoller added a commit that referenced this issue Sep 26, 2020

Debugging Incorrect derivations on "larger" grammars #62

6126183

akoehn pushed a commit that referenced this issue Dec 15, 2020

Debugging Incorrect derivations on "larger" grammars #62

7a1939a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect derivations on "larger" grammars #62

Incorrect derivations on "larger" grammars #62

akoehn commented May 27, 2020

alexanderkoller commented May 27, 2020

akoehn commented May 27, 2020

akoehn commented May 27, 2020

alexanderkoller commented Aug 19, 2020

akoehn commented Aug 21, 2020 •

edited

Loading

akoehn commented Sep 3, 2020

akoehn commented Sep 4, 2020

jgroschwitz commented Sep 4, 2020

alexanderkoller commented Sep 9, 2020 •

edited

Loading

akoehn commented Sep 14, 2020

alexanderkoller commented Sep 14, 2020

alexanderkoller commented Sep 26, 2020

Incorrect derivations on "larger" grammars #62

Incorrect derivations on "larger" grammars #62

Comments

akoehn commented May 27, 2020

alexanderkoller commented May 27, 2020

akoehn commented May 27, 2020

akoehn commented May 27, 2020

alexanderkoller commented Aug 19, 2020

akoehn commented Aug 21, 2020 • edited Loading

akoehn commented Sep 3, 2020

akoehn commented Sep 4, 2020

jgroschwitz commented Sep 4, 2020

alexanderkoller commented Sep 9, 2020 • edited Loading

akoehn commented Sep 14, 2020

alexanderkoller commented Sep 14, 2020

alexanderkoller commented Sep 26, 2020

akoehn commented Aug 21, 2020 •

edited

Loading

alexanderkoller commented Sep 9, 2020 •

edited

Loading