Computing scores

Sometimes it's useful to have am-parser compute only edge existence, edge label, and supertag scores. In particular, those scores can serve as input to the A* parser.

Computing scores

Assuming you have a trained model in models/a_model, you can get scores as follows:

python dump_scores.py models/a_model <formalism> <input data.amconll> <output file.zip> --cuda-device 0

which creates an output zip file containing those scores (the computation runs on GPU 0). The scores are log probabilities for edge existence, edge labels, and supertags.
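
For example, with a model trained for the DM formalism and a development set (the formalism name and file paths here are purely illustrative):

python dump_scores.py models/a_model DM data/dev.amconll scores.zip --cuda-device 0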

Score format

A zip file with scores has the following contents:

  • opProbs.txt with edge existence and edge label scores (op is short for operation, since the edges represent AM algebra operations).
  • tagProbs.txt with supertag scores.
  • corpus.amconll with the input sentences, the best unlabeled dependency tree according to the model, the best lexical labels, and measurements of computation time (computation time of the batch divided by the batch size).
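
If you want to process the scores programmatically, the zip file can be read with Python's standard zipfile module. Here is a minimal sketch, assuming the file names listed above and a zip file called scores.zip:

    # Read the three score files out of a scores zip file.
    # The file names follow the layout described above.
    import zipfile

    with zipfile.ZipFile("scores.zip") as zf:
        op_lines = zf.read("opProbs.txt").decode("utf-8").splitlines()
        tag_lines = zf.read("tagProbs.txt").decode("utf-8").splitlines()
        amconll = zf.read("corpus.amconll").decode("utf-8")

    # Both score files have one line per sentence of the input corpus.
    assert len(op_lines) == len(tag_lines)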

In opProbs.txt,

  • each line corresponds to a sentence
  • the scores for different edges are tab-separated
  • for one edge, the entry starts with the indices of the edge in the format [i,j] for an edge from i to j

The indices are 1-based with respect to the sentence; index 0 denotes an "artificial root", to which ROOT and IGNORE edges attach. The edge existence score follows after a '|'; all scores are in log-space. After the edge existence score, the most likely labels and their scores are listed, all separated by single spaces. For example, the entry [1,7]|-3.0764 MOD_mod|-0.7181 MOD_s|-0.9475 means that the log probability of an edge existing from 1 to 7 is -3.0764, the log probability that this edge has the label MOD_mod is -0.7181, the log probability that it has the label MOD_s is -0.9475, and so on.
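
The following sketch parses one such line into a dictionary mapping (i, j) edge indices to the existence score and a label-to-score dictionary. It is a minimal example assuming exactly the format described above; the function name is illustrative:

    # Parse one line of opProbs.txt: edges are tab-separated, and within
    # an edge entry the pieces are separated by single spaces.
    def parse_op_line(line: str) -> dict:
        edges = {}
        for entry in line.rstrip("\n").split("\t"):
            parts = entry.split(" ")
            indices, existence = parts[0].split("|")   # e.g. "[1,7]", "-3.0764"
            i, j = (int(x) for x in indices.strip("[]").split(","))
            label_scores = {}
            for label_entry in parts[1:]:              # e.g. "MOD_mod|-0.7181"
                label, score = label_entry.rsplit("|", 1)
                label_scores[label] = float(score)
            edges[(i, j)] = (float(existence), label_scores)
        return edges

    # The example entry from above:
    edges = parse_op_line("[1,7]|-3.0764 MOD_mod|-0.7181 MOD_s|-0.9475")
    print(edges[(1, 7)])  # (-3.0764, {'MOD_mod': -0.7181, 'MOD_s': -0.9475})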

In tagProbs.txt,

  • each line corresponds to a sentence
  • for each token, there is a block of supertag scores. Blocks are separated by tabs.

Each block contains multiple scores in a format like this: NULL|-0.0168 (d<root>__ALTO_WS__/__ALTO_WS__--LEX--)--TYPE--()|-4.69096. This says that the log probability for this token to make no contribution (the NULL graph) is -0.0168, and that the log probability for this token to have the supertag (d<root> / --LEX--)--TYPE--() is -4.69096. Note that within a block, a single space separates the different supertags a token might have; consequently, spaces inside the graph constants themselves are escaped as __ALTO_WS__.
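
The following sketch parses one line of tagProbs.txt into a list with one supertag-to-score dictionary per token, undoing the __ALTO_WS__ escaping. Again, this is a minimal example assuming exactly the format described above, with an illustrative function name:

    # Parse one line of tagProbs.txt: one tab-separated block per token,
    # and space-separated "supertag|score" entries within a block.
    def parse_tag_line(line: str) -> list:
        token_scores = []
        for block in line.rstrip("\n").split("\t"):
            scores = {}
            for entry in block.split(" "):             # e.g. "NULL|-0.0168"
                tag, score = entry.rsplit("|", 1)
                # undo the whitespace escaping in the graph constants
                scores[tag.replace("__ALTO_WS__", " ")] = float(score)
            token_scores.append(scores)
        return token_scores

    # The example block from above (a one-token "sentence"):
    block = "NULL|-0.0168 (d<root>__ALTO_WS__/__ALTO_WS__--LEX--)--TYPE--()|-4.69096"
    print(parse_tag_line(block)[0])
    # {'NULL': -0.0168, '(d<root> / --LEX--)--TYPE--()': -4.69096}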

Using the scores zip file for error analysis

If you're conducting an error analysis and would like to look at the scores, you might find the format described above a bit hard to read. @weissenh wrote a script (not on the master branch so far!) that makes the files in scores.zip more human-readable; you can find it on the cogs_unsupervised branch as the file analyzers/scores_prettify.py. Run the script with --help to display usage information. It can also display fancy-looking heatmaps of edge existence scores (see the documentation in the script, especially the function which_sentence2plot and the --maxshow option).