StatFSMPaperSite

cteichmann edited this page Jul 11, 2016 · 15 revisions

This page explains how we obtained the data presented in our StatFSM paper and how to replicate our experiments.

Random Automata

The random automata we used in our evaluations can be found here. Each folder in the archive is named 'x_y', where 'x' is the 'l' parameter from the paper and '0.y' is the gamma parameter. Every folder contains five automata, written in the standard Alto format for tree automata. We computed the evaluation statistics on each of them; the general trends in the data were the same across all five.
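As a rough illustration of the naming scheme, the following Python sketch turns a folder name of the form 'x_y' into the pair (l, gamma). The function name parse_folder_name and the example folder names are hypothetical and not part of the Alto code; the sketch assumes that both parts of the name consist of digits only.

```python
from typing import Tuple


def parse_folder_name(name: str) -> Tuple[int, float]:
    """Split a folder name 'x_y' into (l, gamma), where gamma is 0.y.

    Hypothetical helper; assumes x and y consist of digits only.
    """
    x, _, y = name.partition("_")
    if not (x.isdigit() and y.isdigit()):
        raise ValueError(f"expected a name of the form 'x_y', got {name!r}")
    # 'x' is the l parameter; 'y' holds the digits after the decimal point.
    return int(x), float("0." + y)
```

With these assumptions, a folder named 30_5 would be read as l = 30 and gamma = 0.5.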

The shell script and configuration files for generating the automata can be found here. The configuration files have the following fields:

folder - where to put the random automata once they have been generated

fileNamePrefix - the prefix for the names of the generated files (the number of the generated automaton and the file ending .auto are appended to form the final file name; existing files are always overwritten)

size - the l parameter from the paper

toGenerate - how many automata the program should generate

seed - the random number seed to use. Results should be the same whenever the program is run with the same seed, but the behavior of the random number generator might vary depending on the platform.

alpha - the gamma parameter from the paper

The original automata were built with the jar with dependencies from this version of the Alto code. Put the jar in the same folder as the script and the config files, then execute the script. The Java main class for generating random automata is de.up.ling.irtg.script.CreateRandomAutomata, but that code only reads the config file and then calls the methods in de.up.ling.irtg.random_automata.CreateRandomAutomata, which are documented to some extent in the code.

Convergence Experiments

The data for the convergence analysis in the paper can be found here. The folder contains the Veusz files used to generate the plots from the paper. The Veusz files plot the data for the first automaton (the one with the subscript _0), except for the experiments with l = 30, where they plot the second automaton (the one with the subscript _1). This must have been an error I made when importing the data. I re-checked the plots against the _0 files and could not see a substantial difference.

The names of the measurement folders start with x_y, where x is the l parameter and 0.y is the gamma parameter from the paper. If there is no additional _z number after the y, then the data was generated with 500 samples per round. Otherwise we used 2000 samples.
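The rule for the sample count can be sketched in Python as follows. The function name samples_per_round is hypothetical, and the sketch assumes that the optional _z part consists of digits and follows directly after the y; the actual folder names in the archive may contain further text that this pattern does not account for.

```python
import re

# Assumed name prefix: digits, underscore, digits, then an optional "_z" number.
_PREFIX = re.compile(r"^(\d+)_(\d+)(?:_(\d+))?")


def samples_per_round(folder_name: str) -> int:
    """Return the samples per round implied by a measurement folder name.

    Hypothetical helper: 500 without a '_z' number after 'x_y', 2000 with one.
    """
    match = _PREFIX.match(folder_name)
    if match is None:
        raise ValueError(f"name does not start with 'x_y': {folder_name!r}")
    # Group 3 is only set when a third number follows the y part.
    return 2000 if match.group(3) is not None else 500
```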

The measurements are in files of semicolon-separated values. Each column contains all the measurements made for a single round of adaption over all repetitions of the experiment, and each row contains all the measurements made for one repetition over all its adaption rounds.
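To make the layout concrete, here is a minimal Python sketch that reads such data and averages each round over all repetitions. The function names are hypothetical, and the sketch assumes that every cell is a plain number and that the files have no header line; neither assumption has been checked against the actual files.

```python
import csv
from statistics import mean
from typing import Iterable, List


def parse_measurements(lines: Iterable[str]) -> List[List[float]]:
    """Parse semicolon-separated lines into one list of floats per repetition."""
    rows = []
    for record in csv.reader(lines, delimiter=";"):
        # Skip empty cells, e.g. those produced by a trailing semicolon.
        values = [float(cell) for cell in record if cell.strip()]
        if values:
            rows.append(values)
    return rows


def mean_per_round(rows: List[List[float]]) -> List[float]:
    """Average each column (one adaption round) over all rows (repetitions)."""
    # zip(*rows) transposes the table; it stops at the shortest row.
    return [mean(column) for column in zip(*rows)]
```

An open file object can be passed directly as the lines argument, for example parse_measurements(open(path, newline="")), where path points to one of the measurement files.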

The shell script and configuration files for generating the measurements data can be found
