Participant #13: Team madPL, University of Wisconsin–Madison & Microsoft Research #29

Open
chenzimin opened this issue Oct 3, 2018 · 4 comments
Labels: participant (Participant of the CodRep-competition)

Comments

chenzimin commented Oct 3, 2018

Created for Team madPL (University of Wisconsin–Madison & Microsoft Research) for discussions. Welcome!

Jordan Henkel, Shuvendu Lahiri, Ben Liblit, Thomas Reps

jjhenkel commented Oct 4, 2018

We have a technique that treats the repair problem as a search/ranking problem. We extract features and then run a "learning to rank" technique on the data. As a post-processing step, we rule out the highest-ranked prediction if applying the repair at that location yields a file that fails to parse (provided the file was parseable originally, with no repair applied).
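
Roughly, the pipeline looks like the sketch below. This is not our actual implementation: the LightGBM ranker, the javalang-based parse check, and all of the function names are placeholders chosen for illustration.

```python
import lightgbm as lgb   # assumed learning-to-rank library
import javalang          # assumed Java parser for the parseability check

def train_ranker(X, y, group_sizes):
    """X: feature rows for candidate repair locations; y: 1 for the true
    location, 0 otherwise; group_sizes: number of candidates per repair task."""
    ranker = lgb.LGBMRanker(objective="lambdarank", metric="map")
    ranker.fit(X, y, group=group_sizes)
    return ranker

def parses(source):
    """True if the given Java source parses."""
    try:
        javalang.parse.parse(source)
        return True
    except (javalang.parser.JavaSyntaxError, javalang.tokenizer.LexerError):
        return False

def predict_location(ranker, candidate_features, apply_repair, original_source):
    """Rank candidate locations; if repairing at the top-ranked location turns a
    previously parseable file into one that fails to parse, fall back to the
    next-best candidate."""
    scores = ranker.predict(candidate_features)
    order = list(scores.argsort()[::-1])
    best = order[0]
    if parses(original_source) and not parses(apply_repair(original_source, best)):
        return order[1] if len(order) > 1 else best
    return best
```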

Here's a table that summarizes our results:

| Trained on | Loss on Dataset1 | Loss on Dataset2 | Loss on Dataset3 | Loss on Dataset4 | Parseability check |
|---|---|---|---|---|---|
| 80% of 2 | 0.087606 | 0.068825 | 0.05736 | 0.06536 | NO |
| 80% of 2 | 0.085909 | 0.067685 | 0.05537 | 0.06484 | YES |
| 80% of 1, 2, 3 | 0.069487 | 0.066061 | 0.04301 | 0.07607 | NO |
| 80% of 1, 2, 4 | 0.056232 | 0.058874 | 0.05606 | 0.03400 | NO |
| 80% of 1, 3, 4 | 0.052917 | 0.085307 | 0.03244 | 0.03716 | NO |
| 80% of 2, 3, 4 | 0.096918 | 0.065058 | 0.03698 | 0.03990 | NO |
| 80% of 1, 2, 3, 4 | 0.044905 | 0.051056 | 0.02839 | 0.03525 | NO |
| 80% of 1, 2, 3, 4 | 0.044459 | 0.050524 | 0.02831 | 0.03515 | YES |

The first two rows show our best performance training on 80% of a single dataset (Dataset2). The next four rows show performance when doing cross-validation (by holding out one whole dataset each time). The last two rows show performance of a model trained on all datasets, with and without the parseability filter.

One difficulty with this technique is that its performance on totally unseen data is unpredictable. It usually generalizes well enough, but I'm sure that, with more time to tune and better features, one could build a model that generalizes better.

We've made our submission available via Docker Hub (it will use the model trained on all datasets). To run it on a new dataset, do the following (on a machine with Docker installed):

docker pull jjhenkel/instauro
docker run -it --rm -v /path/to/Datasets/NewDataset:/data jjhenkel/instauro

tdurieux commented Oct 4, 2018

These are really interesting results.

It is funny to see that by learning from datasets 2, 3, and 4 you obtain a worse result on Dataset1 than by learning from Dataset2 alone.

By any chance, do you have the effectiveness of your approach on the tasks that were not used during training (the held-out 20%)?

During training, did you take into account that some tasks are duplicated?

jjhenkel commented Oct 4, 2018

Hi @tdurieux

I didn't save performance measurements for the 20% used for validation. I did watch some models complete training, and each time performance on the 20% was within a percent or two of performance on the 80% (the learning-to-rank objective used Precision@1 as its metric).
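
For reference, Precision@1 here is just the fraction of tasks whose top-ranked candidate is the true repair line. A tiny hypothetical illustration (not our actual evaluation code):

```python
def precision_at_1(ranked_lines_per_task, true_line_per_task):
    """Fraction of tasks whose rank-1 candidate equals the true repair line."""
    hits = sum(1 for ranked, truth in zip(ranked_lines_per_task, true_line_per_task)
               if ranked and ranked[0] == truth)
    return hits / len(true_line_per_task)

# Three tasks, two of them predicted correctly at rank 1 -> 0.666...
print(precision_at_1([[12, 3], [7, 9], [40, 2]], [12, 9, 40]))
```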

The learner does not take duplicate tasks into account (that is, I do not filter duplicates anywhere). That said, I do think it may be interesting to train on 100% of three of the datasets and use the held-out dataset as a validation set. Using this strategy the learner would stop when it no longer made progress on the held-out set; that may help prevent overfitting.
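
A rough sketch of that held-out-dataset strategy, using synthetic stand-in data and the same assumed LightGBM ranker as above (again, not our actual code):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

def make_split(n_tasks, n_cands=10, n_feats=5):
    """Synthetic stand-in for a dataset: n_tasks tasks with n_cands candidate
    lines each; exactly one candidate per task is labeled as the true repair."""
    X = rng.normal(size=(n_tasks * n_cands, n_feats))
    y = np.zeros(n_tasks * n_cands, dtype=int)
    y[np.arange(n_tasks) * n_cands + rng.integers(0, n_cands, n_tasks)] = 1
    return X, y, np.full(n_tasks, n_cands)

X_tr, y_tr, g_tr = make_split(200)  # stands in for 100% of three datasets
X_ho, y_ho, g_ho = make_split(50)   # stands in for the held-out fourth dataset

# Stop training once the ranking metric stops improving on the held-out set.
ranker = lgb.LGBMRanker(objective="lambdarank", metric="map")
ranker.fit(X_tr, y_tr, group=g_tr,
           eval_set=[(X_ho, y_ho)], eval_group=[g_ho],
           callbacks=[lgb.early_stopping(stopping_rounds=50)])
```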

@monperrus
Collaborator

Indeed interesting ... and quite good! Looking forward to the performance on the hidden dataset.
