Participant #13: Team madPL, University of Wisconsin–Madison & Microsoft Research #29

Open
chenzimin opened this issue Oct 3, 2018 · 4 comments
Labels: participant (Participant of the CodRep-competition)

Comments

chenzimin commented Oct 3, 2018

Created for Team madPL (University of Wisconsin–Madison & Microsoft Research) for discussions. Welcome!

Jordan Henkel, Shuvendu Lahiri, Ben Liblit, Thomas Reps

jjhenkel commented Oct 4, 2018

We have a technique that treats the repair problem as a search/ranking problem. We extract features and then run a "learning to rank" technique on the data. As a post-processing step, we rule out the highest-ranked prediction if applying the repair at that location yields a file that fails to parse (provided the file was parseable originally, with no repair applied).
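
Roughly, the pipeline looks like the sketch below. This is not our actual implementation: the LightGBM ranker, the javalang-based parse check, and all of the function names are placeholders chosen for illustration.

```python
import lightgbm as lgb   # assumed learning-to-rank library
import javalang          # assumed Java parser for the parseability check

def train_ranker(X, y, group_sizes):
    """X: feature rows for candidate repair locations; y: 1 for the true
    location, 0 otherwise; group_sizes: number of candidates per repair task."""
    ranker = lgb.LGBMRanker(objective="lambdarank", metric="map")
    ranker.fit(X, y, group=group_sizes)
    return ranker

def parses(source):
    """True if the given Java source parses."""
    try:
        javalang.parse.parse(source)
        return True
    except (javalang.parser.JavaSyntaxError, javalang.tokenizer.LexerError):
        return False

def predict_location(ranker, candidate_features, apply_repair, original_source):
    """Rank candidate locations; if repairing at the top-ranked location turns a
    previously parseable file into one that fails to parse, fall back to the
    next-best candidate."""
    scores = ranker.predict(candidate_features)
    order = list(scores.argsort()[::-1])
    best = order[0]
    if parses(original_source) and not parses(apply_repair(original_source, best)):
        return order[1] if len(order) > 1 else best
    return best
```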

Here's a table that summarizes our results:

| Trained on | Loss on Dataset1 | Loss on Dataset2 | Loss on Dataset3 | Loss on Dataset4 | Parseability check |
|---|---|---|---|---|---|
| 80% of 2 | 0.087606 | 0.068825 | 0.05736 | 0.06536 | NO |
| 80% of 2 | 0.085909 | 0.067685 | 0.05537 | 0.06484 | YES |
| 80% of 1, 2, 3 | 0.069487 | 0.066061 | 0.04301 | 0.07607 | NO |
| 80% of 1, 2, 4 | 0.056232 | 0.058874 | 0.05606 | 0.03400 | NO |
| 80% of 1, 3, 4 | 0.052917 | 0.085307 | 0.03244 | 0.03716 | NO |
| 80% of 2, 3, 4 | 0.096918 | 0.065058 | 0.03698 | 0.03990 | NO |
| 80% of 1, 2, 3, 4 | 0.044905 | 0.051056 | 0.02839 | 0.03525 | NO |
| 80% of 1, 2, 3, 4 | 0.044459 | 0.050524 | 0.02831 | 0.03515 | YES |

The first two rows show our best performance training on 80% of a single dataset (Dataset2). The next four rows show performance when doing cross-validation (by holding out one whole dataset each time). The last two rows show performance of a model trained on all datasets, with and without the parseability filter.

One difficulty with this technique is that its performance on totally unseen data is unpredictable. It usually generalizes well enough, but I'm sure that, with more time to tune and better features, one could build a model that generalizes better.

We've made our submission available via Docker Hub (it will use the model trained on all datasets). To run it on a new dataset, do the following (on a machine with Docker installed):

docker pull jjhenkel/instauro
docker run -it --rm -v /path/to/Datasets/NewDataset:/data jjhenkel/instauro

tdurieux commented Oct 4, 2018

These are really interesting results.

It is funny to see that by learning from datasets 2, 3, and 4 you obtain a worse result on Dataset1 than by learning from Dataset2 alone.

By any chance, do you have the effectiveness of your approach on the tasks that were not used during training (the held-out 20%)?

During training, did you take into account that some tasks are duplicated?

jjhenkel commented Oct 4, 2018

Hi @tdurieux

I didn't save performance measurements for the 20% used for validation. I did watch some models complete training, and each time performance on the 20% was within a percent or two of performance on the 80% (the learning-to-rank objective used Precision@1 as its metric).
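
For reference, Precision@1 here is just the fraction of tasks whose top-ranked candidate is the true repair line. A tiny hypothetical illustration (not our actual evaluation code):

```python
def precision_at_1(ranked_lines_per_task, true_line_per_task):
    """Fraction of tasks whose rank-1 candidate equals the true repair line."""
    hits = sum(1 for ranked, truth in zip(ranked_lines_per_task, true_line_per_task)
               if ranked and ranked[0] == truth)
    return hits / len(true_line_per_task)

# Three tasks, two of them predicted correctly at rank 1 -> 0.666...
print(precision_at_1([[12, 3], [7, 9], [40, 2]], [12, 9, 40]))
```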

The learner does not take duplicate tasks into account (that is, I do not filter duplicates anywhere). That said, I do think it may be interesting to train on 100% of three of the datasets and use the held-out dataset as a validation set. Using this strategy the learner would stop when it no longer made progress on the held-out set; that may help prevent overfitting.
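
A rough sketch of that held-out-dataset strategy, using synthetic stand-in data and the same assumed LightGBM ranker as above (again, not our actual code):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

def make_split(n_tasks, n_cands=10, n_feats=5):
    """Synthetic stand-in for a dataset: n_tasks tasks with n_cands candidate
    lines each; exactly one candidate per task is labeled as the true repair."""
    X = rng.normal(size=(n_tasks * n_cands, n_feats))
    y = np.zeros(n_tasks * n_cands, dtype=int)
    y[np.arange(n_tasks) * n_cands + rng.integers(0, n_cands, n_tasks)] = 1
    return X, y, np.full(n_tasks, n_cands)

X_tr, y_tr, g_tr = make_split(200)  # stands in for 100% of three datasets
X_ho, y_ho, g_ho = make_split(50)   # stands in for the held-out fourth dataset

# Stop training once the ranking metric stops improving on the held-out set.
ranker = lgb.LGBMRanker(objective="lambdarank", metric="map")
ranker.fit(X_tr, y_tr, group=g_tr,
           eval_set=[(X_ho, y_ho)], eval_group=[g_ho],
           callbacks=[lgb.early_stopping(stopping_rounds=50)])
```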

@monperrus
Collaborator

Indeed interesting ... and quite good! Looking forward to the performance on the hidden dataset.
