Predefined "blind" well data as measure of performance leads to over-fitting. #2

LukasMosser opened this issue Oct 18, 2016 · 12 comments

@LukasMosser
Contributor

The contest states that the same well as in the publication will be used to judge the performance of the proposed solutions. This can lead to overfitting, because the prediction capability on that well is effectively being used as the loss function.

Should another performance measure be used to compensate for overfitting?

@kwinkunks
Member

Thank you for raising this, Lukas.

You mean because, even without using the well explicitly in the model, parameters, features, etc. will be chosen to fit that well?

Any ideas for other measures, given that there are no more wells? Perhaps a combination of 2 or 3 measures?

I guess part of the problem is that any public measure could be trained towards. Is the only objective test a secret one?


@LukasMosser
Contributor Author

One idea could be to have a secret test well.

Another could be to train on all but one well, retaining the held-out well for testing, and then averaging the scores over the ensemble of training and testing runs.

E.g.:
Available Wells: A, B, C
First Run: Train on: A,B => Test on: C => Score 1
Second Run: Train on: B,C => Test on: A => Score 2
Third Run: Train on: A,C => Test on: B => Score 3

Final Score = Arithmetic Average of Score 1, Score 2, Score 3

I'm guessing here, and I'm by no means an expert on this, but this should give a better estimate of how good the method itself is, rather than of one set of parameters that is specifically tuned to perform well on the proposed blind test well X.
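
A minimal sketch of that leave-one-well-out scheme, assuming the contest's facies_vectors.csv layout with its 'Well Name' and 'Facies' columns; the feature list and the random forest are placeholders rather than a recommended model:

```python
# Leave-one-well-out scoring: hold each well out in turn, train on the rest,
# and average the scores. Column names assume the contest's facies_vectors.csv.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

data = pd.read_csv('facies_vectors.csv')
features = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'NM_M', 'RELPOS']

scores = []
for well in data['Well Name'].unique():
    train = data[data['Well Name'] != well]   # train on all other wells
    test = data[data['Well Name'] == well]    # hold this well out

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train[features], train['Facies'])
    pred = clf.predict(test[features])

    scores.append(f1_score(test['Facies'], pred, average='micro'))
    print(well, scores[-1])

print('Final score (mean over held-out wells):', np.mean(scores))
```

Averaging one score per held-out well, as in the A/B/C example above, should be much less sensitive to the quirks of any single well than a single blind-well score.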

@lperozzi
Contributor

Actually, I think Lukas is right. This is called k-fold cross-validation, and it is well explained by this figure:

https://chrisjmccormick.files.wordpress.com/2013/07/10_fold_cv.png

When computing the accuracy on the blind well alone, the score variation is pretty high, around ±5%. K-fold cross-validation can help to stabilize the score. Moreover, I think that precision/recall/F1 scores are more appropriate for this kind of problem.
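
To illustrate that last point, a tiny sketch using scikit-learn's metrics; the labels are toy values, not contest data:

```python
# Precision/recall/F1 give a per-facies breakdown that overall accuracy hides.
# y_true / y_pred are toy stand-ins for the labels and predictions on one well.
from sklearn.metrics import classification_report, f1_score

y_true = [3, 3, 2, 2, 2, 8, 8, 8, 6]
y_pred = [3, 2, 2, 2, 3, 8, 8, 6, 6]

print(classification_report(y_true, y_pred))
print('micro-averaged F1:', f1_score(y_true, y_pred, average='micro'))
```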

@LukasMosser
Contributor Author

Excellent visualization!

I agree; it seems I've seen even higher score variation when running any of my code.

@kwinkunks
Member

Thank you again @LukasMosser for raising this issue of meta-overfitting, and @lperozzi for chiming in.

We can (and expect to) learn as we go, so we'll look at k-fold CV as a score. That seems like a sensible approach.

I have also got another plan — some fully blind data that we will not release. I'll post back to this issue if and when that's a reality.

I guess all this is just something we have to live with in what is necessarily a limited universe of data. And I'm reading that it is (therefore) a common problem in machine learning contests. It makes me glad that there isn't $1 million at stake in this contest. Because we nearly went with that!

(no, we didn't)

@LukasMosser
Contributor Author

LukasMosser commented Oct 26, 2016

@kwinkunks @lperozzi
The code is a bit rough, I'd say, but here's my take at k=1 fold (leave-one-well-out) training.

Gist

@kwinkunks
Member

Hi everyone... A quick update on this issue of meta-overfitting.

We have some labels for the STUART and CRAWFORD wells. These labels were not part of the data package, so they are truly blind. We will therefore be testing entries against them.

We may also keep some type of k-fold CV in the mix. Brendon and I aim to propose a scoring strategy before next week. The goal is to be both analytically rigorous and fair to everyone.

Either way, this means that all of the wells with labels in the data package can be used in training.

@mycarta
Contributor

mycarta commented Nov 12, 2016

Hi Matt, Brendon

I am just starting on the contest today. I am trying to catch up on this issue, so to be sure I understand: we are to use all the wells for training, and in our submissions we only include training (cross-validated) scores? There's no validation on a blind well on our side. I imagine entries can still include visual results (e.g. plots of the predicted facies) for the STUART and CRAWFORD wells, but it is you and Brendon who will be doing the validation against STUART and CRAWFORD separately. Correct?
This also means that Bryan / CannedGeo should have an opportunity to re-submit after throwing SHANKLE into the training.

@kwinkunks
Member

Hey @mycarta ... Yes, use all the wells in training. And yes, we will validate against STUART and CRAWFORD, and it's that score that will count. I'm not sure yet if the k-fold CV score will count for anything, but I think it's probably the most useful thing to aim at.

And yes, @CannedGeo should probably retrain his model, although I can do that too with his existing hyperparameters... it just might not be optimal. I suspect he's working on it in any case.

@CannedGeo
Contributor

@mycarta @kwinkunks If by retraining my model you mean banging my head against the wall, then yes, that is exactly what I've been doing!! This is maybe better presented as a separate question, but would it be totally against the rules to try to generate a PE for Alexander D and Kimzey A, and then include all the wells in the model? I mean, this is a machine learning exercise, is it not?

@kwinkunks
Member

@CannedGeo Yes, please do put that in an Issue of its own so other people will be more likely to see it. Short answer: you can do anything you like with the features, including generating new ones. Definitely.
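
For anyone following along, a sketch of what "generating a PE" could look like: regress PE on the other logs in the wells where it is present, then predict it where it is missing. Column names assume the contest's facies_vectors.csv, and the regressor is just a placeholder:

```python
# Impute missing PE by regressing it on the other logs (a sketch, not a
# recommendation). Column names assume the contest's facies_vectors.csv.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

data = pd.read_csv('facies_vectors.csv')
logs = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'NM_M', 'RELPOS']

has_pe = data['PE'].notnull()
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(data.loc[has_pe, logs], data.loc[has_pe, 'PE'])
data.loc[~has_pe, 'PE'] = reg.predict(data.loc[~has_pe, logs])
```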

@kwinkunks
Member

On the k-fold CV issue...

I just made a small demo of stepwise dropout. Please have a look and see if you agree that it does what we've been talking about.

cc @lperozzi @LukasMosser (I guess you will get notifications anyway, just making sure :)
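
For reference, the same per-well dropout can be written with scikit-learn's built-in LeaveOneGroupOut splitter; this is only a sketch of the idea, not the demo itself, and the SVC and feature list are placeholders:

```python
# Leave-one-well-out CV using LeaveOneGroupOut, grouping samples by well.
# Column names assume the contest's facies_vectors.csv.
import pandas as pd
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

data = pd.read_csv('facies_vectors.csv')
X = data[['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'NM_M', 'RELPOS']]
y = data['Facies']
wells = data['Well Name']

scores = cross_val_score(SVC(gamma='scale'), X, y, groups=wells,
                         cv=LeaveOneGroupOut(), scoring='f1_micro')
print('per-well F1 scores:', scores)
print('mean F1:', scores.mean())
```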

da-wad pushed a commit to da-wad/2016-ml-contest that referenced this issue Jan 29, 2017
kwinkunks added a commit that referenced this issue Jan 30, 2017
kwinkunks pushed a commit that referenced this issue Jan 31, 2017