Make concordance_index_censored more robust to failure with datasets that have few uncensored samples #117
Comments
In this example, the only event has the maximum observed time, so there are no comparable pairs and the concordance index is unspecified. I believe throwing an error is the right thing to do in this case, but ZeroDivisionError might not be the best one. If you are doing CV, you can use stratified CV on the event indicator to ensure each split has enough uncensored event times.
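To see why there are no comparable pairs in that situation, here is a minimal pure-Python sketch of the pair-counting logic the concordance index relies on (an illustration, not the library's actual implementation; ties are ignored for simplicity):

```python
# A pair (i, j) is "comparable" if the sample with the shorter observed
# time actually experienced the event; otherwise the pair cannot be
# ordered. The concordance index divides concordant pairs by comparable
# pairs, so zero comparable pairs means division by zero.

def n_comparable_pairs(event, time):
    """Count pairs (i, j) where time[i] < time[j] and event[i] is True."""
    n = len(time)
    count = 0
    for i in range(n):
        for j in range(n):
            if i != j and time[i] < time[j] and event[i]:
                count += 1
    return count

# The only event occurs at the maximum observed time: no sample with a
# shorter time had an event, so there is nothing to compare against.
event = [False, False, False, True]
time = [1.0, 2.0, 3.0, 4.0]
print(n_comparable_pairs(event, time))  # 0 -> concordance index is 0/0
```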
Hi - I am doing stratified CV on the event indicator; it's still not enough in many real-world cases that I've worked with (e.g. TCGA). There is no workaround possible from the user side; the only workaround is to change the scikit-learn code.
I've opened an issue with scikit-learn; we'll see what they say.
Once I change the error to something more reasonable than ZeroDivisionError, you can just catch it and return zero. sklearn's GridSearchCV can do something similar via the error_score parameter.
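A minimal sketch of that catch-and-substitute idea (the names here, such as NoComparablePairs and safe_score, are illustrative and not part of either library's API):

```python
import warnings


class NoComparablePairs(Exception):
    """Hypothetical, more descriptive replacement for the bare ZeroDivisionError."""


def safe_score(scorer, estimator, X, y, error_score=0.0):
    """Evaluate `scorer`, falling back to `error_score` on failure.

    Mirrors the spirit of GridSearchCV's error_score handling for
    failed fits: warn and substitute a fallback value instead of
    aborting the whole search.
    """
    try:
        return scorer(estimator, X, y)
    except (NoComparablePairs, ZeroDivisionError) as exc:
        warnings.warn(f"Scoring failed ({exc!r}); using error_score={error_score}")
        return error_score


# Usage: a scorer that fails because the fold has no comparable pairs.
def broken_scorer(est, X, y):
    raise NoComparablePairs("no comparable pairs in this fold")


print(safe_score(broken_scorer, None, None, None))  # 0.0 (with a warning)
```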
I've submitted a feature request to scikit-learn to move the SearchCV scoring code into the same try/except as the fit, so that a scoring failure is handled like a fit failure: a warning is raised and the score for that train/test split and parameter combination is set to zero (or whatever error_score is set to).
Certain datasets will naturally have few uncensored samples. I often see concordance_index_censored throw ZeroDivisionError: float division by zero, which stops model selection, even when I use a custom CV iterator that stratifies folds so that each fold has at least one uncensored sample. This makes things difficult on the user code side when using GridSearchCV, for example, since the error is raised during CV scoring (not fitting) and isn't caught by GridSearchCV, so everything stops. A possible solution would be to simply return a c-index score of 0 and emit a warning. Here is an example with y and self.predict(X_test):
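For reference, here is a hedged pure-Python sketch of the kind of custom fold stratification described above, assigning samples round-robin within each stratum of the event indicator (in practice sklearn's StratifiedKFold on the event flag achieves the same effect):

```python
# Stratify CV folds on the event indicator so every fold receives at
# least one uncensored (event == True) sample, assuming there are at
# least as many events as folds.

def stratified_folds(event, n_folds):
    """Assign sample indices to folds, round-robin within each stratum."""
    folds = [[] for _ in range(n_folds)]
    strata = {
        True: [i for i, e in enumerate(event) if e],       # uncensored
        False: [i for i, e in enumerate(event) if not e],  # censored
    }
    for indices in strata.values():
        for k, i in enumerate(indices):
            folds[k % n_folds].append(i)
    return folds


event = [True, False, False, True, False, False, True, False]
folds = stratified_folds(event, 3)
# Every fold contains at least one index with event == True.
print(all(any(event[i] for i in fold) for fold in folds))  # True
```

As the discussion above notes, this alone does not guarantee comparable pairs: if a fold's only event has the maximum observed time in that fold, the concordance index is still undefined there.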