Add CoxnetSurvivalAnalysisCV to search for optimal alpha via cross-validation #14

Open
plpxsk opened this issue Nov 8, 2017 · 8 comments

Comments

@plpxsk
Contributor

plpxsk commented Nov 8, 2017

Curious to know if it is possible to extract the sparse features in a Cox model from L1 models.

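When applying this:

from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sksurv.linear_model import CoxnetSurvivalAnalysis

pipe = make_pipeline(
    StandardScaler(),
    CoxnetSurvivalAnalysis(l1_ratio=1)
)

sfm = SelectFromModel(pipe)
sfm.fit(X_train, y_train)

n_features = sfm.transform(X).shape[1]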

... I get:

ValueError: The underlying estimator Pipeline has no `coef_` or `feature_importances_` attribute.
Either pass a fitted estimator to SelectFromModel or call fit before calling transform.

source

Thanks for all the input!

[edited]

@plpxsk
Contributor Author

plpxsk commented Nov 8, 2017

It appears that there is a coef_ attribute, which can be extracted as:

pipe.get_params()['coxnetsurvivalanalysis'].coef_

But this still gives an error:

sfm = SelectFromModel(pipe.get_params()['coxnetsurvivalanalysis'])
sfm.fit(X_train, y_train)

n_features = sfm.transform(X_train).shape[1]

ValueError: X has a different shape than during fitting.

@sebp
Owner

sebp commented Nov 8, 2017

The difference is that the coef_ attribute contains the coefficients for multiple alpha values (each column corresponds to one alpha). SelectFromModel expects a single set of coefficients, hence the exception. LassoCV is built on top of Lasso and selects a single alpha via cross-validation. This is currently not implemented for CoxnetSurvivalAnalysis, but would be a nice addition.
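
To make the shape issue concrete, here is a minimal sketch in plain Python. The nested list stands in for coef_ (one row per feature, one column per alpha, as described above); the numbers and the helper nonzero_features are made up for illustration and are not part of scikit-survival. Picking a single column gives one coefficient per feature, which is the form a feature selector needs.

```python
# Hypothetical coefficient path: one row per feature, one column per alpha.
# Columns run from the largest alpha (strongest penalty) to the smallest.
coef_path = [
    [0.0, 0.0, 0.12, 0.30],     # feature 0
    [0.0, 0.0, 0.00, 0.05],     # feature 1
    [0.0, -0.4, -0.55, -0.61],  # feature 2
]


def nonzero_features(coef_path, alpha_index, tol=1e-12):
    """Return indices of features whose coefficient is nonzero at one alpha."""
    return [
        i for i, row in enumerate(coef_path)
        if abs(row[alpha_index]) > tol
    ]


# The set of selected features grows as the penalty shrinks.
for k in range(4):
    print(k, nonzero_features(coef_path, k))
```

With an L1 penalty, the features selected at a given alpha are simply those with a nonzero entry in that alpha's column.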

@sebp sebp changed the title Possible to extract features from Lasso/Elastic Net? Determine optimal alpha via cross-validation in Lasso/Elastic Net Nov 8, 2017
@plpxsk
Contributor Author

plpxsk commented Dec 5, 2017

Thanks, @sebp.

But then which alpha is used for the score returned by fit.score(X, y) if I keep the default parameters (which produce 100 alphas)?

@Williamongh

Williamongh commented Dec 21, 2017

Thanks for your brilliant work, @sebp.
I have a similar problem.
Could you please give an example for CoxnetSurvivalAnalysis?
Thanks.

BTW, @pavopax, do you have any ideas about this problem? Thank you.

@sebp
Owner

sebp commented Jan 8, 2018

The score method just calls predict without explicitly specifying an alpha value, in which case the last column of coef_ is used (corresponding to the smallest alpha in the path). If you want to compute the concordance index for multiple alpha values, you'd need to call predict and pass a value to the alpha keyword argument: model.predict(X_test, alpha=model.alphas_[1]). Of course, you can also specify alpha values not in model.alphas_ in which case coefficients will be linearly interpolated between adjacent alphas.
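
As an illustration of the interpolation described above, here is a minimal sketch in plain Python. The function interpolate_coef and the example numbers are hypothetical and do not reproduce scikit-survival's internal code; raising an error for an alpha outside the path is a choice made for this sketch only.

```python
def interpolate_coef(alphas, coef_path, alpha):
    """Linearly interpolate one coefficient per feature at `alpha`.

    alphas    -- strictly decreasing sequence of penalties (the path)
    coef_path -- coef_path[i][k] is the coefficient of feature i at alphas[k]
    """
    if not alphas[-1] <= alpha <= alphas[0]:
        raise ValueError("alpha lies outside the fitted path")
    for k in range(len(alphas) - 1):
        hi, lo = alphas[k], alphas[k + 1]
        if lo <= alpha <= hi:
            # w is 0 at alphas[k] and 1 at alphas[k + 1]
            w = (hi - alpha) / (hi - lo)
            return [(1 - w) * row[k] + w * row[k + 1] for row in coef_path]
    # Only reached when the path holds a single alpha equal to `alpha`.
    return [row[0] for row in coef_path]


# Made-up path with three alphas and two features.
alphas = [1.0, 0.5, 0.1]
coef_path = [
    [0.0, 0.2, 0.6],
    [0.0, -0.4, -0.8],
]
midway = interpolate_coef(alphas, coef_path, 0.75)  # halfway between 1.0 and 0.5
on_grid = interpolate_coef(alphas, coef_path, 0.5)  # exactly a path value
print(midway, on_grid)
```

An alpha that coincides with a path value returns that column unchanged, so the interpolated and fitted coefficients agree on the grid.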

@CharlieCheckpt

Hi @sebp, thank you for this great package.
I would also love to see this feature implemented; it would be very nice to have an equivalent of cv.glmnet (from the glmnet R package) for scikit-survival.
IMO, the main difficulty is to implement this efficiently, because doing cross-validation for each alpha is very costly.

@hermidalc
Contributor

@CharlieCheckpt I’m sure you’ve seen this: https://gist.github.com/sebp/d580d44c4beab3379c6dfda6810b33b8

It does run GridSearchCV over each alpha, but in my experience that doesn’t matter: since @sebp wrote CoxnetSurvivalAnalysis in C++ with Eigen, it’s really fast.

@hermidalc
Contributor

hermidalc commented Mar 19, 2020

The sklearn way to determine the optimal alpha without GridSearchCV would be to design a CoxnetSurvivalAnalysisCV class, very much like sklearn.linear_model.ElasticNetCV.

The only disadvantage of the *CV classes in sklearn is that they cannot take part in a joint hyperparameter search. If you have a Pipeline composite estimator and need to search across different steps of the pipeline, this must be done with GridSearchCV, and you have to use the non-CV versions of the estimators instead of the *CV classes.
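
For reference, the path-wise idea behind such a *CV class can be sketched in a few lines of plain Python. Everything here is hypothetical: select_alpha_by_cv, fit_path, and score are placeholder names, and the toy callables only exist so the sketch runs without survival data. The point is that each fold fits the whole alpha path once and is then scored at every alpha, instead of refitting per alpha.

```python
from statistics import mean


def select_alpha_by_cv(fit_path, score, alphas, folds):
    """Pick the alpha with the best mean validation score.

    fit_path(train, alphas)   -> model fitted on the whole alpha path
    score(model, test, alpha) -> float, higher is better (e.g. concordance index)
    folds                     -- iterable of (train, test) pairs
    """
    per_alpha = {a: [] for a in alphas}
    for train, test in folds:
        model = fit_path(train, alphas)  # one fit per fold, not one per alpha
        for a in alphas:
            per_alpha[a].append(score(model, test, a))
    mean_scores = {a: mean(s) for a, s in per_alpha.items()}
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores


# Toy stand-ins (assumptions for illustration only).
def toy_fit(train, alphas):
    return {"n_train": len(train)}


def toy_score(model, test, alpha):
    # Artificial score that peaks at alpha == 0.1 in every fold.
    return 1.0 - abs(alpha - 0.1)


folds = [([1, 2, 3], [4]), ([2, 3, 4], [1])]
best_alpha, scores = select_alpha_by_cv(toy_fit, toy_score, [1.0, 0.1, 0.01], folds)
print(best_alpha)
```

With a real model, fit_path would fit the estimator on the training fold and score would evaluate predictions on the held-out fold at the given alpha.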

@sebp sebp changed the title Determine optimal alpha via cross-validation in Lasso/Elastic Net Add CoxnetSurvivalAnalysisCV to search for optimal alpha via cross-validation Apr 11, 2020

5 participants