
Alpha Behavior #41

Open · dex314 opened this issue Aug 1, 2018 · 14 comments · May be fixed by #484

dex314 commented Aug 1, 2018

When I run a model like this example:
mod = coxnet.CoxnetSurvivalAnalysis(n_alphas=30, l1_ratio=1.0)
There are times (and it could be data specific) where the number of alphas in the coefficient path returned by the model is less than the n_alphas I specified. For example, it often stops at 5 alphas deep.
The paths might have 15 variables > 0 at 5 alphas deep, which is fine. The strange thing I am seeing is this: let's say I set n_alphas=20 on the same data set. I end up getting more variables > 0 along the path (and it still stops at 5 alphas deep). Or, vice versa, if I set n_alphas=40 on the same data set, I end up getting fewer variables > 0 along the path, and once again the algorithm automatically stops at 5 alphas deep. (I'm referring to the parameters as variables.)

I'm assuming this is a bug, as in my past experience with elastic nets the alpha sequence decreases exponentially toward some minimum, and I should see more variables > 0 as I move forward along the alpha curve, closer to that minimum. So if I see 15 variables at the 30th alpha, then I should see fewer than 15 at earlier alphas in the sequence, and the reverse.

Could there be some ratio somewhere that is picking up a similarly named global variable and confusing the alpha parameters in the elastic net?
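
Roughly what I am running, as a sketch (X and y are placeholders for my feature matrix and structured survival array, so this is illustrative rather than a self-contained reproduction):

import numpy as np
from sksurv.linear_model import coxnet

# X: (n_samples, n_features) feature matrix; y: structured array of (event, time)
for n_alphas in (20, 30, 40):
    mod = coxnet.CoxnetSurvivalAnalysis(n_alphas=n_alphas, l1_ratio=1.0)
    mod.fit(X, y)
    # coef_ has one column per alpha actually fitted; I expected n_alphas
    # columns, but the second dimension stays at 5 regardless
    print(n_alphas, mod.coef_.shape, np.count_nonzero(mod.coef_[:, -1]))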

plpxsk (Contributor) commented Aug 2, 2018

Can you clarify the phrase "5 alphas deep" a bit? I'm not exactly sure what it means. Thanks!

dex314 (Author) commented Aug 2, 2018

By 5 alphas deep, I mean that the coefficient path output (model.coef_) is of shape (M x 5), where M is the number of parameters in the regression and 5 is the alpha depth. I would have expected an output with a shape of (M x n_alphas).
In reference to my issue, of those M parameters I am seeing, for example, the following:

  • coxnet.CoxnetSurvivalAnalysis(n_alphas=30, l1_ratio=1.0) gives 15 params != 0
  • coxnet.CoxnetSurvivalAnalysis(n_alphas=20, l1_ratio=1.0) gives 20 params != 0
  • coxnet.CoxnetSurvivalAnalysis(n_alphas=40, l1_ratio=1.0) gives 10 params != 0

but in each instance, the model.coef_ output is (M x 5).

Thank you for replying and I apologize if there is some caveat I am missing.

sebp (Owner) commented Aug 4, 2018

I think you are mixing different concepts:

  1. There is the grid of alpha values, which is determined by your data, n_alphas, and alpha_min_ratio. The maximum alpha is chosen, based on your dataset, such that all variables have a coefficient of zero. The next step is to determine the minimum alpha, which is alpha_min_ratio * alpha_max. Finally, n_alphas different values from alpha_max to alpha_min are chosen, equally spaced on a log scale. Therefore, when you modify n_alphas, alpha_max and alpha_min remain the same, but the alphas in between change (see the sketch after this list).
  2. It can happen that optimization stops early if max_iter has been reached. The coefficients of the remaining alpha values will not be updated, and a convergence warning will be displayed.
  3. Usually, the number of non-zero coefficients increases as alpha decreases. This is not a strict requirement, though: in certain situations where features interact with each other, a coefficient can go back to zero.
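
As a rough sketch of point 1 (not the exact library code; alpha_max below stands in for the data-derived value at which all coefficients are zero):

import numpy as np

def alpha_grid(alpha_max, n_alphas=100, alpha_min_ratio=0.0001):
    # n_alphas values from alpha_max down to alpha_min_ratio * alpha_max,
    # equally spaced on a log scale
    alpha_min = alpha_min_ratio * alpha_max
    return np.logspace(np.log10(alpha_max), np.log10(alpha_min), n_alphas)

print(alpha_grid(1.5, n_alphas=5))    # endpoints are fixed by the data ...
print(alpha_grid(1.5, n_alphas=30))   # ... only the spacing in between changes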

dex314 (Author) commented Aug 6, 2018

Yes, I understand how the alpha grid works and I agree with everything you said above. I think I may not be explaining it very well, and it may be one of those one-off issues with the data I am working with.
The fit is not reporting any messages or errors regarding convergence, and the way I have built elastic nets in the past is exactly as you describe (calculate the min and max, then log-scale between them over 100 alphas).
I had assumed that if I specified n_alphas = 30 I would get a matrix of M parameters by 30. Likewise, if I specified n_alphas = 40 I should get an M x 40 matrix of coefficients with nearly identical paths to the n_alphas = 30 model, but converging or diverging over the next 10 alphas.
It is confusing me why I am getting more non-zero coefficients with n_alphas=10 (nearly the full solution, actually) and significantly fewer with n_alphas=40 (very sparse), with both coefficient outputs being M x 5. It's entirely possible it's the data I am working with as well.
I thought perhaps there was a special method or criterion you had in place that just was not displaying any messages, and that you might know off the top of your head.

sebp (Owner) commented Aug 6, 2018

One possible issue I could imagine when using n_alphas = 10 instead of n_alphas = 100 is that the gaps between adjacent alpha values are larger. Hence, moving from one alpha to the next will result in large updates. The algorithm does not perform step-size optimization, so it is possible for updates to overshoot and miss the actual minimum. I would recommend using a relatively dense list of alpha values.

If you want to double-check, you can try R's glmnet package, which implements the elastic net as well.
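
For example (a sketch; the numeric range below is only illustrative, and in practice you would base it on the alphas_ of an initial fit), you can either increase n_alphas or pass an explicit dense grid via the alphas parameter:

import numpy as np
from sksurv.linear_model import coxnet

# a dense grid of 100 alphas, equally spaced on a log scale (range is illustrative)
alphas = np.logspace(np.log10(1.5), np.log10(0.0015), 100)
mod = coxnet.CoxnetSurvivalAnalysis(alphas=alphas, l1_ratio=1.0)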

dex314 (Author) commented Aug 7, 2018

This was a good idea. I checked the same data using glmnet with family='cox'. It stopped at an alpha depth of 52, and the paths look similar when you specify n_alphas=5. When I try something higher like n_alphas=10, the paths look similar, but the Python code doesn't return the rest of the coefficient matrix and I can't figure out why. For Python the parameter is n_alphas, and for R it's nlambda. I've seen alpha and lambda interchanged in the past; I'm not confusing these in this instance as they relate to your code, am I?

Here are the Python paths:
[coefficient path plot from scikit-survival]

Here is glmnet in R:
[coefficient path plot from glmnet]

sebp (Owner) commented Aug 9, 2018

You are correct, glmnet's nlambda corresponds to n_alphas.

Are you saying that scikit-survival does not return the full path of 10 alphas, but glmnet does? Could you please plot the individual estimates as dots in the plots, in addition to lines?

dex314 (Author) commented Aug 10, 2018

Yes, in this particular instance it is not returning the full path of alphas, no matter what n_alphas I specify. I know when I first opened the issue I was all over the place, but this is definitely the main point of my confusion. It makes me think something is inadvertently defaulting somewhere within the code, as I did not change anything in the code itself.

GLMNET:
[coefficient path plot]

SKSURV:
[coefficient path plot]

plpxsk (Contributor) commented Aug 20, 2018

One quick thought: in the sksurv model call, can you check alpha_min_ratio and perhaps decrease it? Then you might get the longer paths seen in glmnet.
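
Something along these lines (a sketch; the value is just an example to experiment with, given that 0.0001 is the current default):

from sksurv.linear_model import coxnet

# try an explicit, smaller alpha_min_ratio so the generated grid reaches smaller alphas
mod = coxnet.CoxnetSurvivalAnalysis(n_alphas=100, l1_ratio=1.0, alpha_min_ratio=1e-5)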

dex314 (Author) commented Aug 29, 2018

Sorry for the delay in responding. I tried your suggestion, but it did not work. It seems like a unique issue, and in the end, with the way glmnet is built, I can still get a sparse solution relative to the shorter paths. Additionally, the selected variables seem intuitive and appropriate.

hermidalc (Contributor) commented Mar 14, 2020

I have a feeling this might be similar to the issue or confusion I’ve been having related to alphas that I commented on in #47.

I’ve found that Coxnet will silently not use all the alphas down the autogenerated sequence once the alpha values get too small, but it won’t raise any warnings or errors during the fit.

For example, it might calculate an alpha max of 1.5 from the data, and with alpha_min_ratio set to 0.01 it will create an alphas_ sequence of n_alphas values from 1.5 down to 0.015. When it does the fit it typically doesn’t use all the alphas down the sequence, and this seems to be normal behavior. It doesn’t show any convergence warnings.

I only realized this when I was trying to do model selection/CV based on the gist example and got Numerical error... consider increasing alpha errors when fitting individual alphas taken from the sequence autogenerated by the initial fit on the data.

@dex314 I would consider increasing alpha_min_ratio so that the alphas in the sequence don’t become too small; maybe you will then see that it uses more of them and more alphas show up in coef_.

Also @sebp, it might be a good idea to have two default values for alpha_min_ratio like glmnet has: 1e-4 when n_features < n_samples, and 1e-2 when n_features > n_samples.
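
Something like this for choosing the default (a sketch of the glmnet-style rule, not the current sksurv code):

def default_alpha_min_ratio(n_samples, n_features):
    # mirror glmnet: 1e-4 when there are more samples than features, 1e-2 otherwise
    return 1e-4 if n_samples > n_features else 1e-2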

sebp (Owner) commented Mar 20, 2020

> Also @sebp, it might be a good idea to have two default values for alpha_min_ratio like glmnet has: 1e-4 when n_features < n_samples, and 1e-2 when n_features > n_samples.

That's a good idea. Would you be able to provide a pull request with this change?

sebp added a commit that referenced this issue Apr 11, 2020
The default value of alpha_min_ratio will depend on the sample size relative to the number of features in 0.13. If `n_samples > n_features`, the current default value 0.0001 will be used. If `n_samples < n_features`, 0.01 will be used instead.

See #41 (comment)

sebp (Owner) commented Apr 11, 2020

The default value for alpha_min_ratio will depend on n_features and n_samples in a future release. I added a warning to notify users about this change (see commit dfd645e).

Sann5 commented Oct 9, 2024

Hello scikit-survival team. Thanks for the awesome package. Is there a solution for this yet? Essentially, I would like Coxnet not to drop the small alphas in my regularization paths, i.e. to fit models for all the alphas I specify in the alphas parameter.

Sometimes I will specify 100 alphas and it will only use the first 6. Funnily enough, it is always exactly 6. I can provide example code if needed.

Sann5 linked a pull request (#484) on Oct 21, 2024 that will close this issue.