Fast approximation of biglasso? #12
Comments
You probably want to start with the STRONG rules, which eliminate regressors that don't project well onto the response, based on the penalty. You can then apply the lasso to the remaining regressors. If you still have too much data, you can trade estimator accuracy for computational complexity, either by reducing the numerical accuracy of the slope coefficients in the standard implementation or by using ADMM. I personally prefer the former; it is probably faster when the data are randomly distributed over concurrent partitions.
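The screening step described above can be sketched as follows. This is a minimal implementation of the basic strong rule of Tibshirani et al. (2012), which discards predictor j whenever |x_j'y|/n < 2λ − λ_max; the function name, interface, and standardization assumptions are illustrative, not taken from biglasso:

```python
import numpy as np

def strong_rule_screen(X, y, lam, lam_max=None):
    """Basic strong rule for the lasso.

    Keeps predictor j only if |x_j' y| / n >= 2*lam - lam_max.
    Assumes the columns of X are standardized and y is centered.
    Returns the indices of the surviving predictors.
    """
    n = X.shape[0]
    corr = np.abs(X.T @ y) / n          # absolute inner products with the response
    if lam_max is None:
        lam_max = corr.max()            # smallest penalty at which all coefficients are zero
    keep = corr >= 2 * lam - lam_max    # strong-rule threshold
    return np.flatnonzero(keep)
```

The lasso is then fit only on `X[:, kept]`; because the strong rule is a heuristic (not a safe rule), the discarded predictors must still be checked against the KKT conditions afterwards.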
Thanks for the tips. I don't have the time to test this right now, but hopefully I will someday.
I need to implement this for a book I'm writing. If you can wait a few weeks then I can provide a reference implementation.
Strong rules are implemented in the biglasso package by @YaohuiZeng. I'm also using the code in this package. I'm looking forward to seeing your implementation.
The crux of STRONG is checking the KKT conditions. Below is reference code, similar to what I have in the book chapter, that does this. Note that if you were going to optimize for performance, you'd probably want the vector of slope coefficients. Also, note that you'd probably want to change
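The original snippet did not survive in this thread, so here is a minimal sketch of such a KKT check for the lasso; the function name, tolerance, and interface are my own assumptions, not the book's code:

```python
import numpy as np

def kkt_violations(X, y, beta, lam, tol=1e-4):
    """Check the lasso KKT conditions for every predictor.

    Stationarity requires |x_j'(y - X beta)| / n <= lam for
    discarded (zero) coefficients. After fitting on the screened
    subset, any index returned here violates the conditions and
    must be added back before re-solving.
    """
    n = X.shape[0]
    grad = np.abs(X.T @ (y - X @ beta)) / n  # absolute gradient of the loss
    return np.flatnonzero(grad > lam + tol)  # predictors breaking the bound
```

In a screening loop you would call this with `beta` padded to full length (zeros for screened-out columns), union any violators into the active set, and refit until the returned array is empty.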
Now much faster with #14
Determine whether there is a fast, near-optimal rule approximation for computing multivariate linear/logistic regression on biobank-scale datasets in a few hours (or minutes).