
Constrained ALS and ALM #16

Open · debasish83 opened this issue Jun 17, 2015 · 1 comment


@debasish83

@witgo

I have a package for factorization that's based on ml.recommendation.ALS but with several major changes:

  1. For ALS, user and product constraints can be specified. This lets us add column-wise L2 regularization for words and L1 regularization for documents (through Breeze's QuadraticMinimizer) to run sparse coding; the constrained per-block solve is sketched right after this list.
  2. In place of L1 regularization, a probability-simplex constraint can be placed on documents and positivity constraints on words to get PLSA constraints with a least-squares loss (see the simplex-projection sketch after the JIRA links below).
  3. Alternating minimization supports KL-divergence and likelihood losses with positivity constraints in matrix factorization, to run the PLSA formulation and generate LDA results through factorization (a multiplicative-update sketch closes this comment).
  4. Alternating minimization shuffles sparse vectors and is designed to scale to large-rank matrix factorization, like Petuum.
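
To make point 1 concrete: with one side fixed, each per-user (or per-item) ALS solve is a small quadratic program, minimize 0.5 * x'Hx + q'x with H = Y'Y + lambda*I, subject to the chosen constraint. Below is a minimal, self-contained sketch of that solve under a positivity constraint using projected gradient descent; the package itself delegates to Breeze's QuadraticMinimizer, so the function name and step-size choice here are illustrative only, not the package's API.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, diag, sum}

// Sketch: solve  min 0.5 * x'Hx + q'x  s.t. x >= 0  by projected gradient.
// H is the Gram matrix Y'Y + lambda*I of the fixed factor, so it is PSD and
// trace(H) upper-bounds its largest eigenvalue, giving a safe step size.
def solveNonNegative(H: DenseMatrix[Double],
                     q: DenseVector[Double],
                     iters: Int = 500): DenseVector[Double] = {
  val step = 1.0 / sum(diag(H))                 // 1/L with L >= lambda_max(H)
  var x = DenseVector.zeros[Double](q.length)
  for (_ <- 0 until iters) {
    val grad = H * x + q                        // gradient of the quadratic
    x = (x - grad * step).map(math.max(_, 0.0)) // step, then project onto x >= 0
  }
  x
}
```

Swapping the projection for an L1 proximal step (soft-thresholding) or a simplex projection gives the sparse-coding and PLSA variants from points 1 and 2 without touching the outer ALS loop.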

Details are on the following JIRAs:

  1. https://issues.apache.org/jira/browse/SPARK-2426
  2. https://issues.apache.org/jira/browse/SPARK-6323
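
For the probability simplex in point 2, the projection step is the Euclidean projection onto {x : x >= 0, sum(x) = 1}. Here is a minimal sketch of the standard sort-based algorithm (Duchi et al., 2008); the function name is illustrative:

```scala
// Sketch: Euclidean projection onto the probability simplex.
// Sort descending, find the largest prefix whose shifted values stay
// positive, then shift and clip. O(n log n) in the vector length.
def projectSimplex(v: Array[Double]): Array[Double] = {
  val u = v.sorted(Ordering[Double].reverse)   // values in descending order
  val cumSum = u.scanLeft(0.0)(_ + _).tail     // running sums of u
  // largest i with u(i) + (1 - cumSum(i)) / (i + 1) > 0; always holds at i = 0
  val rho = u.indices.reverse
    .find(i => u(i) + (1.0 - cumSum(i)) / (i + 1) > 0).get
  val theta = (cumSum(rho) - 1.0) / (rho + 1)  // shift enforcing sum(x) = 1
  v.map(x => math.max(x - theta, 0.0))
}
```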

If this looks useful, I can add a factorization package to zen and bring in the code from the Spark PRs. zen is already on spark-packages, so I don't have to introduce another new package. If users find it useful, maybe later we can move it back to ml; it changes the user-facing API significantly.

Next I want to move these algorithms to the graphx API and compare the runtime and efficiency. Since zen is focused on optimizing graphx for ML, I feel zen is an ideal package for these factorization algorithms.

The factorization outputs are large distributed models, and a natural extension is to add a few hidden layers between user/word and item/document and develop a distributed neural-net formulation. That formulation should use the optimized graphx API, and I think you have already built many of these optimizations into zen.
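
And for point 3 above: the KL-divergence loss with positivity constraints is the classic KL-NMF objective, minimize D_KL(V || WH) over W, H >= 0. As a single-machine reference for what the alternating minimization computes, here is a minimal sketch of the Lee-Seung multiplicative updates on plain arrays; the distributed version in the PRs works on Spark partitions, and all names here are illustrative:

```scala
// Sketch: one alternating pass of Lee-Seung multiplicative updates for
// KL-divergence NMF. v is the n x m nonnegative data matrix; w (n x k)
// and h (k x m) are the factors, updated in place.
def klNmfStep(v: Array[Array[Double]],
              w: Array[Array[Double]],
              h: Array[Array[Double]]): Unit = {
  val n = v.length; val m = v(0).length; val k = h.length
  val eps = 1e-12                                  // guards against divide-by-zero
  def wh(i: Int, j: Int) = (0 until k).map(r => w(i)(r) * h(r)(j)).sum
  // H update: H <- H .* (W'(V ./ WH)) ./ (W'1)
  for (r <- 0 until k; j <- 0 until m) {
    val num = (0 until n).map(i => w(i)(r) * v(i)(j) / math.max(wh(i, j), eps)).sum
    val den = (0 until n).map(i => w(i)(r)).sum
    h(r)(j) *= num / math.max(den, eps)
  }
  // W update (uses the already-updated H): W <- W .* ((V ./ WH)H') ./ (1H')
  for (i <- 0 until n; r <- 0 until k) {
    val num = (0 until m).map(j => h(r)(j) * v(i)(j) / math.max(wh(i, j), eps)).sum
    val den = (0 until m).map(j => h(r)(j)).sum
    w(i)(r) *= num / math.max(den, eps)
  }
}
```

These updates keep the factors nonnegative by construction, which is what lets the result be read as PLSA topic/document distributions after normalization.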

@hucheng (Contributor) commented Jun 19, 2015

@debasish83
It would be great if you could contribute to Zen. We agree that GraphX is suitable for factorization.
Please feel free to propose PRs, and we can discuss them one by one.

Thanks.
