I have a package for factorization that's based on ml.recommendation.ALS but with several major changes:
For ALS, user and product constraints can be specified. This lets us add column-wise L2 regularization for words and L1 regularization for documents (through Breeze's QuadraticMinimizer) to run sparse coding.
In place of L1 regularization, a probability-simplex constraint can be placed on documents and a positivity constraint on words to get PLSA constraints with a least-squares loss; a rough sketch of such a constrained column solve is given after these points.
Alternating minimization supports KL-divergence and likelihood losses with positivity constraints in matrix factorization, so a PLSA formulation can be run and LDA-style results generated through factorization (see the KL-update sketch below).
Alternating minimization shuffles sparse vectors and is designed to scale to large-rank matrix factorization, along the lines of Petuum.
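As a rough illustration of the constrained ALS subproblem described above, here is a minimal, self-contained Scala/Breeze sketch of a single column update, min_x 0.5 * ||A x - b||^2 + reg(x), where reg is an L2 penalty (words), an L1 penalty (documents, sparse coding), or a projection onto the probability simplex for the PLSA-style variant. This is not the code from the Spark PRs; `solveColumn` and the `Reg` hierarchy are hypothetical names, and a plain proximal/projected gradient loop stands in for the actual solver.

```scala
// Hedged sketch: one ALS column subproblem
//   min_x 0.5 * ||A x - b||^2 + reg(x)
// with L2 (words), L1 (documents, sparse coding), or simplex (PLSA) regularization.
import breeze.linalg.{DenseMatrix, DenseVector}

object ConstrainedColumnSolve {

  sealed trait Reg
  case class L2(lambda: Double) extends Reg   // ridge on word columns
  case class L1(lambda: Double) extends Reg   // lasso on document columns
  case object Simplex extends Reg             // x >= 0, sum(x) = 1 on documents

  /** Soft-thresholding, the proximal operator of lambda * ||x||_1. */
  private def softThreshold(x: DenseVector[Double], t: Double): DenseVector[Double] =
    x.map(v => math.signum(v) * math.max(math.abs(v) - t, 0.0))

  /** Euclidean projection onto the probability simplex (sorting-based). */
  private def projectSimplex(x: DenseVector[Double]): DenseVector[Double] = {
    val u = x.toArray.sorted(Ordering[Double].reverse)
    val cssv = u.scanLeft(0.0)(_ + _).drop(1)
    val rho = (0 until u.length).filter(i => u(i) - (cssv(i) - 1.0) / (i + 1) > 0).last
    val theta = (cssv(rho) - 1.0) / (rho + 1)
    x.map(v => math.max(v - theta, 0.0))
  }

  /** Solve the column subproblem for the given regularizer/constraint. */
  def solveColumn(A: DenseMatrix[Double], b: DenseVector[Double], reg: Reg,
                  iters: Int = 200): DenseVector[Double] = {
    val gram = A.t * A
    val atb  = A.t * b
    reg match {
      // ridge has a closed form: (A^T A + lambda I) x = A^T b
      case L2(lambda) =>
        (gram + DenseMatrix.eye[Double](A.cols) * lambda) \ atb
      // L1 and simplex cases: plain proximal / projected gradient iterations
      case _ =>
        // crude step size: the trace of the Gram matrix bounds its largest eigenvalue
        val step = 1.0 / (breeze.linalg.trace(gram) + 1e-12)
        var x = DenseVector.zeros[Double](A.cols)
        for (_ <- 0 until iters) {
          val grad = gram * x - atb
          x = reg match {
            case L1(lambda) => softThreshold(x - grad * step, lambda * step)
            case _          => projectSimplex(x - grad * step)
          }
        }
        x
    }
  }
}
```

Inside ALS, A is the fixed factor restricted to the ratings observed for one column; the Breeze QuadraticMinimizer mentioned above is meant to solve this same kind of subproblem, and the bare gradient loop here only keeps the sketch short.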
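Here is a similarly hedged sketch of alternating minimization under a KL-divergence loss with positivity: multiplicative updates in the style of Lee and Seung that factor a nonnegative term-document matrix V (m x n) into W (m x k) and H (k x n). The object and method names are illustrative, not from the package, and the real implementation works on distributed sparse data rather than in-memory dense matrices.

```scala
// Hedged sketch: multiplicative updates minimizing D_KL(V || W * H) with W, H >= 0.
// Normalizing the columns of W and the rows of H afterwards gives PLSA-style
// topic and document distributions.
import breeze.linalg.DenseMatrix

object KLFactorization {

  /** One pass of multiplicative updates; mutates W and H in place. */
  def update(V: DenseMatrix[Double], W: DenseMatrix[Double], H: DenseMatrix[Double]): Unit = {
    val m = V.rows; val n = V.cols; val k = W.cols
    val eps = 1e-12
    val WH = W * H                                   // current reconstruction

    // H(r, j) <- H(r, j) * (sum_i W(i, r) * V(i, j) / WH(i, j)) / (sum_i W(i, r))
    for (r <- 0 until k; j <- 0 until n) {
      var num = 0.0; var den = eps
      for (i <- 0 until m) {
        num += W(i, r) * V(i, j) / (WH(i, j) + eps)
        den += W(i, r)
      }
      H(r, j) = H(r, j) * num / den
    }

    val WH2 = W * H                                  // refresh after the H step
    // W(i, r) <- W(i, r) * (sum_j H(r, j) * V(i, j) / WH2(i, j)) / (sum_j H(r, j))
    for (i <- 0 until m; r <- 0 until k) {
      var num = 0.0; var den = eps
      for (j <- 0 until n) {
        num += H(r, j) * V(i, j) / (WH2(i, j) + eps)
        den += H(r, j)
      }
      W(i, r) = W(i, r) * num / den
    }
  }
}
```

Starting W and H from small positive random values and calling update repeatedly keeps both factors nonnegative and drives the KL objective down, which is what ties the factorization back to the PLSA/LDA connection above.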
If it looks useful, I can add a factorization package in zen and bring in the code from the Spark PRs. zen is already on spark-packages, so I don't have to introduce another new package. If users find it useful, maybe later we can move it back to ml; it changes the user-facing API significantly.
Next I want to move these algorithms to the GraphX API and compare runtime and efficiency. Since zen is focused on optimizing GraphX for ML, I feel zen is an ideal home for these factorization algorithms.
Factorization outputs are large distributed models, and a natural extension is to add a few hidden layers between user/word and item/document factors to develop a distributed neural-net formulation. That should use the optimized GraphX API, and I think you have already built many of these optimizations in zen.
@debasish83
It would be great if you can contribute to Zen. We agreed that GraphX is suitable for factorization.
Please feel free to propose PRs and we can discuss them one by one.
@witgo
Details are on the following JIRAs: