Modify loop in initial assignments of lda to use sparse structure. #213

Merged (1 commit) on Jul 7, 2020


jmoralez
Contributor

Hi, thanks for writing this awesome package, it really helped me grasp the idea of the collapsed Gibbs sampler. Here's my attempt to give back to it.

The current implementation of the initial assignments in LDA iterates over the document-term matrix row by row without taking its sparsity into account, which makes it very slow in some circumstances (~50 minutes for an 800,000 x 20,000 case). I've modified the loop to exploit the sparse structure of the matrix by iterating only over the non-zero rows of each column, which gives a substantial improvement (the 800,000 x 20,000 case goes down to ~2 minutes).
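
To illustrate the idea, here is a minimal Python sketch, not the Julia code from this PR. It assumes rows are documents and columns are terms, and it stores the matrix in a compressed sparse column layout using three plain lists. The names `to_csc`, `dense_init`, `sparse_init`, `colptr`, `rowval` and `nzval` are hypothetical and chosen only for this example. Both functions draw one random topic per token occurrence, but the sparse version touches only the stored entries.

```python
import random


def to_csc(dense):
    """Convert a list-of-rows matrix to (colptr, rowval, nzval) CSC lists."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    colptr, rowval, nzval = [0], [], []
    for col in range(n_cols):
        for row in range(n_rows):
            if dense[row][col] != 0:
                rowval.append(row)
                nzval.append(dense[row][col])
        # colptr[col + 1] marks where the next column's entries begin.
        colptr.append(len(rowval))
    return colptr, rowval, nzval


def dense_init(dense, n_topics, rng):
    """Baseline: visit every (doc, term) cell, zeros included."""
    assignments, visited = {}, 0
    for doc, row in enumerate(dense):
        for term, count in enumerate(row):
            visited += 1
            if count:
                # One random topic per occurrence of the term in the document.
                assignments[(doc, term)] = [
                    rng.randrange(n_topics) for _ in range(count)
                ]
    return assignments, visited


def sparse_init(colptr, rowval, nzval, n_topics, rng):
    """Sparse version: visit only the stored entries of each column."""
    assignments, visited = {}, 0
    for term in range(len(colptr) - 1):
        for k in range(colptr[term], colptr[term + 1]):
            visited += 1
            doc, count = rowval[k], nzval[k]
            assignments[(doc, term)] = [
                rng.randrange(n_topics) for _ in range(count)
            ]
    return assignments, visited


# Tiny document-term matrix: 3 documents, 4 terms, 4 non-zero cells.
dtm = [
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]
colptr, rowval, nzval = to_csc(dtm)
_, dense_visits = dense_init(dtm, 5, random.Random(0))
_, sparse_visits = sparse_init(colptr, rowval, nzval, 5, random.Random(0))
print(dense_visits, sparse_visits)
```

The dense loop does work proportional to documents x terms, while the sparse loop does work proportional to the number of non-zero cells, which is where a speedup on a very sparse matrix would come from.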

@aviks aviks merged commit ec4c1e8 into JuliaText:master Jul 7, 2020
@jmoralez jmoralez deleted the lda_init branch July 9, 2020 04:12