Topic modelling using RAPIDS and BERT #41

Merged: 53 commits, Dec 21, 2021

mayankanand007
Contributor

Workflow to read a set of documents and extract topics from it leveraging BERT, TF-IDF and NVIDIA RAPIDS.

Member

@VibhuJawa left a comment

Thanks for all the work on this @mayankanand007 . Work looks great.

I have done an initial review and requested changes.

cuBERT_topic_modelling/README.md (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/tests/test_umap_dr.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
@VibhuJawa added the Waiting on Author label Nov 19, 2021
@mayankanand007
Contributor Author

A couple of things are left, but I think this is ready for another round of review 😄, as a lot of the code has been refactored.

@mayankanand007 removed the Waiting on Author label Nov 26, 2021
Member

@VibhuJawa left a comment

Thanks for working through this. Have requested additional changes.

cuBERT_topic_modelling/README.md (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/tests/test_ctfidf.py (outdated, resolved)
cuBERT_topic_modelling/tests/test_subwordtokenizer.py (outdated, resolved)
cuBERT_topic_modelling/tests/test_subwordtokenizer.py (outdated, resolved)
cuBERT_topic_modelling/tests/test_subwordtokenizer.py (outdated, resolved)
cuBERT_topic_modelling/tests/test_umap_dr.py (outdated, resolved)
Member

@VibhuJawa left a comment

Thanks for working through the reviews. The PR looks close; I have requested some more changes.

Also, ensure that you have run flake8/black on the test Python files as well.

cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/cuBERTopic.py (outdated, resolved)
cuBERT_topic_modelling/tests/test_ctfidf.py (resolved)
Member

@VibhuJawa left a comment

Thanks a lot for pushing all these changes through @mayankanand007 as well as testing it on the whole dataset.

I think we are close to completion. I have requested very minor changes, but we should be good to merge soon.

Member

@VibhuJawa left a comment

Thanks for working on this Mayank. LGTM

@VibhuJawa merged commit ce24e3d into rapidsai:main Dec 21, 2021