AttributeError: module 'cuml.cluster.hdbscan' has no attribute 'all_points_membership_vectors' #912
Comments
Just to confirm, same here with bertopic==0.13.0.
Which version of cuML are you using? Also, could you share your entire code for training the model? That makes it a bit easier to see what exactly is going on.
Also, I believe when using the original HDBSCAN model, you will need to set |
When using the original HDBSCAN with prediction_data=True, it actually works, thank you. For the cuML part, the code for training:
from cuml.cluster import HDBSCAN
stopwords = list(stopwords.words('english'))
umap_model = UMAP(n_neighbors=9, n_components=4, min_dist=0.05, random_state=42)
embeddings = embedding_model.encode(docs, show_progress_bar=True)
model_bert = BERTopic(
topics, probs = model_bert.fit_transform(docs, embeddings)
And just a side question:
Ah, you will need at least cuML 22.10 in order to use those probabilities. I definitely should have made that clear in the documentation. Having said that, you can also use Google Colab using the instructions here.
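For anyone who wants to check their installed version before training, here is a minimal sketch. The helper name `cuml_supports_probabilities` and the constant `MIN_CUML` are made up for illustration and are not part of cuML or BERTopic; the only fact taken from this thread is the 22.10 minimum.

```python
import re

# Minimum cuML release mentioned in this thread for the membership-vector
# probabilities. The constant name is hypothetical.
MIN_CUML = (22, 10)


def cuml_supports_probabilities(version: str) -> bool:
    """Return True if a cuML version string like '22.10.00' is >= 22.10.

    Hypothetical helper: only the leading 'major.minor' part is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= MIN_CUML


# Usage, assuming cuML is installed:
#   import cuml
#   print(cuml_supports_probabilities(cuml.__version__))
```

If this returns False, upgrading cuML or falling back to the original hdbscan package are the two options discussed in this thread.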
That depends on the parameter space that you are using compared with the original. I believe the two implementations are not exactly one-to-one comparable, so making sure all parameters are equal should help a bit.
Thank you very much for your fast help! I will try it on Google Colab in the next few days. Edit: It worked with Google Colab! Thank you!
I get a similar error message with bertopic==0.13, related to all_points_membership_vectors.
@p-dre I should update the documentation, but the error message already gives you a hint as to what should be changed. In order to generate those probabilities, you should set prediction_data=True:
hdbscan_model = HDBSCAN(min_samples=10, min_cluster_size=10, gen_min_span_tree=True, prediction_data=True)
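A minimal sketch of how that setting might be wired into a topic model when using the original hdbscan package. The HDBSCAN arguments are the ones from the comment above; the names `HDBSCAN_KWARGS` and `build_topic_model` are hypothetical, and passing `calculate_probabilities=True` is an assumption about how the probabilities are being requested in your setup.

```python
# Arguments copied from the comment above. prediction_data=True asks hdbscan
# to keep the extra data that membership-vector probabilities rely on.
HDBSCAN_KWARGS = dict(
    min_samples=10,
    min_cluster_size=10,
    gen_min_span_tree=True,
    prediction_data=True,
)


def build_topic_model():
    """Build a BERTopic model around an HDBSCAN instance (hypothetical helper).

    Imports are done lazily so the module loads even where the hdbscan and
    bertopic packages are not installed.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN

    hdbscan_model = HDBSCAN(**HDBSCAN_KWARGS)
    # Assumption: probabilities are requested through calculate_probabilities.
    return BERTopic(hdbscan_model=hdbscan_model, calculate_probabilities=True)


# Usage, assuming `docs` is a list of strings:
#   topic_model = build_topic_model()
#   topics, probs = topic_model.fit_transform(docs)
```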
* Add representation models
  * bertopic.representation.KeyBERTInspired
  * bertopic.representation.PartOfSpeech
  * bertopic.representation.MaximalMarginalRelevance
  * bertopic.representation.Cohere
  * bertopic.representation.OpenAI
  * bertopic.representation.TextGeneration
  * bertopic.representation.LangChain
  * bertopic.representation.ZeroShotClassification
* Fix topic selection when extracting repr docs
* Improve documentation, #769, #954, #912
* Add wordcloud example to documentation
* Add title param for each graph, #800
* Improved nr_topics procedure
* Fix #952, #903, #911, #965. Add #976
Hi Maarten,
After updating to the new version 0.13.0, a new error occurred in my code:
I have read in the changelog that you have made changes to support cuML's HDBSCAN, which I am using. When using the "normal" hdbscan package, I get the following error:
I instantiated the hdbscan model like this: hdbscan_model = HDBSCAN(min_cluster_size=20, min_samples=20, gen_min_span_tree=True)
When I switch back to version 0.12.0, I get no errors and everything runs as it should. Is there a problem on my end, or is this behaviour not intended?
All the best,
Dominik