Cluster clusters #1121

Open
timyerg opened this issue May 14, 2024 · 0 comments
timyerg commented May 14, 2024

Hello!
Thank you for the tool.
I am dealing with a very large dataset and trying to reduce memory requirements. I tried setting low_memory to True, parametric UMAP, PCA reduction, and other things, but the memory requirements are still too high for my purposes.
I am working with features from different samples. Each sample may contain more than 10000 unique features.
Now my idea is:

  1. Cluster features within each sample; each cluster should contain at least 100 features.
  2. Select a representative feature (or several) for each cluster. I have an algorithm for that based on feature properties.
  3. Cluster the representative features pooled across all samples; here a cluster may contain as few as 1 feature.
  4. Reassign the features from step 1 to the clusters from step 3.

In this way I am hoping to reduce memory consumption.
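
To make the four steps concrete, here is a rough sketch of the idea in Python with NumPy. It is only an illustration under assumptions: `kmeans_labels` is a plain k-means stand-in for whatever clusterer is actually used (it does not enforce the 100-feature minimum from step 1), `pick_representative` is a hypothetical placeholder for my own selection algorithm, and `two_stage_cluster`, `local_k`, and `global_k` are names made up for this example.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=20, seed=0):
    """Stand-in clusterer (plain k-means); swap in the real clustering step."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pick_representative(X, member_idx):
    """Hypothetical placeholder: the member closest to the cluster mean."""
    pts = X[member_idx]
    return member_idx[((pts - pts.mean(axis=0)) ** 2).sum(axis=1).argmin()]

def two_stage_cluster(samples, local_k, global_k):
    reps, owners = [], []
    for X in samples:
        local = kmeans_labels(X, local_k)            # step 1: per-sample clusters
        owner = np.empty(len(X), dtype=int)
        for c in np.unique(local):
            idx = np.flatnonzero(local == c)
            owner[idx] = len(reps)                   # remember which rep each feature maps to
            reps.append(X[pick_representative(X, idx)])  # step 2: one rep per cluster
        owners.append(owner)
    # Step 3: only the representatives are pooled and clustered together.
    global_labels = kmeans_labels(np.asarray(reps), global_k)
    # Step 4: every feature inherits the global cluster of its representative.
    return [global_labels[o] for o in owners]

rng = np.random.default_rng(1)
samples = [rng.normal(size=(300, 4)) for _ in range(3)]
labels_per_sample = two_stage_cluster(samples, local_k=5, global_k=4)
```

The pooled step only ever sees one row per local cluster, so its input size is bounded by the number of local clusters rather than the total number of features.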

Could you please give me your opinion on this approach? Something like "better not to do it" or "may work"?

Best,
