Cluster clusters #1121

timyerg · 2024-05-14T17:18:50Z

Hello!
Thank you for the tool.
I am dealing with a very large dataset and trying to reduce memory requirements. I tried setting low_memory to True, parametric UMAP, PCA reduction, and other options, but the memory requirements are still too high for my purposes.
I am working with features from different samples. Each sample may contain more than 10000 unique features.
Now my idea is:

1. Cluster features within each sample; each cluster should contain at least 100 features.
2. Select one representative feature for each cluster (I have an algorithm for that based on feature properties), or several features.
3. Cluster the representative features, pooling all samples; here a cluster may contain as few as 1 feature.
4. Reassign the features from step 1 to the clusters from step 3.
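
A minimal sketch of the four steps above, to make the bookkeeping concrete. Everything here is an assumption for illustration: `two_stage_cluster`, `toy_cluster`, and `toy_representative` are hypothetical names, the toy clustering is a 1-D gap rule standing in for whatever clustering routine is actually used, and the mean vector stands in for the representative-selection algorithm mentioned in step 2. The minimum cluster size of 100 is not modeled; it would be a setting of the real clustering routine.

```python
from collections import defaultdict


def two_stage_cluster(samples, cluster_fn, pick_representative):
    """Cluster per sample, then cluster pooled representatives, then remap.

    samples: dict mapping sample_id -> list of feature vectors (tuples).
    cluster_fn: hypothetical callable, list of vectors -> list of int labels.
    pick_representative: hypothetical callable, list of vectors -> one vector.
    Returns a dict mapping sample_id -> list of global labels, one per feature.
    """
    reps = []          # representative vectors pooled across all samples
    rep_owner = []     # (sample_id, local_label) for each representative
    local_labels = {}  # sample_id -> local label of each feature

    # Steps 1 and 2: cluster within each sample, keep one representative per cluster.
    for sample_id, features in samples.items():
        labels = cluster_fn(features)
        local_labels[sample_id] = labels
        members = defaultdict(list)
        for vec, lab in zip(features, labels):
            members[lab].append(vec)
        for lab, vecs in members.items():
            reps.append(pick_representative(vecs))
            rep_owner.append((sample_id, lab))

    # Step 3: cluster only the pooled representatives.
    global_of_rep = cluster_fn(reps)
    local_to_global = dict(zip(rep_owner, global_of_rep))

    # Step 4: each feature inherits the global label of its local cluster.
    return {
        sid: [local_to_global[(sid, lab)] for lab in labels]
        for sid, labels in local_labels.items()
    }


def toy_cluster(vectors, gap=1.0):
    """Toy stand-in: sort by first coordinate, start a new cluster at each gap > `gap`."""
    order = sorted(range(len(vectors)), key=lambda i: vectors[i][0])
    labels = [0] * len(vectors)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if vectors[idx][0] - vectors[prev][0] > gap:
            current += 1
        labels[idx] = current
    return labels


def toy_representative(vectors):
    """Toy stand-in: the mean vector of a cluster."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))


# Two tiny samples with 1-D features, purely for illustration.
samples = {
    "A": [(0.0,), (0.2,), (5.0,), (5.1,)],
    "B": [(5.2,), (9.0,), (9.3,)],
}
result = two_stage_cluster(samples, toy_cluster, toy_representative)
```

In this layout the pooled clustering in step 3 only ever sees one vector per local cluster, so its input size is the total number of local clusters rather than the total number of features. A local cluster is never split in step 4, so any mistakes made in step 1 carry through to the final labels.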

That way I am hoping to reduce memory consumption.

Could you please give me your opinion on that approach? Like "better not to do it" or "may work"?

Best,
