Hello!
Thank you for the tool.
I am dealing with a very big dataset and am trying to reduce memory requirements. I tried setting low_memory=True, parametric UMAP, PCA reduction, and other things, but the memory requirements are still too high for my purposes.
I am working with features from different samples; each sample may contain more than 10,000 unique features.
Now my idea is:
1. Cluster the features within each sample, so that each cluster contains at least 100 features.
2. Select a representative feature (or several) for each cluster (I have an algorithm for that based on feature properties).
3. Cluster the representative features, pooling all samples; here a cluster may contain as few as one feature.
4. Reassign the features from step 1 to the clusters from step 3.
In this way I am hoping to keep memory consumption under control.
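For concreteness, here is a rough sketch of the idea in Python. MiniBatchKMeans is just a stand-in for whatever clusterer would actually be used, and pick_representative is a placeholder for my property-based selection:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def pick_representative(cluster_features):
    # Placeholder for my property-based selection: here I just take
    # the feature vector closest to the cluster mean.
    center = cluster_features.mean(axis=0)
    dists = np.linalg.norm(cluster_features - center, axis=1)
    return cluster_features[np.argmin(dists)]

def two_stage_cluster(samples, min_cluster_size=100, n_global_clusters=50):
    """samples: list of (n_features, n_dims) arrays, one per sample."""
    reps = []
    # Steps 1-2: cluster within each sample and pick representatives.
    for X in samples:
        k = max(1, len(X) // min_cluster_size)  # aim for >=100 features per cluster
        labels = MiniBatchKMeans(n_clusters=k, random_state=0).fit_predict(X)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # guard against empty clusters
                reps.append(pick_representative(members))
    reps = np.vstack(reps)

    # Step 3: cluster the pooled representatives (a cluster may end up
    # holding a single representative).
    global_km = MiniBatchKMeans(n_clusters=n_global_clusters,
                                random_state=0).fit(reps)

    # Step 4: reassign every original feature to the nearest global
    # centroid (one possible reading of "reassign to clusters from step 3").
    return [global_km.predict(X) for X in samples]
```

Only the representatives are ever pooled in memory at once, which is the point of the whole exercise.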
Could you please give me your opinion on this approach? Something like "better not to do it" or "may work"?
Best,