Parametric UMAP stores multiple copies of the full input data, but these are unnecessary for transforming new data points. By deleting `self._raw_data` and `self._knn_search_index._raw_data` from my Parametric UMAP model object, I was able to reduce the size of the saved model from 90 GB to 300 MB (the input data is a distance matrix with 80K locations). This might not work for models that require additional training, but perhaps it should be an option when model size is an issue?
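For anyone who wants to try the workaround, here is a minimal sketch of the deletion step. The helper name `strip_raw_data` is made up for illustration, and the demo uses a stand-in object with the same two attribute names rather than a real ParametricUMAP model, so treat it as an outline and not as the library's API.

```python
import pickle
from types import SimpleNamespace


def strip_raw_data(model):
    """Delete the cached training data named in this issue, if present.

    Hypothetical helper: mutates `model` in place and returns the names
    of the attributes that were removed.
    """
    removed = []
    if hasattr(model, "_raw_data"):
        del model._raw_data
        removed.append("_raw_data")
    index = getattr(model, "_knn_search_index", None)
    if index is not None and hasattr(index, "_raw_data"):
        del index._raw_data
        removed.append("_knn_search_index._raw_data")
    return removed


# Stand-in for a fitted model (assumption: only the attribute names match).
model = SimpleNamespace(
    _raw_data=bytes(1_000_000),
    _knn_search_index=SimpleNamespace(_raw_data=bytes(1_000_000)),
    weights=[0.1, 0.2],
)

size_before = len(pickle.dumps(model))
removed = strip_raw_data(model)
size_after = len(pickle.dumps(model))
print(removed, size_before, size_after)
```

On a real model you would call the helper just before saving. As noted above, a stripped model may no longer support additional training.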
On Thu, May 2, 2024 at 5:37 PM Brad Nelson wrote:
--
Tim Sainburg <https://timsainburg.com/>
Postdoctoral Fellow
Harvard Medical School
I'm having the same issue: I want the trained model to be as small as possible, because the inference machine has less memory than the training machine. I'll link a PR in which I added a parameter to the save method that removes the raw data.
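To illustrate the idea of an opt-in flag on save, here is a rough sketch. It is not the code from the PR: the function `save_model`, the flag name `remove_raw_data`, and the plain-pickle serialization are all assumptions, and the demo uses a stand-in object instead of a real ParametricUMAP model. It works on shallow copies so the in-memory model keeps its data and can still be trained further.

```python
import copy
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_model(model, path, remove_raw_data=False):
    """Pickle `model` to `path`, optionally leaving out the cached raw data.

    `remove_raw_data` is a hypothetical flag name. The model passed in
    is never modified; only shallow copies are stripped.
    """
    to_save = model
    if remove_raw_data:
        to_save = copy.copy(model)  # shallow copy shares all other state
        to_save.__dict__.pop("_raw_data", None)
        index = getattr(model, "_knn_search_index", None)
        if index is not None:
            slim_index = copy.copy(index)
            slim_index.__dict__.pop("_raw_data", None)
            to_save._knn_search_index = slim_index
    with open(path, "wb") as f:
        pickle.dump(to_save, f)


# Stand-in for a fitted model (assumption: only the attribute names match).
model = SimpleNamespace(
    _raw_data=bytes(500_000),
    _knn_search_index=SimpleNamespace(_raw_data=bytes(500_000), graph=[1, 2, 3]),
    weights=[0.1, 0.2],
)

with tempfile.TemporaryDirectory() as tmp:
    full_path = Path(tmp) / "full.pkl"
    slim_path = Path(tmp) / "slim.pkl"
    save_model(model, full_path)
    save_model(model, slim_path, remove_raw_data=True)
    full_size = full_path.stat().st_size
    slim_size = slim_path.stat().st_size
    with open(slim_path, "rb") as f:
        restored = pickle.load(f)

print(full_size, slim_size)
```

Defaulting the flag to `False` keeps the existing behaviour for anyone who reloads a model to continue training.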