We recommend a workflow of keeping a Pooch as a global variable by calling `pooch.create` at import time. This is problematic when running in parallel before the cache folder has been created: each parallel process/thread tries to create it at the same time, leading to a crash.
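The race is easy to reproduce with a minimal sketch (the cache path, helper names, and thread count here are illustrative, not pooch's actual code): the classic check-then-create pattern is not atomic, so two workers can both pass the existence check and one of them crashes inside `os.makedirs`.

```python
import os
import shutil
import tempfile
import threading

CACHE = os.path.join(tempfile.gettempdir(), "pooch_race_demo")  # hypothetical path

def make_cache():
    # The check-then-create pattern: not atomic. A second thread can pass
    # the exists() check before the first finishes makedirs(), then crash
    # with FileExistsError.
    if not os.path.exists(CACHE):
        os.makedirs(CACHE)

shutil.rmtree(CACHE, ignore_errors=True)
errors = []

def worker():
    try:
        make_cache()
    except FileExistsError as err:
        errors.append(err)

threads = [threading.Thread(target=worker) for _ in range(16)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
# With enough threads, `errors` is often non-empty; either way the folder
# ends up created, which is why the bug only bites intermittently.
```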
For our users, this is particularly bad since their packages crash on import even if the sample datasets are never used.
Not sure exactly what we can do on our end except:
- Add a warning in the docs about this issue.
- Change our recommended workflow to wrap `pooch.create` in a function. This way the crash wouldn't happen at import but only when trying to fetch a dataset in parallel, which gives packages a chance to warn users not to load sample data for the first time in parallel.
Perhaps an ideal solution would be for Pooch to create the cache folder at install time, but that would mean hooking into `setup.py` (which is complicated), and I don't know whether we could even do this with conda.
We might also want to have tests in place for running some things in parallel. It would certainly help find more places where this comes up (though dealing with parallel IO will not be fun).
Add `exist_ok=True` to the `os.makedirs` call in case the cache folder is being created in parallel (multiple jobs calling `makedirs` at the same time).
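The one-line change can be sketched like this (the helper name is illustrative, not pooch's actual function):

```python
import os

def make_local_storage(path):
    # exist_ok=True makes the call idempotent: if another process or thread
    # created the folder first, makedirs simply succeeds instead of raising
    # FileExistsError.
    os.makedirs(path, exist_ok=True)
    return path
```

Calling it twice, or from many workers at once, is now safe.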
Includes a test case with threads and processes to make sure it works.
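A minimal version of such a test could look like this, using threads only (a process-based variant would follow the same shape with `ProcessPoolExecutor`); the target path and worker counts are arbitrary:

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

def create_cache(path):
    # The fixed behavior under test: idempotent directory creation.
    os.makedirs(path, exist_ok=True)

target = os.path.join(tempfile.gettempdir(), "pooch_parallel_test")  # hypothetical
shutil.rmtree(target, ignore_errors=True)

# Hammer the same fresh directory from many threads at once;
# future.result() re-raises any exception from a worker thread.
with ThreadPoolExecutor(max_workers=16) as pool:
    futures = [pool.submit(create_cache, target) for _ in range(64)]
    for future in futures:
        future.result()
```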
Fixes #170 and fixes #150
Full code that generated the error
See scikit-image/scikit-image#4660