When trying to copy a local path to a remote S3 filesystem using pyarrow.fs.copy_files with the default parameter use_threads=True, the call hangs. With use_threads=False the operation completes correctly, although more slowly.
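For reference, a minimal sketch of the call in question. The local path and the s3://my-bucket/some-folder destination are placeholders, not values taken from the report:

```python
import pyarrow.fs

# Default behaviour: use_threads=True -- this is the case that hangs.
pyarrow.fs.copy_files("/tmp/local-data", "s3://my-bucket/some-folder")

# Workaround: disable threading; slower, but the call returns.
pyarrow.fs.copy_files(
    "/tmp/local-data",
    "s3://my-bucket/some-folder",
    use_threads=False,
)
```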
We experience the same issue in Ray, and it's easily reproducible. It comes up when a recursive upload involves at least as many files as there are available CPUs.
For instance, on my MacBook with 8 cores, I can upload a folder with 7 files, but not with 8 files:
mkdir -p /tmp/pa-s3
cd /tmp/pa-s3
for i in {1..7}; do touch $i.txt; done
# This works
python -c "import pyarrow.fs; pyarrow.fs.copy_files('/tmp/pa-s3', 's3://bucket/folder')"
for i in {1..8}; do touch $i.txt; done
# This hangs forever
python -c "import pyarrow.fs; pyarrow.fs.copy_files('/tmp/pa-s3', 's3://bucket/folder')"
The problem occurs with at least pyarrow 6 through 11 and can be avoided with use_threads=False, but this obviously harms performance.
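The same workaround can also be expressed with explicitly constructed filesystems instead of URI inference; here is a sketch, with the bucket name and region as placeholders:

```python
import pyarrow.fs

local = pyarrow.fs.LocalFileSystem()
s3 = pyarrow.fs.S3FileSystem(region="us-east-1")  # placeholder region

# With destination_filesystem given, the destination is a path within S3
# ("bucket/folder"), not an s3:// URI.
pyarrow.fs.copy_files(
    "/tmp/pa-s3",
    "bucket/folder",
    source_filesystem=local,
    destination_filesystem=s3,
    use_threads=False,  # avoids the hang at the cost of throughput
)
```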
If I check the remote S3 bucket, all the files appear, but the function never returns.
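One way to confirm from a separate process that the objects did land on S3 while the copy call is still hanging; bucket, folder, and region below are placeholders:

```python
import pyarrow.fs

s3 = pyarrow.fs.S3FileSystem(region="us-east-1")  # placeholder region
selector = pyarrow.fs.FileSelector("bucket/folder", recursive=True)

# List what has actually been written to the destination prefix.
for info in s3.get_file_info(selector):
    print(info.path, info.size)
```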
Platform: Windows
Reporter: Alejandro Marco Ramos
Note: This issue was originally created as ARROW-17064. Please see the migration documentation for further details.