Skyplane is slow for large number of small files #841

Open
sarahwooders opened this issue May 10, 2023 · 1 comment

Comments

@sarahwooders
Contributor

Currently, for a large number of small files, Skyplane bottlenecks on chunk dispatch because listing the files is much slower than Skyplane's ability to transfer the data.

Proposed solution:

  • Use multiple processes/threads to list objects in parallel (this can be done by listing random prefixes)
  • Use async HTTP requests to send chunk requests, rather than waiting for each response
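
As a rough illustration of the first bullet, here is a minimal sketch of listing several prefixes concurrently with a thread pool. It is not Skyplane's code: `parallel_list`, `fake_list_prefix`, and `FAKE_BUCKET` are hypothetical names, and the in-memory list stands in for a real object store listing call. It also assumes a fixed set of prefixes that together cover every key exactly once, rather than random ones; if prefixes overlap or leave gaps, keys would be duplicated or missed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical signature for "list all keys under one prefix".
# A real implementation would wrap a paginated object store listing call.
ListFn = Callable[[str], Iterable[str]]


def parallel_list(list_prefix: ListFn, prefixes: List[str], max_workers: int = 16) -> List[str]:
    """List each prefix in its own worker thread and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `prefixes`.
        per_prefix = pool.map(lambda p: list(list_prefix(p)), prefixes)
        return [key for keys in per_prefix for key in keys]


# Toy in-memory "bucket", used only to exercise the function.
FAKE_BUCKET = ["a/1.txt", "a/2.txt", "b/1.txt", "c/9.txt"]


def fake_list_prefix(prefix: str) -> List[str]:
    return [k for k in FAKE_BUCKET if k.startswith(prefix)]


if __name__ == "__main__":
    keys = parallel_list(fake_list_prefix, ["a", "b", "c"])
    print(len(keys), "keys listed")
```

Threads are used here because listing is I/O-bound; whether processes would do better depends on how much CPU time goes into parsing the listing responses.
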
sarahwooders added a commit that referenced this issue May 15, 2023
Implements a few fixes for bugs that caused errors in large transfers:
* A basic backpressure mechanism, so that if the queues on a gateway are
full, the `chunk_requests` POST request will return how many chunks were
added and the current queue size, informing the HTTP client making the
request to send the remaining chunks (those not added) to a different
gateway or to wait and try again. With this change, I was able to
transfer 1TB.
* This also reduces the *total* number of HTTP connections per gateway
to 64, as opposed to 32 per destination, which seems to have been
causing issues.
* Empty chunks are allowed, since object stores can have empty folders
which we still want transferred.

There are still issues with SSH connections during long-running transfers,
and listing files can take an extremely long time on the client (#841),
so these issues need to be fixed for very large transfers.
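
The backpressure behaviour described in the commit message could look roughly like the sketch below from the client's side. This is an assumption-laden illustration, not the actual gateway API: `FakeGateway`, `dispatch`, and the `n_added` / `qsize` reply fields are hypothetical names, and the gateway is an in-memory stand-in rather than a real HTTP endpoint. The idea shown is only that a full gateway accepts part of a batch, and the client offers the rest to another gateway or waits and retries.

```python
import time
from typing import Dict, List


class FakeGateway:
    """In-memory stand-in for a gateway with a bounded chunk queue."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue: List[str] = []

    def post_chunk_requests(self, chunks: List[str]) -> Dict[str, int]:
        # Accept only as many chunks as fit, then report how many were
        # added and the current queue size (hypothetical reply fields).
        free = max(self.capacity - len(self.queue), 0)
        accepted = chunks[:free]
        self.queue.extend(accepted)
        return {"n_added": len(accepted), "qsize": len(self.queue)}


def dispatch(
    chunks: List[str],
    gateways: List[FakeGateway],
    max_rounds: int = 10,
    backoff_s: float = 0.01,
) -> List[str]:
    """Offer chunks to each gateway in turn; return any chunks left unsent."""
    remaining = list(chunks)
    for _ in range(max_rounds):
        for gw in gateways:
            if not remaining:
                return remaining
            reply = gw.post_chunk_requests(remaining)
            # Drop the chunks this gateway accepted; pass the rest along.
            remaining = remaining[reply["n_added"]:]
        if remaining:
            time.sleep(backoff_s)  # every gateway was full: wait, then retry
    return remaining


if __name__ == "__main__":
    gateways = [FakeGateway(2), FakeGateway(3)]
    unsent = dispatch([f"chunk-{i}" for i in range(4)], gateways)
    print("unsent:", unsent)
```

In this toy version the queues never drain, so a retry only helps in a real system where gateways are consuming chunks in the meantime.
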
@Zorlin

Zorlin commented Jun 24, 2023

Have a look at S3P if you haven't already; it seems to have implemented a partial strategy for this that could be good to take inspiration from. This is a HUGE use case for us.
