Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): Gcs's RangeWrite doesn't support concurrent write #4806

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

Xuanwo
Copy link
Member

@Xuanwo Xuanwo commented Jun 25, 2024

We need to find a new way to implement concurrent write for gcs.

@github-actions github-actions bot requested review from G-XD and morristai June 25, 2024 17:06
@Xuanwo Xuanwo requested review from sundy-li, suyanhanx, WenyXu and PsiACE and removed request for morristai and G-XD June 25, 2024 17:06
Copy link
Member

@WenyXu WenyXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥲Interesting, I didn't notice it before...

@@ -362,7 +362,7 @@ impl Access for GcsBackend {
// It's recommended that you use at least 8 MiB for the chunk size.
//
// Reference: [Perform resumable uploads](https://cloud.google.com/storage/docs/performing-resumable-uploads)
write_multi_align_size: Some(256 * 1024 * 1024),
write_multi_align_size: Some(8 * 1024 * 1024),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep it with no change?

Copy link
Member Author

@Xuanwo Xuanwo Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep it with no change?

No, the original value is incorrect (256MiB! instead of 256KiB) and has been hiding our CI failure for a long time. All our tests use write_once in wrong, which allows our CI to keep passing.

@WenyXu
Copy link
Member

WenyXu commented Jun 25, 2024

It seems we can use this API: https://cloud.google.com/storage/docs/multipart-uploads, which allows us to upload parts in parallel.

You can upload parts simultaneously, reducing the time it takes to upload the data in its entirety.

@Xuanwo Xuanwo merged commit d58dfb7 into main Jun 25, 2024
245 of 248 checks passed
@Xuanwo Xuanwo deleted the fix-gcs-concurrent-write branch June 25, 2024 17:21
@Xuanwo
Copy link
Member Author

Xuanwo commented Jun 25, 2024

It seems we can use this API: https://cloud.google.com/storage/docs/multipart-uploads, which allows us to upload parts in parallel.

I tracked this at #4807, would you like to help implementing it?

@WenyXu
Copy link
Member

WenyXu commented Jun 25, 2024

It seems we can use this API: https://cloud.google.com/storage/docs/multipart-uploads, which allows us to upload parts in parallel.

I tracked this at #4807, would you like to help implementing it?

Yes, let me fix it🤩

@Xuanwo
Copy link
Member Author

Xuanwo commented Jun 25, 2024

Yes, let me fix it🤩

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants