Changed block blob limits #199

Merged
mohsha-msft merged 1 commit into dev from issue/minor-jumbo-blob-fix
Aug 5, 2020

Conversation

mohsha-msft
Contributor

No description provided.

mohsha-msft merged commit 9e15f04 into dev Aug 5, 2020
mohsha-msft deleted the issue/minor-jumbo-blob-fix branch August 5, 2020 05:40
mohsha-msft added a commit that referenced this pull request Aug 31, 2020
nakulkar-msft added a commit that referenced this pull request Oct 27, 2020
* Add illumos build tag additionally to solaris

* #7508079 [Go][Blob][2019-12-12] Blob Versioning (#190)

* Generated code for 12-12-2019 spec

* Fix test

* Changes

* Basic Testing and modification in WithVersionId function.

* Added Tags and Versions in BlobListingDetails.

* Added Tests

* Added TestCases

* Commented out tests which require versioning disabled.

* Added Tests

* Testcases 1-on-1 with python SDK

* Moved all tests to same file for ease of accessibility

Co-authored-by: zezha-msft <[email protected]>

* update to go1.14

* Minor Jumbo Blob Fix and Blob Versioning fix (#198)

* Minor Jumbo Blob fix + versioning fix

* Test Case Fix

* Renamed struct back to original

* Changed block blob limit (#199)

* Minor versioning fix (#200)

* [Go][Blob][2019-02-02] Set tier support on copy/put blob API (#203)

* Added tier parameter in upload block blob function signature + Fixed usage + Wrote a test case for validation.

* Added tier parameter in
a. CopyFromURL, CommitBlockList of Block Blob
b. Create (Page Blob)
Fixed all occurrences

* Minor Change

* Added test

* Rev go to 1.15, adal to 0.9.2 (#205)

Update go to latest version
Update adal dependency

* #7508079 [Go][Blob][2019-12-12] Blob Versioning (#190)

* Generated code for 12-12-2019 spec

* Fix test

* Changes

* Basic Testing and modification in WithVersionId function.

* Added Tags and Versions in BlobListingDetails.

* Added Tests

* Added TestCases

* Commented out tests which require versioning disabled.

* Added Tests

* Testcases 1-on-1 with python SDK

* Moved all tests to same file for ease of accessibility

Co-authored-by: zezha-msft <[email protected]>

* Minor Jumbo Blob Fix and Blob Versioning fix (#198)

* Minor Jumbo Blob fix + versioning fix

* Test Case Fix

* Renamed struct back to original

* Changed block blob limit (#199)

* update to go1.14

* Minor versioning fix (#200)

* [Go][Blob][2019-02-02] Set tier support on copy/put blob API (#203)

* Added tier parameter in upload block blob function signature + Fixed usage + Wrote a test case for validation.

* Added tier parameter in
a. CopyFromURL, CommitBlockList of Block Blob
b. Create (Page Blob)
Fixed all occurrences

* Minor Change

* Added test

* Rev go to 1.15, adal to 0.9.2 (#205)

Update go to latest version
Update adal dependency

* Fixing BlockBlobMaxUploadBlobBytes value (#207)

Reverting BlockBlobMaxUploadBlobBytes to 256MB
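
As a rough illustration of what a 256MB single-upload cap means for callers, here is a minimal sketch. The constant below is a local stand-in named after the one in the commit message, not the SDK's own declaration, and reading "256MB" as 256 * 1024 * 1024 bytes is an assumption. `fitsSingleUpload` is a hypothetical helper.

```go
package main

import "fmt"

// Local stand-in for the limit named in the commit message.
// Assumption: "256MB" is interpreted as 256 * 1024 * 1024 bytes.
const blockBlobMaxUploadBlobBytes = 256 * 1024 * 1024

// fitsSingleUpload is a hypothetical helper that reports whether a payload
// of the given size stays within the single-request upload limit.
func fitsSingleUpload(size int64) bool {
	return size >= 0 && size <= blockBlobMaxUploadBlobBytes
}

func main() {
	sizes := []int64{1 << 20, blockBlobMaxUploadBlobBytes, blockBlobMaxUploadBlobBytes + 1}
	for _, size := range sizes {
		fmt.Printf("%d bytes: single upload allowed = %v\n", size, fitsSingleUpload(size))
	}
}
```

A payload over the cap would need to be split into blocks and committed separately; that path is not shown here.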

* Consider 502 as a temporary error (#204)
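
The idea of classifying HTTP 502 as retryable can be sketched as below. This is not the SDK's retry policy code: `isTemporaryStatus` is a hypothetical helper, and treating 500 and 503 as temporary alongside 502 is an assumption made only to give the example some context.

```go
package main

import (
	"fmt"
	"net/http"
)

// isTemporaryStatus is a hypothetical classifier for retry decisions.
// Only 502 comes from the commit message; 500 and 503 are assumptions.
func isTemporaryStatus(code int) bool {
	switch code {
	case http.StatusBadGateway, // 502: the status this commit adds
		http.StatusInternalServerError, // 500: assumed retryable
		http.StatusServiceUnavailable: // 503: assumed retryable
		return true
	}
	return false
}

func main() {
	for _, code := range []int{200, 404, 500, 502, 503} {
		fmt.Printf("%d retryable=%v\n", code, isTemporaryStatus(code))
	}
}
```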

* [highlevel] Stop using memory-mapped files

While investigating this SDK for uploading and downloading large blobs
(e.g. 25GB or more) it became apparent that the memory-mapped approach
has some severe limitations:

1. Limits the file size on 32-bit systems (theoretically 4GB, but much
   less in practice).
2. Has no backpressure when writing to slower storage mediums.
3. Appears to finish faster, but the OS spends several minutes flushing
   the modified RAM to disk afterwards (depends on the speed of the
   disk).

On a VM with 16GB of RAM and a slow disk (spinning in this case) the
algorithm quickly overwhelms the available memory and causes severe
performance degradation. It ended up simultaneously trying to flush to
the slow data disk and page out to the slightly faster OS disk.

The solution is to stop using memory-mapped files (at least the way the
SDK currently uses them) and switch to the `io.ReaderAt` and
`io.WriterAt` interfaces. They explicitly allow for parallel access to
non-overlapping regions, which makes them good candidates for this
purpose.

In benchmarks of large downloads (25GB file) with a test app, azcopy
10.4.3 and these updates finish within 10 seconds of each other. When
compared against the original code on a beefy machine with
plenty of RAM the measured execution time is faster, but there is a
little bit of delay while the last of the data flushes from RAM to disk.
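
The access pattern described in this commit message, parallel reads and writes over non-overlapping regions through `io.ReaderAt` and `io.WriterAt`, might look roughly like the sketch below. It is not the SDK's implementation: `copyParallel`, `roundTrip`, `chunkSize`, and `parallelism` are hypothetical names, and the semaphore is one simple way to get backpressure by capping the number of in-flight chunk buffers.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"sync"
)

// copyParallel copies size bytes from src to dst in chunks of chunkSize.
// Each goroutine handles its own non-overlapping byte range, which is the
// concurrent use that io.ReaderAt and io.WriterAt document as allowed.
func copyParallel(dst io.WriterAt, src io.ReaderAt, size, chunkSize int64, parallelism int) error {
	if chunkSize <= 0 || parallelism <= 0 {
		return fmt.Errorf("chunkSize and parallelism must be positive")
	}
	sem := make(chan struct{}, parallelism) // caps in-flight chunks and buffer memory
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	record := func(err error) {
		mu.Lock()
		if firstErr == nil {
			firstErr = err
		}
		mu.Unlock()
	}
	for off := int64(0); off < size; off += chunkSize {
		n := chunkSize
		if size-off < n {
			n = size - off // last chunk may be shorter
		}
		sem <- struct{}{} // blocks while parallelism chunks are in flight
		wg.Add(1)
		go func(off, n int64) {
			defer wg.Done()
			defer func() { <-sem }()
			buf := make([]byte, n)
			// Read exactly the range [off, off+n) from the source.
			if _, err := io.ReadFull(io.NewSectionReader(src, off, n), buf); err != nil {
				record(err)
				return
			}
			// Write the same range to the destination.
			if _, err := dst.WriteAt(buf, off); err != nil {
				record(err)
			}
		}(off, n)
	}
	wg.Wait()
	return firstErr
}

// roundTrip copies data into a temporary file with copyParallel and
// returns the file's contents so the result can be compared.
func roundTrip(data []byte, chunkSize int64, parallelism int) ([]byte, error) {
	f, err := os.CreateTemp("", "copy-parallel-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if err := copyParallel(f, bytes.NewReader(data), int64(len(data)), chunkSize, parallelism); err != nil {
		return nil, err
	}
	return os.ReadFile(f.Name())
}

func main() {
	data := bytes.Repeat([]byte("0123456789"), 1000)
	out, err := roundTrip(data, 4096, 4)
	if err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	fmt.Println("copied", len(out), "bytes, identical:", bytes.Equal(data, out))
}
```

Because `*os.File` satisfies both interfaces, the same function works for files of any size without mapping them into memory, and peak buffer usage stays near `chunkSize * parallelism`.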

* PR feedback

Co-authored-by: Till Wegmueller <[email protected]>
Co-authored-by: Ze Qian Zhang <[email protected]>
Co-authored-by: Mohit Sharma <[email protected]>
Co-authored-by: Jonas-Taha El Sesiy <[email protected]>
Co-authored-by: mohsha-msft <[email protected]>
Co-authored-by: Kyle Farnung <[email protected]>
nakulkar-msft added a commit that referenced this pull request Dec 24, 2020