
feat: add new JournalingBlobWriteSessionConfig usable with gRPC transport #2194

Merged
merged 2 commits into main from write-acceleration/m2/2/journaling-bwsc on Sep 12, 2023

Conversation

BenWhitehead
Collaborator

@BenWhitehead BenWhitehead commented Sep 6, 2023

Overview

A new BlobWriteSessionConfig that uses disk to "journal" data before transmitting it to GCS.

By journaling the data to disk we can rewind to an arbitrary offset in case of failure, while optimistically keeping the stream to GCS open.

General flow of data into this new WritableByteChannel (a sketch follows the list):

  1. #write(ByteBuffer) is called
  2. Write the contents of the ByteBuffer to the recovery file on disk
  3. Force a flush (fsync) to disk
  4. Transmit the contents of the ByteBuffer to GCS, leaving the stream open
  5. Return
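
A minimal sketch of that write path. The real channel produced by JournalingBlobWriteSessionConfig is internal to the library; the class and field names below are illustrative only and simply mirror the five steps above:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Illustrative only: not the library's internal implementation.
final class JournalingChannelSketch implements WritableByteChannel {
  private final FileChannel recoveryFile;       // journal on local disk
  private final WritableByteChannel gcsStream;  // open resumable stream to GCS

  JournalingChannelSketch(FileChannel recoveryFile, WritableByteChannel gcsStream) {
    this.recoveryFile = recoveryFile;
    this.gcsStream = gcsStream;
  }

  @Override
  public int write(ByteBuffer src) throws IOException {  // 1. #write(ByteBuffer)
    ByteBuffer journalView = src.duplicate();
    while (journalView.hasRemaining()) {
      recoveryFile.write(journalView);                   // 2. journal the bytes to the recovery file
    }
    recoveryFile.force(false);                           // 3. force a flush to disk
    int written = 0;
    while (src.hasRemaining()) {
      written += gcsStream.write(src);                   // 4. transmit to GCS, stream stays open
    }
    return written;                                      // 5. return
  }

  @Override
  public boolean isOpen() {
    return gcsStream.isOpen();
  }

  @Override
  public void close() throws IOException {
    gcsStream.close();
    recoveryFile.close();
  }
}
```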

If the stream to GCS is interrupted by a retriable error, the recovery flow is (also sketched below):

  1. Query the offset that was successfully committed to GCS
  2. Rewind our transmit context to that offset
  3. Open the recovery file at the offset from the query
  4. Stream the contents of the recovery file to GCS, leaving the stream open
  5. Once the recovery file contents are transmitted to GCS, return
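
A sketch of that recovery flow. The ResumableSessionSketch interface is a hypothetical stand-in for the library's internal resumable-upload handle, not a real API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical stand-in for the library's internal resumable-upload handle.
interface ResumableSessionSketch {
  long queryCommittedOffset() throws IOException; // ask GCS what it has durably persisted
  void rewindTo(long offset) throws IOException;  // reset the transmit context
  int write(ByteBuffer src) throws IOException;   // send more bytes, stream stays open
}

final class RecoverySketch {
  // Illustrative recovery flow for a retriable stream failure (steps 1-5 above).
  static void recover(FileChannel recoveryFile, ResumableSessionSketch session) throws IOException {
    long committed = session.queryCommittedOffset(); // 1. offset GCS has committed
    session.rewindTo(committed);                     // 2. rewind our transmit context
    recoveryFile.position(committed);                // 3. position the recovery file at that offset
    ByteBuffer buf = ByteBuffer.allocate(2 * 1024 * 1024);
    while (recoveryFile.read(buf) != -1) {           // 4. re-stream journaled bytes to GCS
      buf.flip();
      while (buf.hasRemaining()) {
        session.write(buf);
      }
      buf.clear();
    }
    // 5. once the journaled bytes are re-sent, normal writes resume
  }
}
```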

Benchmark results

Setup

  1. c2d-standard-32 in us-east1
    • with 4 local NVMe SSDs used for the recovery files
    • default NIC, premium network tier
    • Debian 11 image
  2. regional bucket in us-east1

Workload

Generate a random file with size in the range 1GiB..2GiB, then upload it to GCS using each of the following configurations (a client-configuration sketch follows this section):

  1. DefaultBlobWriteSessionConfig with 16MiB chunk size
  2. DefaultBlobWriteSessionConfig with 64MiB chunk size
  3. BufferToDiskThenUpload with the following directories as buffer locations: /mnt/ssd1:/mnt/ssd2:/mnt/ssd3:/mnt/ssd4
  4. JournalingBlobWriteSessionConfig with the following directories as journal locations: /mnt/ssd1:/mnt/ssd2:/mnt/ssd3:/mnt/ssd4

Run across {1,2,4,8,16,32} concurrent threads to evaluate contention and horizontal scaling.
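
A hedged sketch of how two of these strategies could be wired into a gRPC Storage client. The factory methods shown (BlobWriteSessionConfigs.getDefault().withChunkSize(...), BlobWriteSessionConfigs.journaling(...)) and GrpcStorageOptions.Builder#setBlobWriteSessionConfig reflect the API this PR targets, but the exact names and signatures here are assumptions and may differ from the released library:

```java
import com.google.cloud.storage.BlobWriteSessionConfigs;
import com.google.cloud.storage.GrpcStorageOptions;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class StrategyClientsSketch {
  // The four local SSD mount points used as buffer/journal locations in the benchmark.
  static final List<Path> SSDS =
      Arrays.asList(
          Paths.get("/mnt/ssd1"), Paths.get("/mnt/ssd2"),
          Paths.get("/mnt/ssd3"), Paths.get("/mnt/ssd4"));

  // Strategies 1+2: DefaultBlobWriteSessionConfig with an explicit chunk size (16 MiB shown).
  static Storage defaultChunked() {
    GrpcStorageOptions options =
        StorageOptions.grpc()
            .setBlobWriteSessionConfig(
                BlobWriteSessionConfigs.getDefault().withChunkSize(16 * 1024 * 1024))
            .build();
    return options.getService();
  }

  // Strategy 4: JournalingBlobWriteSessionConfig journaling to the local SSDs.
  static Storage journaling() {
    GrpcStorageOptions options =
        StorageOptions.grpc()
            .setBlobWriteSessionConfig(BlobWriteSessionConfigs.journaling(SSDS))
            .build();
    return options.getService();
  }
}
```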

Collected metrics

  1. Generate a random file
  2. Record the begin instant
  3. Create a new BlobWriteSession
  4. Call BlobWriteSession#open()
  5. Copy all bytes from the random file to the WritableByteChannel from step 4
  6. Close the WritableByteChannel from step 4
  7. Record the end instant
  8. Report objectSize,UploadStrategy,elapsedTimeUs,Status,ThreadCount
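
A minimal sketch of steps 2-7 of that measurement loop, assuming the preview Storage#blobWriteSession(BlobInfo) and BlobWriteSession#open() entry points on an already-configured client; the buffer size and absence of error handling are illustrative:

```java
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.BlobWriteSession;
import com.google.cloud.storage.Storage;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.time.Instant;

final class TimedUploadSketch {
  // Time one upload of a pre-generated random file through a BlobWriteSession.
  static Duration timedUpload(Storage storage, BlobInfo info, Path randomFile) throws Exception {
    Instant begin = Instant.now();                              // 2. record begin instant
    BlobWriteSession session = storage.blobWriteSession(info);  // 3. create a new BlobWriteSession
    try (FileChannel src = FileChannel.open(randomFile, StandardOpenOption.READ);
        WritableByteChannel dst = session.open()) {             // 4. open the channel
      ByteBuffer buf = ByteBuffer.allocate(16 * 1024 * 1024);
      while (src.read(buf) != -1) {                             // 5. copy all bytes
        buf.flip();
        while (buf.hasRemaining()) {
          dst.write(buf);
        }
        buf.clear();
      }
    }                                                           // 6. channel closed by try-with-resources
    return Duration.between(begin, Instant.now());              // 7. record end instant
  }
}
```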

Results summary

Throughput in MiB/s, grouped by ThreadCount and UploadStrategy (PUT = DefaultBlobWriteSessionConfig at the given chunk size, BtDtU = BufferToDiskThenUpload, journaling = JournalingBlobWriteSessionConfig)

```
                             count     mean     std     min      50%      75%      90%      99%      max
ThreadCount UploadStrategy
1           PUT 16MiB       4341.0   79.941   8.752  21.599   80.218   85.628   90.710   99.635  106.627
            PUT 64MiB       4341.0  100.410  11.555  20.490  100.022  108.208  115.214  128.251  139.710
            BtDtU           4341.0  104.728  22.527  39.265  110.374  122.335  130.899  146.897  158.975
            journaling      4341.0  125.820  31.332  45.502  133.590  149.027  161.899  188.716  201.938
2           PUT 16MiB       4237.0   80.000   8.525  15.814   80.693   85.651   90.241   97.958  106.677
            PUT 64MiB       4237.0  101.062  11.030  55.813  101.049  108.007  115.114  127.299  135.149
            BtDtU           4237.0  104.236  21.031   5.602  109.382  120.411  128.532  143.113  162.146
            journaling      4237.0  125.010  29.761  43.207  131.827  147.362  159.425  182.441  209.000
4           PUT 16MiB       4411.0   79.708   8.357  40.691   80.600   85.567   89.586   95.533  103.506
            PUT 64MiB       4411.0  100.536   9.947  58.084  100.846  107.209  113.144  122.172  131.974
            BtDtU           4411.0  103.421  21.314  36.401  108.778  119.887  128.550  144.903  158.948
            journaling      4411.0  123.705  30.707  40.082  130.553  147.581  159.995  186.684  222.646
8           PUT 16MiB       4260.0   79.314   8.393   7.148   80.153   85.175   89.319   95.475  100.757
            PUT 64MiB       4260.0   99.913  10.438  60.685  100.450  107.144  112.551  122.409  132.130
            BtDtU           4260.0  102.472  21.228  32.552  108.226  119.072  126.700  142.831  155.628
            journaling      4260.0  122.931  30.261  42.747  130.306  146.098  158.005  184.798  203.696
16          PUT 16MiB       4473.0   77.735   8.091  24.149   78.483   83.123   87.092   95.740  106.176
            PUT 64MiB       4473.0   97.690   9.987  45.342   97.768  103.996  109.807  122.202  140.906
            BtDtU           4473.0   99.314  21.090  39.412  104.270  116.041  124.532  139.305  148.162
            journaling      4473.0  118.956  30.486  44.253  122.585  143.344  156.484  182.211  200.777
32          PUT 16MiB       4024.0   72.923   8.045  20.205   73.601   78.575   82.341   88.970  100.665
            PUT 64MiB       4024.0   93.151  10.030  20.913   93.506   99.748  105.297  116.163  128.284
            BtDtU           4024.0   89.134  18.995  35.633   91.033  103.698  112.994  131.555  146.751
            journaling      4024.0  104.557  30.965  11.785   98.794  129.923  146.747  174.618  200.303
```

Pre-Work

When performing incremental disk-based buffering we need to know the data has been fsync'd to disk before we yield and move forward. A new SyncingFileChannel decorates a FileChannel to force a sync each time write(ByteBuffer) is called (a sketch of the idea follows).
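
A minimal sketch of that decorator idea; the actual SyncingFileChannel in the library may differ in type hierarchy and details:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Every write is followed by an fsync so the bytes are durable on disk
// before the caller moves forward. Illustrative, not the library class.
final class SyncingWritableChannelSketch implements WritableByteChannel {
  private final FileChannel delegate;

  SyncingWritableChannelSketch(FileChannel delegate) {
    this.delegate = delegate;
  }

  @Override
  public int write(ByteBuffer src) throws IOException {
    int written = delegate.write(src);
    // force(false): flush file content (not metadata) to the storage device
    delegate.force(false);
    return written;
  }

  @Override
  public boolean isOpen() {
    return delegate.isOpen();
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }
}
```
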
@BenWhitehead BenWhitehead added the owlbot:ignore label Sep 6, 2023
@BenWhitehead BenWhitehead requested a review from a team as a code owner September 6, 2023 18:06
@product-auto-label product-auto-label bot added the size: xl and api: storage labels Sep 6, 2023
@BenWhitehead BenWhitehead force-pushed the write-acceleration/m2/2/journaling-bwsc branch from 2ab83ea to 59da6b7 on September 7, 2023 17:29
Base automatically changed from write-acceleration/m2/1/sync-file-channel to main September 12, 2023 17:36
@BenWhitehead BenWhitehead added the kokoro:force-run label Sep 12, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Sep 12, 2023
@BenWhitehead BenWhitehead merged commit 8880d94 into main Sep 12, 2023
14 checks passed
@BenWhitehead BenWhitehead deleted the write-acceleration/m2/2/journaling-bwsc branch September 12, 2023 20:23