feat: improve throughput of http based storage#reader between 100 MiB/s and 200 MiB/s #1799

Merged: 7 commits, Dec 20, 2022

Commits on Dec 16, 2022

  1. feat: improve throughput of http based storage#reader between 100 MiB/s and 200 MiB/s
    
    ### Work
    Implement a new BlobReadChannelV2, which replaces BlobReadChannel and improves on
    its resource usage by reducing the minimum number of RPCs needed to read an object
    from (objSize / chunkSize + 1) to 1, while still maintaining the ability to restart
    a stream that may have been interrupted.
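    
    To make the RPC arithmetic above concrete, here is a minimal sketch. The class
    and method names are hypothetical, and the 2 MiB chunk size is an assumption
    chosen for illustration; it is not taken from this change.
    
    ```java
    // Hypothetical illustration of the RPC counts described above.
    final class RpcCountSketch {
        // Chunk-per-request reading: objSize / chunkSize + 1 (integer division),
        // as stated in the commit message.
        static long chunkedRpcCount(long objSizeBytes, long chunkSizeBytes) {
            if (objSizeBytes < 0 || chunkSizeBytes <= 0) {
                throw new IllegalArgumentException("sizes must be positive");
            }
            return objSizeBytes / chunkSizeBytes + 1;
        }

        // Streaming read: a single request in the best case (no interruption).
        static long streamingMinRpcCount() {
            return 1;
        }

        public static void main(String[] args) {
            long twoGiB = 2L << 30;
            long twoMiB = 2L << 20; // assumed chunk size
            System.out.println("chunked:   " + chunkedRpcCount(twoGiB, twoMiB));
            System.out.println("streaming: " + streamingMinRpcCount());
        }
    }
    ```
    
    Under these assumptions a 2 GiB object needs 1025 chunked requests, against a
    minimum of one streaming request.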
    
    ### Results
    Throughput in MiB/s has increased across the board:
    ```
            ClassName           mean    25%    50%    75%    90%    95%    99%    max
    READ[0] BlobReadChannel     32.2   25.3   29.0   32.6   42.1   56.1  111.9  214.1
    READ[1] BlobReadChannel     32.1   25.4   28.7   32.6   41.7   55.4  106.1  224.4
    READ[2] BlobReadChannel     31.9   25.2   28.6   32.8   41.6   55.2  105.4  227.2
    READ[0] BlobReadChannelV2  214.1  196.4  219.8  239.3  254.3  261.2  278.0  315.2
    READ[1] BlobReadChannelV2  215.9  198.8  221.0  240.0  254.4  261.8  281.8  315.6
    READ[2] BlobReadChannelV2  216.4  199.5  221.2  239.4  253.9  261.6  281.6  308.6
    ```
    
    Data was collected using all default settings, against a regional bucket,
    across a range of object sizes from 256 KiB to 2 GiB. Each object was read in
    full three times to account for any GCS caching variability.
    
    ### Internal implementation notes
    Add ByteRangeSpec to encapsulate relative vs explicit(open) vs explicit(closed)
    vs null vs open-ended ranges and their associated logical subtleties.
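    
    As a rough illustration of the subtleties involved, the sketch below models a
    few of these range flavors and renders them as an HTTP Range header value.
    RangeSketch and all of its methods are hypothetical and are not the
    ByteRangeSpec added here. It assumes "open" means an exclusive end offset and
    "closed" an inclusive one, does not model the null case, and rejects empty
    ranges.
    
    ```java
    import java.util.OptionalLong;

    // Hypothetical illustration only; not the library's ByteRangeSpec.
    final class RangeSketch {
        private final long begin;                // inclusive start offset
        private final OptionalLong endExclusive; // empty means read to end of object

        private RangeSketch(long begin, OptionalLong endExclusive) {
            if (begin < 0) {
                throw new IllegalArgumentException("begin must be >= 0");
            }
            if (endExclusive.isPresent() && endExclusive.getAsLong() <= begin) {
                throw new IllegalArgumentException("end must be after begin");
            }
            this.begin = begin;
            this.endExclusive = endExclusive;
        }

        // Relative: a start offset plus a length.
        static RangeSketch relative(long begin, long length) {
            return new RangeSketch(begin, OptionalLong.of(Math.addExact(begin, length)));
        }

        // Explicit with an exclusive ("open") end offset.
        static RangeSketch explicitClosedOpen(long begin, long endExclusive) {
            return new RangeSketch(begin, OptionalLong.of(endExclusive));
        }

        // Explicit with an inclusive ("closed") end offset.
        static RangeSketch explicitClosed(long begin, long endInclusive) {
            return new RangeSketch(begin, OptionalLong.of(Math.addExact(endInclusive, 1)));
        }

        // Open-ended: from begin to the end of the object.
        static RangeSketch openEnded(long begin) {
            return new RangeSketch(begin, OptionalLong.empty());
        }

        // Same logical end with the start moved forward, e.g. to resume a read
        // after bytesRead bytes were already consumed.
        RangeSketch advancedBy(long bytesRead) {
            return new RangeSketch(Math.addExact(begin, bytesRead), endExclusive);
        }

        // HTTP Range header value; end offsets in the header are inclusive.
        String toHttpRangeHeader() {
            return endExclusive.isPresent()
                ? "bytes=" + begin + "-" + (endExclusive.getAsLong() - 1)
                : "bytes=" + begin + "-";
        }
    }
    ```
    
    In this sketch, relative(10, 5), explicitClosedOpen(10, 15) and
    explicitClosed(10, 14) all describe the same five bytes, which is the kind of
    off-by-one bookkeeping a single range type can centralize.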
    
    The new StorageReadChannel interface is a possible candidate for a
    storage-specific interface that we can expose to users, allowing improvements
    independent of core and BigQuery.
    BenWhitehead committed Dec 16, 2022
    Commit baaae2d
  2. chore: ClientStuff -> BlobReadChannelContext

    ClientStuff is no longer Serializable. Instead, BlobReadChannelV2State will store the instance of HttpStorageOptions and then reconstitute the BlobReadChannelContext at restore() time.
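    
    The pattern described above can be sketched with plain Java serialization. All
    types below (Options, Context, State, Channel) are hypothetical stand-ins for
    HttpStorageOptions, BlobReadChannelContext, BlobReadChannelV2State and the
    channel itself; the real classes carry more than this.
    
    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    // Hypothetical stand-ins; not the library's classes.
    final class RestoreSketch {
        // Plain configuration: safe to serialize.
        static final class Options implements Serializable {
            private static final long serialVersionUID = 1L;
            final String host;
            Options(String host) { this.host = host; }
        }

        // Live resources derived from the options: deliberately not Serializable.
        static final class Context {
            final Options options;
            final Object client = new Object(); // placeholder for an HTTP client
            private Context(Options options) { this.options = options; }
            static Context from(Options options) { return new Context(options); }
        }

        // Captured state stores only the options and the read position.
        static final class State implements Serializable {
            private static final long serialVersionUID = 1L;
            final Options options;
            final long position;
            State(Options options, long position) {
                this.options = options;
                this.position = position;
            }
            // The context is rebuilt from the options at restore time.
            Channel restore() { return new Channel(Context.from(options), position); }
        }

        static final class Channel {
            final Context context;
            final long position;
            Channel(Context context, long position) {
                this.context = context;
                this.position = position;
            }
            State capture() { return new State(context.options, position); }
        }

        // Serialize and deserialize a state to mimic persisting it between processes.
        static State roundTrip(State state) {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                    out.writeObject(state);
                }
                try (ObjectInputStream in =
                        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                    return (State) in.readObject();
                }
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }
    ```
    
    Keeping the context out of the serialized form means only configuration and
    position need a stable wire format, while clients and other live resources are
    recreated on restore.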
    BenWhitehead committed Dec 16, 2022
    Commit c6fce61
  3. chore: gut BlobReadChannel

    In order to facilitate migrating any RestorableState<ReadChannel> customers might have, we leave the existing class hierarchy in place and update BlobReadChannel.StateImpl#restore() to produce a new BlobReadChannelV2.
    
    In the next major version this compatibility path will be removed.
    
    To test compatibility, a BlobReadChannel instance from v2.16.0 has been serialized into blobWriteChannel.ser.properties, along with a comment describing how the serialized form was generated.
    BenWhitehead committed Dec 16, 2022
    Commit cea4cdc

Commits on Dec 19, 2022

  1. chore: review cleanup

    BenWhitehead committed Dec 19, 2022
    Commit 1d693e7
  2. Commit 9835691
  3. Commit 78c5d1c
  4. Commit 6207cf1