
v3.5 Multipart Upload


Overview

Some cloud storage APIs allow a large data object to be assembled from parts which are uploaded independently.

Limitations

  1. A storage API supporting MPU must be used; currently these are s3, emcs3 and swift.
  2. In the distributed mode, all parts of a given object are processed by a single storage driver.
  3. Only the "create" load type splits large objects into separately uploaded parts.
  4. It is strongly recommended to set the "load-batch-size" configuration parameter to "1" (see the sketch after this list).
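
For reference, a run respecting these limitations may be launched as sketched below. The "--<parameter>=<value>" command line form follows the configuration parameter names used on this page; the jar name and the "storage-driver-type" option are assumptions to be adjusted for the actual deployment:

    java -jar mongoose-<version>.jar \
        --storage-driver-type=s3 \
        --load-batch-size=1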

Approach

Mongoose uses the so-called I/O task abstraction; the I/O tasks are executed by the specific storage drivers. A storage driver may detect a "multipart" I/O task and execute the corresponding sequence of "sub-tasks":

  1. Initiate the object MPU.
  2. Upload the object parts.
  3. Commit the object MPU.
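
For the s3 storage driver, for example, this sub-task sequence corresponds to the standard S3 multipart upload REST calls, roughly sketched below (the bucket, object and upload id values are placeholders):

    POST /bucket/object?uploads                        <- initiate the MPU, the response contains the upload id
    PUT  /bucket/object?partNumber=1&uploadId=<id>     <- upload each part
    PUT  /bucket/object?partNumber=2&uploadId=<id>
    ...
    POST /bucket/object?uploadId=<id>                  <- commit (complete) the MPU with the list of the uploaded parts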

Configuration

The "item-data-ranges-threshold" configuration parameter controls the MPU behavior. The value is the size in bytes. Any new generated object is treated as "large" if its size is more or equal than the configured threshold.

Reporting

Parts List Output

When a multipart upload task finishes, a record containing the object name and the corresponding upload id is written to the parts.upload.csv file. The upload completion response latency is also persisted in the third column.
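
A parts.upload.csv record may therefore look like the following line (hypothetical object name, upload id and latency values):

    obj0000000001,0cc175b9c0f1b6a831c399e269772661,1234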

Future Enhancements

  • Support Read for the segmented objects
  • Support Update for the segmented objects
  • Support Copy for the segmented objects