9002 allow direct upload setting #9003

Merged: 44 commits merged on Sep 29, 2023
Changes from 42 commits
Commits
5b34065
added api-direct-upload option for storage configurations
ErykKul Sep 29, 2022
5db560e
improvements in the documentation
ErykKul Sep 29, 2022
e4ee0d3
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Nov 7, 2022
cbc42d5
renamed and moved the direct upload JVM option in the documentation
ErykKul Nov 8, 2022
187cc61
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Nov 15, 2022
4abac1a
revert by accident editted old release notes
ErykKul Nov 17, 2022
578c7af
indentation fixes
ErykKul Nov 17, 2022
8578de1
tab character removed
ErykKul Nov 17, 2022
f2e75db
tab character removed
ErykKul Nov 17, 2022
bff889d
tab character removed
ErykKul Nov 17, 2022
ad4bb51
renamed jvm option: allow-out-of-band-upload -> upload-out-of-band
ErykKul Nov 17, 2022
49102ad
linking to api documentation
ErykKul Nov 17, 2022
e9d6df0
some improvements in the documentation
ErykKul Nov 17, 2022
dc64aa2
documentation improvements by Dieuwertje
ErykKul Nov 17, 2022
085fb8f
improvements in the documentation
ErykKul Nov 21, 2022
c65cd7e
merged develop branch
ErykKul Apr 21, 2023
098de49
reverted SystemConfig.java changes
ErykKul Apr 21, 2023
c69ce63
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Apr 21, 2023
6d9acdb
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 8, 2023
e048dad
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 8, 2023
35436c5
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 12, 2023
94913cc
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 17, 2023
ce7ba1b
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 22, 2023
fddca51
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 26, 2023
0a16342
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Jun 9, 2023
6da584f
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Jun 15, 2023
1060061
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Jun 16, 2023
03b1295
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 1, 2023
584bed4
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 6, 2023
f61da49
typo fix
ErykKul Sep 6, 2023
884d046
edited out-of-band option description in config doc
ErykKul Sep 6, 2023
7bb4231
upload-redirect can only be true for s3 driver
ErykKul Sep 6, 2023
e6b7d9f
removed confusing comments about a possible error
ErykKul Sep 6, 2023
5ca942f
removed section that was re-added in a merge, after it was removed in…
ErykKul Sep 6, 2023
f5fba58
simplified config documentation for the out-of-band option
ErykKul Sep 6, 2023
43d7419
added references to the out-of-band option configuration
ErykKul Sep 6, 2023
f6bdd2a
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 18, 2023
bf42a92
reverted out-of-band setting in S3AccessIO
ErykKul Sep 18, 2023
d8ea581
add-file-metadata-api made literal as the ref does not exist
ErykKul Sep 18, 2023
7cd11f3
Update config.rst
qqmyers Sep 18, 2023
81be260
typo
qqmyers Sep 18, 2023
06b7255
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 27, 2023
16322df
Create 9002_allow_direct_upload_setting.md
kcondon Sep 27, 2023
3ecd118
Update doc/release-notes/9002_allow_direct_upload_setting.md
kcondon Sep 27, 2023
8 changes: 4 additions & 4 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -115,7 +115,7 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data

curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

-Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.

To add multiple Uploaded Files to the Dataset
@@ -146,7 +146,7 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data

curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

-Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exists in S3/has been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.


@@ -176,7 +176,7 @@ Note that the API call does not validate that the file matches the hash value su

curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"

-Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.

Replacing multiple existing files in the Dataset
@@ -274,5 +274,5 @@ The JSON object returned as a response from this API call includes a "data" that
}


-Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exists in S3/has been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
47 changes: 26 additions & 21 deletions doc/sphinx-guides/source/installation/config.rst
@@ -508,6 +508,10 @@ A Dataverse installation can alternately store files in a Swift or S3-compatible

A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web-accessible trusted remote store.

+A Dataverse installation can be configured to allow out of band upload by setting the ``dataverse.files.\<id\>.upload-out-of-band`` JVM option to ``true``.
+By default, Dataverse supports uploading files via the :ref:`add-file-api`. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).
+With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the :ref:`Adding the Uploaded file to the Dataset <direct-add-to-dataset-api>` API call (described in the :doc:`/developers/s3-direct-upload-api` page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.

The following sections describe how to set up various types of stores and how to configure for multiple stores.

Multi-store Basics
@@ -800,27 +804,28 @@ List of S3 Storage Options
.. table::
:align: left

-===========================================  ==================  ==========================================================================  =============
-JVM Option                                   Value               Description                                                                 Default value
-===========================================  ==================  ==========================================================================  =============
-dataverse.files.storage-driver-id            <id>                Enable <id> as the default storage driver.                                  ``file``
-dataverse.files.<id>.type                    ``s3``              **Required** to mark this storage as S3 based.                              (none)
-dataverse.files.<id>.label                   <?>                 **Required** label to be shown in the UI for this storage                   (none)
-dataverse.files.<id>.bucket-name             <?>                 The bucket name. See above.                                                 (none)
-dataverse.files.<id>.download-redirect       ``true``/``false``  Enable direct download or proxy through Dataverse.                          ``false``
-dataverse.files.<id>.upload-redirect         ``true``/``false``  Enable direct upload of files added to a dataset to the S3 store.           ``false``
-dataverse.files.<id>.ingestsizelimit         <size in bytes>     Maximum size of directupload files that should be ingested                  (none)
-dataverse.files.<id>.url-expiration-minutes  <?>                 If direct uploads/downloads: time until links expire. Optional.             60
-dataverse.files.<id>.min-part-size           <?>                 Multipart direct uploads will occur for files larger than this. Optional.   ``1024**3``
-dataverse.files.<id>.custom-endpoint-url     <?>                 Use custom S3 endpoint. Needs URL either with or without protocol.          (none)
-dataverse.files.<id>.custom-endpoint-region  <?>                 Only used when using custom endpoint. Optional.                             ``dataverse``
-dataverse.files.<id>.profile                 <?>                 Allows the use of AWS profiles for storage spanning multiple AWS accounts.  (none)
-dataverse.files.<id>.proxy-url               <?>                 URL of a proxy protecting the S3 store. Optional.                           (none)
-dataverse.files.<id>.path-style-access       ``true``/``false``  Use path style buckets instead of subdomains. Optional.                     ``false``
-dataverse.files.<id>.payload-signing         ``true``/``false``  Enable payload signing. Optional                                            ``false``
-dataverse.files.<id>.chunked-encoding        ``true``/``false``  Disable chunked encoding. Optional                                          ``true``
-dataverse.files.<id>.connection-pool-size    <?>                 The maximum number of open connections to the S3 server                     ``256``
-===========================================  ==================  ==========================================================================  =============
+===========================================  ==================  ===================================================================================  =============
+JVM Option                                   Value               Description                                                                          Default value
+===========================================  ==================  ===================================================================================  =============
+dataverse.files.storage-driver-id            <id>                Enable <id> as the default storage driver.                                           ``file``
+dataverse.files.<id>.type                    ``s3``              **Required** to mark this storage as S3 based.                                       (none)
+dataverse.files.<id>.label                   <?>                 **Required** label to be shown in the UI for this storage                            (none)
+dataverse.files.<id>.bucket-name             <?>                 The bucket name. See above.                                                          (none)
+dataverse.files.<id>.download-redirect       ``true``/``false``  Enable direct download or proxy through Dataverse.                                   ``false``
+dataverse.files.<id>.upload-redirect         ``true``/``false``  Enable direct upload of files added to a dataset in the S3 store.                    ``false``
+dataverse.files.<id>.upload-out-of-band      ``true``/``false``  Allow upload of files by out-of-band methods (using some tool other than Dataverse)  ``false``
+dataverse.files.<id>.ingestsizelimit         <size in bytes>     Maximum size of directupload files that should be ingested                           (none)
+dataverse.files.<id>.url-expiration-minutes  <?>                 If direct uploads/downloads: time until links expire. Optional.                      60
+dataverse.files.<id>.min-part-size           <?>                 Multipart direct uploads will occur for files larger than this. Optional.            ``1024**3``
+dataverse.files.<id>.custom-endpoint-url     <?>                 Use custom S3 endpoint. Needs URL either with or without protocol.                   (none)
+dataverse.files.<id>.custom-endpoint-region  <?>                 Only used when using custom endpoint. Optional.                                      ``dataverse``
+dataverse.files.<id>.profile                 <?>                 Allows the use of AWS profiles for storage spanning multiple AWS accounts.           (none)
+dataverse.files.<id>.proxy-url               <?>                 URL of a proxy protecting the S3 store. Optional.                                    (none)
+dataverse.files.<id>.path-style-access       ``true``/``false``  Use path style buckets instead of subdomains. Optional.                              ``false``
+dataverse.files.<id>.payload-signing         ``true``/``false``  Enable payload signing. Optional                                                     ``false``
+dataverse.files.<id>.chunked-encoding        ``true``/``false``  Disable chunked encoding. Optional                                                   ``true``
+dataverse.files.<id>.connection-pool-size    <?>                 The maximum number of open connections to the S3 server                              ``256``
+===========================================  ==================  ===================================================================================  =============

.. table::
:align: left
@@ -606,7 +606,8 @@ public static String getDriverPrefix(String driverId) {
}

public static boolean isDirectUploadEnabled(String driverId) {
-    return Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-redirect"));
+    return (DataAccess.S3.equals(driverId) && Boolean.parseBoolean(System.getProperty("dataverse.files." + DataAccess.S3 + ".upload-redirect"))) ||
+            Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-out-of-band"));
}
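
The effect of the revised check can be illustrated with a small standalone sketch. It is not part of the patch: the class name `DirectUploadCheck` and the store ids `archive` and `local` are hypothetical, and the constant `S3` is assumed to stand in for `DataAccess.S3` with the value `"s3"`. Under those assumptions, `upload-redirect` is only consulted when the driver id is itself `s3`, while `upload-out-of-band` is read per store id.

```java
// Standalone sketch of the revised check; not the Dataverse class itself.
class DirectUploadCheck {

    // Assumption: mirrors DataAccess.S3, taken here to be the string "s3".
    static final String S3 = "s3";

    static boolean isDirectUploadEnabled(String driverId) {
        // The redirect option is only honoured when the driver id equals S3.
        boolean redirect = S3.equals(driverId)
                && Boolean.parseBoolean(
                        System.getProperty("dataverse.files." + S3 + ".upload-redirect"));
        // The out-of-band option is looked up under the given driver id.
        boolean outOfBand = Boolean.parseBoolean(
                System.getProperty("dataverse.files." + driverId + ".upload-out-of-band"));
        return redirect || outOfBand;
    }

    public static void main(String[] args) {
        // Hypothetical store ids, used only for illustration.
        System.setProperty("dataverse.files.s3.upload-redirect", "true");
        System.setProperty("dataverse.files.archive.upload-out-of-band", "true");
        System.setProperty("dataverse.files.local.upload-redirect", "true");

        System.out.println(isDirectUploadEnabled("s3"));      // true
        System.out.println(isDirectUploadEnabled("archive")); // true
        System.out.println(isDirectUploadEnabled("local"));   // false
    }
}
```

An unset property yields `null`, which `Boolean.parseBoolean` treats as `false`, so a store with neither option configured reports direct upload as disabled.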

//Check that storageIdentifier is consistent with store's config