Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

9002 allow direct upload setting #9003

Merged
merged 44 commits into from
Sep 29, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
5b34065
added api-direct-upload option for storage configurations
ErykKul Sep 29, 2022
5db560e
improvements in the documentation
ErykKul Sep 29, 2022
e4ee0d3
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Nov 7, 2022
cbc42d5
renamed and moved the direct upload JVM option in the documentation
ErykKul Nov 8, 2022
187cc61
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Nov 15, 2022
4abac1a
revert by accident editted old release notes
ErykKul Nov 17, 2022
578c7af
indentation fixes
ErykKul Nov 17, 2022
8578de1
tab character removed
ErykKul Nov 17, 2022
f2e75db
tab character removed
ErykKul Nov 17, 2022
bff889d
tab character removed
ErykKul Nov 17, 2022
ad4bb51
renamed jvm option: allow-out-of-band-upload -> upload-out-of-band
ErykKul Nov 17, 2022
49102ad
linking to api documentation
ErykKul Nov 17, 2022
e9d6df0
some improvements in the documentation
ErykKul Nov 17, 2022
dc64aa2
documentation improvements by Dieuwertje
ErykKul Nov 17, 2022
085fb8f
improvements in the documentation
ErykKul Nov 21, 2022
c65cd7e
merged develop branch
ErykKul Apr 21, 2023
098de49
reverted SystemConfig.java changes
ErykKul Apr 21, 2023
c69ce63
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Apr 21, 2023
6d9acdb
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 8, 2023
e048dad
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 8, 2023
35436c5
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 12, 2023
94913cc
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 17, 2023
ce7ba1b
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 22, 2023
fddca51
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul May 26, 2023
0a16342
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Jun 9, 2023
6da584f
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Jun 15, 2023
1060061
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Jun 16, 2023
03b1295
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 1, 2023
584bed4
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 6, 2023
f61da49
typo fix
ErykKul Sep 6, 2023
884d046
edited out-of-band option description in config doc
ErykKul Sep 6, 2023
7bb4231
upload-redirect can only be true for s3 driver
ErykKul Sep 6, 2023
e6b7d9f
removed confusing comments about a possible error
ErykKul Sep 6, 2023
5ca942f
removed section that was re-added in a merge, after it was removed in…
ErykKul Sep 6, 2023
f5fba58
simplified config documentation for the out-of-band option
ErykKul Sep 6, 2023
43d7419
added references to the out-of-band option configuration
ErykKul Sep 6, 2023
f6bdd2a
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 18, 2023
bf42a92
reverted out-of-band setting in S3AccessIO
ErykKul Sep 18, 2023
d8ea581
add-file-metadata-api made literal as the ref does not exist
ErykKul Sep 18, 2023
7cd11f3
Update config.rst
qqmyers Sep 18, 2023
81be260
typo
qqmyers Sep 18, 2023
06b7255
Merge branch 'IQSS:develop' into 9002_allow_direct_upload_setting
ErykKul Sep 27, 2023
16322df
Create 9002_allow_direct_upload_setting.md
kcondon Sep 27, 2023
3ecd118
Update doc/release-notes/9002_allow_direct_upload_setting.md
kcondon Sep 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions doc/release-notes/4.20-release-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,10 +90,16 @@ Also note that the :MaxFileUploadSizeInBytes property has a new option to provid

### Direct S3 Upload Changes

Direct upload to S3 is enabled per store by one new jvm option:
Direct upload to S3 in UI and API is enabled per store by one new jvm option:

./asadmin create-jvm-options "\-Ddataverse.files.<id>.upload-redirect=true"


This option makes direct upload the default in the UI. In the API, you can use either: direct upload or upload via Dataverse upload. Direct upload to S3 in API only is enabled per store by this new jvm option:

./asadmin create-jvm-options "\-Ddataverse.files.<id>.api-direct-upload=true"

That option leaves via Dataverse upload by default in UI, but makes both: uploads via Dataverse and direct uploads possible via API.

The existing :MaxFileUploadSizeInBytes property and ```dataverse.files.<id>.url-expiration-minutes``` jvm option for the same store also apply to direct upload.

Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files.<id>.ingestsizelimit jvm option.
Expand Down Expand Up @@ -126,7 +132,8 @@ We made changes to the JSON Export in this release (Issue 6650, PR #6669). If yo
## New JVM Options for file storage drivers

- The JVM option dataverse.files.file.directory=<your directory> controls where temporary files are stored (in the /temp subdir of the defined directory), independent of the location of any 'file' store defined above.
- The JVM option dataverse.files.<id>.upload-redirect enables direct upload of files added to a dataset to the S3 bucket. (S3 stores only!)
- The JVM option dataverse.files.<id>.upload-redirect enables direct upload of files added to a dataset in the S3 bucket. (S3 stores only!)
ErykKul marked this conversation as resolved.
Show resolved Hide resolved
- The JVM option dataverse.files.<id>.api-direct-upload enables direct upload of files added to a dataset in any storage. (Via API only and when the uploading tool has direct access to the relevant storage used; i.e., upload the file first and register it via API!)
- The JVM option dataverse.files.<id>.MaxFileUploadSizeInBytes controls the maximum size of file uploads allowed for the given file store.
- The JVM option dataverse.files.<id>.ingestsizelimit controls the maximum size of files for which ingest will be attempted, for the given file store.

Expand Down
3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/installation/config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -545,7 +545,8 @@ List of S3 Storage Options
dataverse.files.<id>.label <?> **Required** label to be shown in the UI for this storage (none)
dataverse.files.<id>.bucket-name <?> The bucket name. See above. (none)
dataverse.files.<id>.download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false``
dataverse.files.<id>.upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false``
dataverse.files.<id>.upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset in the S3 store. ``false``
dataverse.files.<id>.api-direct-upload ``true``/``false`` Enable direct upload of files added to a dataset via API only. ``false``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left notes in the issue discussing whether this might be better as an allow-out-of-band option that could be moved outside of the S3 section. In that case it might be restricted to superuser if desired. In any case, it would work as in this PR w.r.t. having upload-redirect on supercedes it (you need to be able to call the /addFiles endpoint for directupload so it doesn't make sense to have upload-redirect=true and api-direct-upload (or allow-out-of-band-upload)=false.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed the option to allow-out-of-band-upload. I have also removed it from the s3 section and added this documentation in the general section:

When using integration tools, dataverse installation can be configured to allow out of band upload by setting the dataverse.files.\<id\>.allow-out-of-band-upload JVM option to true.
Files can be then uploaded by an integration tool with datasets/{id}/add api call, or uploaded directly to the storage and registerd in a dataset afterwards using the datasets/{id}/addFiles api call.
Notice that using S3-storage with dataverse.files.\<id\>.upload-redirect JVM option enabled supersedes the allow-out-of-band-upload and will enable direct upload even with allow-out-of-band-upload not set (or set to false).
In other words, dataverse.files.\<id\>.allow-out-of-band-upload option opens the datasets/{id}/add and datasets/{id}/addFiles api endpoints without redirecting uploads in the UI.
Enabling the upload-redirect option allows then direct upload automatically, without the need of enabling the allow-out-of-band-upload (setting it to false does not have any effect in that case).

dataverse.files.<id>.ingestsizelimit <size in bytes> Maximum size of directupload files that should be ingested (none)
dataverse.files.<id>.url-expiration-minutes <?> If direct uploads/downloads: time until links expire. Optional. 60
dataverse.files.<id>.min-part-size <?> Multipart direct uploads will occur for files larger than this. Optional. ``1024**3``
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -605,7 +605,8 @@ public static String getDriverPrefix(String driverId) {
}

public static boolean isDirectUploadEnabled(String driverId) {
return Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-redirect"));
return Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-redirect")) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since direct-upload only works with S3 stores, should the base class just check out-of-band and the S3 store override this method to check both?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little bit confusing. It is a static method. Refactoring the code does not seem to be the best solution to me. For now, I have changed the code to this (which seems equivalent to me):

    public static boolean isDirectUploadEnabled(String driverId) {
        return (DataAccess.S3.equals(driverId) && Boolean.parseBoolean(System.getProperty("dataverse.files." + DataAccess.S3 + ".upload-redirect"))) ||
            Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-out-of-band"));
    }

Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".api-direct-upload"));
}

//Check that storageIdentifier is consistent with store's config
Expand Down
3 changes: 2 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/util/FileUtil.java
Original file line number Diff line number Diff line change
Expand Up @@ -1748,7 +1748,8 @@ public static boolean isPackageFile(DataFile dataFile) {

public static S3AccessIO getS3AccessForDirectUpload(Dataset dataset) {
String driverId = dataset.getEffectiveStorageDriverId();
boolean directEnabled = Boolean.getBoolean("dataverse.files." + driverId + ".upload-redirect");
boolean directEnabled = Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-redirect")) ||
ErykKul marked this conversation as resolved.
Show resolved Hide resolved
Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".api-direct-upload"));
//Should only be requested when it is allowed, but we'll log a warning otherwise
if(!directEnabled) {
logger.warning("Direct upload not supported for files in this dataset: " + dataset.getId());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1112,7 +1112,8 @@ public boolean isDatafileValidationOnPublishEnabled() {
}

public boolean directUploadEnabled(DvObjectContainer container) {
return Boolean.getBoolean("dataverse.files." + container.getEffectiveStorageDriverId() + ".upload-redirect");
// this method is used in UI only, therfore "dataverse.files." + driverId + ".api-direct-upload" is not used here
return Boolean.parseBoolean(System.getProperty("dataverse.files." + container.getEffectiveStorageDriverId() + ".upload-redirect"));
}

public String getDataCiteRestApiUrlString() {
Expand Down