Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iqss/6489 - Direct uploads to S3 using presigned URLs #6490

Merged
merged 164 commits into from
Mar 23, 2020

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Jan 7, 2020

What this PR does / why we need it: This PR supports direct uploads to S3 using presigned URLs

Which issue(s) this PR closes:

Closes #6489

Special notes for your reviewer: This PR builds on #6488. It could be modified to work independent of multiple stores, but that's not how it was developed and I suspect multiple stores will be merged before this. As of now, this branch is up to date with dev plus #6488.

The initial state of the PR bypasses all ingest processing for direct uploads. I'll be adding that back in. Up to you guys if you want to test prior to that. Also owe some documentation.

Suggestions on how to test this:
Once a store is configured to do direct uploads via a jvm option, the store should work as normal w.r.t. user interface uploads. Testing should probably focus on making sure that's true. Once the PR is updated to support ingest processes, testing could check that the processing is the same between a normal store and a direct upload one.
There will also be a version of the DVUploader that can use the direct upload capability when a store supports it (next PR). That could be used to test the API calls directly.
Performance would be nice to test as well - overall speed and max file size.

Does this PR introduce a user interface change?:
None beyond that in #6488. That said, some elements, like the upload progress bar are being controlled via Javascript rather than by PrimeFaces when direct upload is used, so there could be some unintentional/minor style differences (something else to test...).

Is there a release notes update needed for this change?:
Could be worth mentioning, but everything beyond #6488 is optional, so there are no required upgrade steps.

Additional documentation:

qqmyers and others added 30 commits October 16, 2018 09:21
the JsonObjectBuilders were getting cleared in the build().toString()
used to generate the log message, so the returned value was mostly
empty...
use javascript to scroll within panel after update
Conflicts:
	pom.xml
	src/main/java/Bundle.properties
	src/main/java/edu/harvard/iq/dataverse/DataversePage.java
	src/main/webapp/dataverse.xhtml
	src/main/webapp/resources/images/favicondataverse.png
Conflicts:
	doc/sphinx-guides/source/admin/make-data-count.rst
Conflicts:
	src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java
@mheppler mheppler self-assigned this Mar 17, 2020
@qqmyers
Copy link
Member Author

qqmyers commented Mar 19, 2020

I think the change I committed helps with larger numbers of files (I've tested ~100 so far).

I also have a handle on the progress bar flashing - the progress bar code still assumes one file/changes all the bars at once.

I also noticed in testing that drag-n-drop doesn't work - must be a separate event to catch in the js to trigger processing.

So far I haven't been able to reproduce #6 above.

@qqmyers
Copy link
Member Author

qqmyers commented Mar 19, 2020

OK - may be worth testing again. I'm not yet catching any failed http calls but assuming the calls succeed, it should now handle more files, with progress showing for ~4 at a time, no flashing, drag and drop should work, and the long pause with the twirling icon at the end should be gone. Based on debugging with @kcondon , its possible that the root cause for #6 could also be fixed by the code I added a couple days ago to do a retry if Dataverse's first call to get the new file's metadata fails (e.g. some AWS server responds with a 404 before it learns about the new object).

If anything fails without an http error code showing for some call in the browser dev console, let me know (including #6)

@kcondon
Copy link
Contributor

kcondon commented Mar 20, 2020

@qqmyers Had mixed results when adding files to dataset on create with direct to s3:
first time failed to add files, ui froze at progress bar, no progress, error in logs
second time failed to add 3 of 4 files, exception in logs, multiple retries and failure to add file msgs
third time succeeded as expected.

@qqmyers
Copy link
Member Author

qqmyers commented Mar 20, 2020

I just discovered a cut/paste error in the upload js this morning. That update is definitely needed.

@kcondon kcondon assigned kcondon and unassigned mheppler, qqmyers and djbrooke Mar 23, 2020
@kcondon kcondon merged commit 0da6f13 into IQSS:develop Mar 23, 2020
@djbrooke djbrooke added this to the 4.20 milestone Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support larger file uploads to S3 stores via direct upload using presigned URLs
8 participants