Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add S3 tests to the regular integration test suite #6783

Closed
landreev opened this issue Apr 1, 2020 · 14 comments · Fixed by #10044
Closed

Add S3 tests to the regular integration test suite #6783

landreev opened this issue Apr 1, 2020 · 14 comments · Fixed by #10044
Labels
Component: Containers Anything related to cloudy Dataverse, shipped in containers. Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Milestone

Comments

@landreev
Copy link
Contributor

landreev commented Apr 1, 2020

We rely heavily on S3 storage, including in our production use, but there are no regularly-run integration tests for the functionality.
We do already have a RestAssured test class (test/java/edu/harvard/iq/dataverse/api/S3AccessIT.java) with one test, written by @bsilverstein (hi!). We are not regularly running it on jenkins.dataverse.org.
S3 needs to be configured on the test bed installation for the test to work. But since our new jenkins infrastructure already depends on utilizing AWS, this is not an issue. We would just need to modify our scripts to copy the AWS credentials to the target EC2 instance and configure an S3 file store there.

So I propose we do that; plus extend the S3 test class above, for some more extensive testing and to test the functionality that's been added since it was written. (for example, as it is now, the test relies on having S3 configured as the default storage driver. We can instead make it create a new dataverse, set S3 as the storage for that specific dataverse, using the new multistore functionality, and proceed testing it)

S3 testing was specifically requested in #5068. But I'm opening a dedicated issue for it since there were other things being discussed there.

@djbrooke
Copy link
Contributor

djbrooke commented Apr 1, 2020

Thanks @landreev - I'm very interested in us working on this.

@scolapasta I added this to the board in "Needs discussion before ready" feel free to discuss briefly in a tech hour, add an estimate, and move to ready.

@landreev
Copy link
Contributor Author

landreev commented Apr 1, 2020

As an added bonus, we can use this to test the failure case for #6558 (validate physical files on publish): we can create an S3-stored dataset, upload a file, then mess with the file - delete it/overwrite it via the standard S3 API directly so that the md5 we have in the database is no longer valid - then attempt to publish the dataset, and watch Dataverse do its preventive magic.

@poikilotherm
Copy link
Contributor

This might be linked to my current efforts to come up with more integration testing.

For #6694 I'm introducing a new Maven profile und JUnit tests using testcontainers.org.
There is already an upstream module for testing against AWS: https://www.testcontainers.org/modules/localstack

You might be interested in taking a look at develop...poikilotherm:6694-refactor-auth-config#diff-2487ecfd7d3cac06640e742876b1e14e for an example using a Postgres container to do an integration test for entity object marshaling/unmarshaling.

It's fast and doesn't need a full deployment.

@poikilotherm
Copy link
Contributor

poikilotherm commented Apr 3, 2020

And this might be a duplicate of #5068 I opened a while ago 😉

@djbrooke
Copy link
Contributor

djbrooke commented Apr 3, 2020

Thanks @poikilotherm, I'm always interested in closing out duplicate issues. 👍 I'll leave it to someone more technical than me (@landreev @scolapasta ) to check out that issue and to work with you to pull out any details.

@donsizemore
Copy link
Contributor

@pdurbin @poikilotherm @landreev I cobbled together a basic Ansible task to stand up an S3 LocalStack container at https://github.com/IQSS/dataverse-ansible/compare/164_localstack

Corrections, suggestions, etc. welcomed.

@cmbz
Copy link

cmbz commented Jul 25, 2023

  • This issue becomes a spike to analyze current state
  • Follow up with a plan

@cmbz cmbz added the Size: 10 A percentage of a sprint. 7 hours. label Jul 25, 2023
@donsizemore
Copy link
Contributor

@landreev It looks like LocalStack still exists and can easily be cobbled into Dataverse-Ansible, docker-compose (GitHub actions?), or both. As noted above, @poikilotherm is a big fan of TestContainers though I wonder how many (in)significant differences are present between a LocalStack container and S3 proper.

@pdurbin
Copy link
Member

pdurbin commented Sep 19, 2023

I just check with @poikilotherm and he recommends merging this PR of his...

... before we pick up this issue because he has added a lot of testing goodies there such as Testcontainers that should make working on this issue easier.

@poikilotherm poikilotherm added the Component: Containers Anything related to cloudy Dataverse, shipped in containers. label Sep 19, 2023
@pdurbin
Copy link
Member

pdurbin commented Sep 27, 2023

I just tested at PR above and Testcontainers magic seems to be working! Hopefully off to QA soon.

Also, here are some related issues:

@pdurbin pdurbin self-assigned this Oct 4, 2023
@cmbz cmbz added Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) and removed Size: 10 A percentage of a sprint. 7 hours. labels Oct 11, 2023
@pdurbin
Copy link
Member

pdurbin commented Oct 16, 2023

Just a quick note that these days, thanks to all the MPCONFIG efforts by @poikilotherm it's straightforward to get S3AccessIT to pass locally on my laptop by adding the following environment variables to docker-compose-dev.yml:

       - DATAVERSE_AUTH_OIDC_AUTH_SERVER_URL=http://keycloak.mydomain.com:8090/realms/test
       - DATAVERSE_JSF_REFRESH_PERIOD=1
+      - dataverse_files_storage__driver__id=s3
+      - dataverse_files_s3_type=s3
+      - dataverse_files_s3_label=S3
+      - dataverse_files_s3_custom__endpoint__url=s3.us-east-2.amazonaws.com
+      - dataverse_files_s3_custom__endpoint__region=us-east-2
+      - dataverse_files_s3_bucket__name=pdurbin
+      - dataverse_files_s3_upload__redirect=true
+      - dataverse_files_s3_access__key=REDACTED
+      - dataverse_files_s3_secret__key=REDACTED
     ports:
       - "8080:8080" # HTTP (Dataverse Application)

Above I'm using the magic trick described at https://guides.dataverse.org/en/6.0/container/app-image.html#tunables

Here's a screenshot:

Screenshot 2023-10-16 at 10 07 43 AM

As part of this effort, we should probably add first class support for these configuration options, rather than depending on the magic trick.

Finally, as discussed in Slack, in the EC2 instances spun up by Jenkins, we'd like to keep "file" as the default store but add an additional "s3" store. If it's not possible to do all the configuration already, we'll add APIs as needed. For example, we'll need to be able to configure the collection to use the S3 store. @landreev pointed out we can extend the updateAttribute API endpoint for collections.

@poikilotherm
Copy link
Contributor

poikilotherm commented Oct 16, 2023

pdurbin added a commit that referenced this issue Nov 9, 2023
pdurbin added a commit that referenced this issue Nov 14, 2023
Developers can now test S3 locally by using the Dockerized
development environment, which now includes both LocalStack
and MinIO. See S3AccessIT which executes API (end to end) tests.

In addition, a new integration test test class
(not an API test, the new kind launched with `mvn verify`)
has been added at S3AccessIOLocalstackIT. It uses Testcontainers
to spin up Localstack for S3 testing and does not require
Dataverse to be running.

Note that the format of docker-compose-dev.yml had to change to
allow for JVM options to be added.

Finally, docs were improved for listing and setting stores via API.
pdurbin added a commit that referenced this issue Nov 14, 2023
pdurbin added a commit that referenced this issue Nov 20, 2023
Developers can now test S3 locally by using the Dockerized
development environment, which now includes both LocalStack
and MinIO. See S3AccessIT which executes API (end to end) tests.

In addition, a new integration test test class
(not an API test, the new kind launched with `mvn verify`)
has been added at S3AccessIOLocalstackIT. It uses Testcontainers
to spin up Localstack for S3 testing and does not require
Dataverse to be running.

Note that the format of docker-compose-dev.yml had to change to
allow for JVM options to be added.

Finally, docs were improved for listing and setting stores via API.
pdurbin added a commit that referenced this issue Nov 20, 2023
pdurbin added a commit that referenced this issue Dec 5, 2023
pdurbin added a commit that referenced this issue Dec 5, 2023
pdurbin added a commit that referenced this issue Dec 6, 2023
@donsizemore
Copy link
Contributor

Screenshot 2023-12-06 at 10 22 25

pdurbin added a commit that referenced this issue Dec 6, 2023
This should have been part of 811d79a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Containers Anything related to cloudy Dataverse, shipped in containers. Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

6 participants