AWS: add more S3FileIO tests, cleanup related codebase #1900

Merged
merged 2 commits into from
Dec 11, 2020

jackye1995
Contributor

@danielcweeks made the following updates to S3FileIO related code:

  1. added integration tests that verify upload behavior against S3
  2. updated variable names and documentation in AwsProperties to be consistent with others, added corresponding tests
  3. fixed invalid reference to private variable S3URI#VALID_SCHEMES in doc of S3FileIO
  4. muted errorprone warnings:
/iceberg/aws/src/main/java/org/apache/iceberg/aws/s3/S3RequestUtil.java:35: warning: [UnnecessaryLambda] Returning a lambda from a helper method or saving it in a constant is unnecessary; prefer to implement the functional interface method directly and use a method reference instead.
  private static final Function<ServerSideEncryption, S3Request.Builder> NULL_SSE_SETTER = sse -> null;
                                                                         ^
    (see https://errorprone.info/bugpattern/UnnecessaryLambda)
  Did you mean 'private static  S3Request.Builder nullSseSetter(ServerSideEncryption sse){return null;}'?
/iceberg/aws/src/main/java/org/apache/iceberg/aws/s3/S3RequestUtil.java:36: warning: [UnnecessaryLambda] Returning a lambda from a helper method or saving it in a constant is unnecessary; prefer to implement the functional interface method directly and use a method reference instead.
  private static final Function<String, S3Request.Builder> NULL_STRING_SETTER = s -> null;
                                                           ^
    (see https://errorprone.info/bugpattern/UnnecessaryLambda)
  Did you mean 'private static  S3Request.Builder nullStringSetter(String s){return null;}'?
/iceberg/aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java:92: warning: [StaticAssignmentInConstructor] This assignment is to a static field. Mutating static state from a constructor is highly error-prone.
          executorService = MoreExecutors.getExitingExecutorService(
                          ^
    (see https://errorprone.info/bugpattern/StaticAssignmentInConstructor)
/iceberg/aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java:92: warning: [StaticGuardedByInstance] Write to static variable should not be guarded by instance lock 'this'
          executorService = MoreExecutors.getExitingExecutorService(
          ^
    (see https://errorprone.info/bugpattern/StaticGuardedByInstance)
4 warnings
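
For comparison with muting, here is a minimal sketch of the rewrites that the warning messages themselves point toward: a named static method used through a method reference instead of a lambda constant, and a static field that is assigned inside a static synchronized method rather than in a constructor under an instance lock. Everything in it is an assumption for illustration: `WarningFreeSketch` is a hypothetical class, `StringBuilder` stands in for `S3Request.Builder`, and a daemon-thread pool stands in for `MoreExecutors.getExitingExecutorService`. It is not the code in S3RequestUtil or S3OutputStream.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical stand-in class; not the actual S3RequestUtil or S3OutputStream code.
class WarningFreeSketch {
  // Shared pool, created lazily on first use.
  private static ExecutorService executorService;

  private final ExecutorService executor;

  WarningFreeSketch() {
    // The constructor only reads the shared pool; it never assigns the static field.
    this.executor = sharedExecutor();
  }

  // static synchronized locks on the class, so the static write is not
  // guarded by an instance lock.
  static synchronized ExecutorService sharedExecutor() {
    if (executorService == null) {
      executorService = Executors.newFixedThreadPool(2, runnable -> {
        Thread thread = new Thread(runnable, "sketch-upload");
        thread.setDaemon(true); // assumption: daemon threads replace the exiting executor
        return thread;
      });
    }
    return executorService;
  }

  ExecutorService executor() {
    return executor;
  }

  // A named method replaces a constant such as `s -> null`.
  private static StringBuilder nullStringSetter(String value) {
    return null;
  }

  static <T, R> R applySetter(Function<T, R> setter, T value) {
    return setter.apply(value);
  }

  // The call site passes a method reference where a Function is expected.
  static StringBuilder applyNullStringSetter(String value) {
    return applySetter(WarningFreeSketch::nullStringSetter, value);
  }

  public static void main(String[] args) {
    // Both instances see the same shared pool.
    System.out.println(new WarningFreeSketch().executor() == new WarningFreeSketch().executor());
    System.out.println(applyNullStringSetter("AES256"));
  }
}
```

Whether a rewrite like this or a suppression is the better fit depends on how the constants and the executor are used in the real classes, which this sketch does not model.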

github-actions bot added the AWS label Dec 10, 2020
properties.setS3FileIoMultiPartSize(AwsProperties.S3FILEIO_MULTIPART_SIZE_MIN);
S3FileIO io = new S3FileIO(() -> s3, properties);
PositionOutputStream outputStream = io.newOutputFile(objectUri).create();
for (int i = 0; i < 100; i++) {
Contributor

Maybe I'm missing something, but I don't think these are actually testing the multipart upload. If we're writing with the OutputStream::write interface, that would only be writing a single byte, so 100 bytes in this case. That wouldn't be enough to trigger the multipart behavior.

I think that's the case for most of the tests I see here. You might want to look at the S3OutputStream test, because there you can actually validate the operations performed, like this: https://github.com/apache/iceberg/blob/master/aws/src/test/java/org/apache/iceberg/aws/s3/S3OutputStreamTest.java#L109

Contributor Author

@jackye1995 Dec 10, 2020

> that would only be writing a single byte, so 100 bytes in this case

There is an internal loop for (int j = 0; j < AwsProperties.S3FILEIO_MULTIPART_SIZE_MIN; j++).
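
To make the byte arithmetic concrete, here is a minimal sketch comparing 100 single-byte writes with the nested loop. `PartCountingStream` and `WriteLoopSketch` are hypothetical names, the stream only counts bytes and never talks to S3, and `PART_SIZE` is a small assumed value standing in for `AwsProperties.S3FILEIO_MULTIPART_SIZE_MIN` so that the example runs quickly.

```java
import java.io.OutputStream;

// Hypothetical stand-in for a multipart-style stream: it only counts bytes and
// how many full parts of the given size they would fill. It does not talk to S3.
class PartCountingStream extends OutputStream {
  private final long partSize;
  private long bytesWritten;

  PartCountingStream(long partSize) {
    this.partSize = partSize;
  }

  @Override
  public void write(int b) {
    bytesWritten++; // OutputStream.write(int) writes exactly one byte
  }

  long bytesWritten() {
    return bytesWritten;
  }

  long fullParts() {
    return bytesWritten / partSize;
  }
}

class WriteLoopSketch {
  // Assumed illustrative size; stands in for AwsProperties.S3FILEIO_MULTIPART_SIZE_MIN.
  static final int PART_SIZE = 1024;

  // Outer loop only: 100 single-byte writes.
  static PartCountingStream singleByteWrites() {
    PartCountingStream out = new PartCountingStream(PART_SIZE);
    for (int i = 0; i < 100; i++) {
      out.write(1);
    }
    return out;
  }

  // Outer loop plus inner loop: 100 * PART_SIZE single-byte writes.
  static PartCountingStream nestedWrites() {
    PartCountingStream out = new PartCountingStream(PART_SIZE);
    for (int i = 0; i < 100; i++) {
      for (int j = 0; j < PART_SIZE; j++) {
        out.write(1);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    PartCountingStream single = singleByteWrites();
    PartCountingStream nested = nestedWrites();
    System.out.println(single.bytesWritten() + " bytes, " + single.fullParts() + " full parts");
    System.out.println(nested.bytesWritten() + " bytes, " + nested.fullParts() + " full parts");
  }
}
```

With the outer loop alone the stream never fills a single part, while the nested loop writes 100 parts' worth of data, which is the difference the inner loop makes in the test.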

> You might want to look at the S3Outputstream test because you can actually validate the operations performed like this

The tests here verify the actual result in S3 rather than the number of calls, because I know the call counts are already verified in the tests you referenced. That said, I am not being very DRY here, so let me refactor the tests a bit.
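
The distinction between checking the stored result and checking call counts can be sketched without S3 at all. In the sketch below, `InMemoryStore` is a hypothetical stand-in for the bucket and `ContentCheckSketch` is a hypothetical helper; a real integration test would read the object back from S3 instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a bucket; a real test would use S3.
class InMemoryStore {
  private final Map<String, byte[]> objects = new HashMap<>();
  private int putCalls;

  void put(String key, byte[] data) {
    putCalls++;
    objects.put(key, data.clone()); // copy so later changes to the caller's array are not stored
  }

  byte[] get(String key) {
    return objects.get(key);
  }

  int putCalls() {
    return putCalls;
  }
}

class ContentCheckSketch {
  // Deterministic payload so the expected content can be rebuilt at check time.
  static byte[] expectedPayload(int size) {
    byte[] data = new byte[size];
    for (int i = 0; i < size; i++) {
      data[i] = (byte) (i % 251);
    }
    return data;
  }

  // Result-based check: upload, read back, compare bytes.
  static boolean roundTripMatches(InMemoryStore store, String key, int size) {
    byte[] expected = expectedPayload(size);
    store.put(key, expected);
    return Arrays.equals(expected, store.get(key));
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    System.out.println("content matches: " + roundTripMatches(store, "sketch/object", 4096));
    // Interaction-based check: only looks at how many calls were made.
    System.out.println("put calls: " + store.putCalls());
  }
}
```

The call count can be correct while the stored bytes are wrong, and the reverse, which is why the two kinds of checks complement each other.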

Contributor

@danielcweeks Dec 10, 2020

No, I just mixed up this test and the next one and missed the inner loop here. However, you might be able to combine some of the upload and content validation into a single test, but it looks like you already have some thoughts on it, so I'll wait.

I guess there are two minor questions I have:

  1. Is it reasonable to be creating large files in S3 as part of the integration test (I'm not clear on if we run these as part of our actual build or it's left up to users to run in their own accounts).
  2. Are there cases where we think the s3mock wouldn't catch something that these tests would?

Contributor Author

> 1. Is it reasonable to be creating large files in S3 as part of the integration test (I'm not clear on if we run these as part of our actual build or it's left up to users to run in their own accounts).

I don't expect this to be run for every actual build, and the tests take quite a while to complete, so it's mostly for users to run in their own accounts. That said, I am in the process of potentially getting an account, with the cost covered, to run these tests for all PRs that commit to the aws module.

> 2. Are there cases where we think the s3mock wouldn't catch something that these tests would?

It is hard to say how much the actual S3 differs from s3mock, so these tests serve as a line of defense to catch behavioral differences and errors that only show up during non-local network calls.
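
Since these tests are meant to run in a user's own account rather than in every build, one possible way to keep them opt-in is a small gate on the environment. This is only a sketch under stated assumptions: `IntegrationTestGate` and the `RUN_S3_INTEGRATION_TESTS` variable are hypothetical names, not something the project defines, and checking the two standard AWS credential variables is a simplification because credentials can also come from other sources.

```java
import java.util.Map;

// Hypothetical gate for opt-in integration tests; not part of the Iceberg build.
class IntegrationTestGate {
  // Hypothetical opt-in variable name.
  static final String OPT_IN = "RUN_S3_INTEGRATION_TESTS";

  // Takes the environment as a map so the decision can be checked without real credentials.
  static boolean enabled(Map<String, String> env) {
    return "true".equalsIgnoreCase(env.get(OPT_IN))
        && nonEmpty(env.get("AWS_ACCESS_KEY_ID"))
        && nonEmpty(env.get("AWS_SECRET_ACCESS_KEY"));
  }

  private static boolean nonEmpty(String value) {
    return value != null && !value.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(enabled(System.getenv())
        ? "running S3 integration tests"
        : "skipping S3 integration tests");
  }
}
```

A test suite could call `IntegrationTestGate.enabled(System.getenv())` before touching a bucket and skip the suite when it returns false.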

Contributor Author

@danielcweeks I refactored the tests; please let me know if they look good to you.

Contributor

Yeah, looks good. For #2, it seems there are a number of things we won't be able to test against s3mock (like STS), so it makes sense to add these integration tests once we have an account.

Thanks!

@danielcweeks merged commit d48c7a3 into apache:master Dec 11, 2020