-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: add more S3FileIO tests, cleanup related codebase #1900
Conversation
properties.setS3FileIoMultiPartSize(AwsProperties.S3FILEIO_MULTIPART_SIZE_MIN); | ||
S3FileIO io = new S3FileIO(() -> s3, properties); | ||
PositionOutputStream outputStream = io.newOutputFile(objectUri).create(); | ||
for (int i = 0; i < 100; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but I don't think these are actually testing the multipart upload. If we're writing with the OutputStream::write interface, that would only be writing a single byte, so 100 bytes in this case. That wouldn't be enough to trigger the multipart behavior.
I think that's the case for most of the tests I see here. You might want to look at the S3Outputstream test because you can actually validate the operations performed like this: https://github.com/apache/iceberg/blob/master/aws/src/test/java/org/apache/iceberg/aws/s3/S3OutputStreamTest.java#L109
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would only be writing a single byte, so 100 bytes in this case
There is an internal loop for (int j = 0; j < AwsProperties.S3FILEIO_MULTIPART_SIZE_MIN; j++)
.
You might want to look at the S3Outputstream test because you can actually validate the operations performed like this
the tests here are trying to verify against actual result in s3 instead of verifying the number of calls, because I know those are verified in the tests you referenced. But I think I am being very not DRY here, let me refactor the tests a little bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I just mixed up this test and the next one and missed the inner loop here. However, you might be able to combine some of the upload and content validation into a single test, but it looks like you already have some thoughts on it, so I'll wait.
I guess there's two minor questions I have:
- Is it reasonable to be creating large files in S3 as part of the integration test (I'm not clear on if we run these as part of our actual build or it's left up to users to run in their own accounts).
- Are there cases where we think the s3mock wouldn't catch something that these tests would?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is it reasonable to be creating large files in S3 as part of the integration test (I'm not clear on if we run these as part of our actual build or it's left up to users to run in their own accounts).
I don't expect this to be run for every actual build, and the tests take quite a while to complete, so it's mostly for users to run in their own account. With that being said, I am in progress of potentially getting an account to run these tests for all PRs committing to the aws module with cost covered.
- Are there cases where we think the s3mock wouldn't catch something that these tests would?
It is hard to say how different is the actual S3 compared to S3mock, so this serves as a line of defense to catch potentially different behaviors and potential errors during non-local network calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielcweeks refactored tests, please let me know if it looks good to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looks good. It seems for #2 there are a number of things we won't be able to test against s3mock (like sts) so it makes sense to add these integration tests once we have an account.
Thanks!
@danielcweeks made the following updates to
S3FileIO
related code:AwsProperties
to be consistent with others, added corresponding testsS3URI#VALID_SCHEMES
in doc ofS3FileIO