-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flink: switch to use SortKey for data statistics #9212
Conversation
...k/src/test/java/org/apache/iceberg/flink/source/TestIcebergSourceWithWatermarkExtractor.java
Outdated
Show resolved
Hide resolved
import org.junit.After; | ||
import org.junit.Before; | ||
import org.junit.Test; | ||
|
||
public class TestDataStatisticsOperator { | ||
private final Schema schema = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it worth to have an e2e test where we have a wider record and a narrower SortKey
?
fa153d6
to
e594f9e
Compare
f2c7246
to
b13439b
Compare
operator.processElement(new StreamRecord<>(GenericRowData.of(StringData.fromString("a")))); | ||
operator.processElement(new StreamRecord<>(GenericRowData.of(StringData.fromString("b")))); | ||
operator.processElement(new StreamRecord<>(genericRowDataA)); | ||
operator.processElement(new StreamRecord<>(genericRowDataA)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could we make something like this here:
genericRowDataA_2 = GenericRowData.of(StringData.fromString("a"), 2);
operator.processElement(new StreamRecord<>(genericRowDataA_2));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean not reusing the object? if yes, I would just get rid of the pre-constructed objects and always construct them on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - my midnight review was not clear enough. 😢
I was trying to suggest to process a different RowData
object, with the same key, like:
private final GenericRowData genericRowDataA_1 = GenericRowData.of(StringData.fromString("a"), 1);
private final GenericRowData genericRowDataA_2 = GenericRowData.of(StringData.fromString("a"), 2);
private final GenericRowData genericRowDataB = GenericRowData.of(StringData.fromString("b"), 3);
[..]
operator.processElement(new StreamRecord<>(genericRowDataA_1));
operator.processElement(new StreamRecord<>(genericRowDataA_2));
operator.processElement(new StreamRecord<>(genericRowDataB));
I know that we have individual test for the correct grouping, but I consider this as an e2e tests, and it would be nice to test this out as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. for readability,I would just get rid of the pre-constructed objects and always construct them on the fly then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment, but this is really nice work.
Thanks, Steven!
thanks @pvary for the review |
…ata statistics
…ata statistics
@Override | ||
public void writeSnapshot(DataOutputView out) throws IOException { | ||
Preconditions.checkState(schema != null, "Invalid schema: null"); | ||
Preconditions.checkState(sortOrder != null, "Invalid sort order: null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using Preconditions.checkNotNull
to check if schema and sortOrder are null or not
Same for line 329, 330
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer to avoid chechNotNull
, as the result is a NullPointerException
. Generally NullPointerExceptions
are something very unexpected, while IllegalArgumentExceptions
are a different category for me (configuration errors, and such)
That is why, I did not flag this at the review, but - taking a second look - this could be strange for someone less biased
…ata statistics
…ata statistics
…ata statistics
…ata statistics
No description provided.