[HUDI-3396] Make sure BaseFileOnlyViewRelation only reads projected columns #4818
37d994c to acfa5bc
@hudi-bot run azure
acfa5bc to 4f0f5a2
4f0f5a2 to 2e19472
Good job on the patch! Mostly minor comments and some clarifications.
@@ -348,6 +355,21 @@ protected HoodieWriteConfig getConfig(Boolean autoCommit, Boolean rollbackUsingM
        .withRollbackUsingMarkers(rollbackUsingMarkers);
  }

  protected Dataset<Row> toDataset(List<HoodieRecord> records) {
Can we take the Avro schema as an argument? Maybe create another overloaded method that calls into this one with some default for the Avro schema.
Good call
Is this addressed?
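For illustration only, the overload requested in this thread could take roughly the shape below. This is a minimal sketch under loud assumptions: `ToDatasetOverloadSketch`, `DEFAULT_SCHEMA`, and the use of plain `String` values in place of the Avro schema, `HoodieRecord`, and `Row` are all hypothetical stand-ins so the snippet compiles without Hudi, Spark, or Avro on the classpath. It is not the code that was merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the test harness; not a Hudi class.
class ToDatasetOverloadSketch {

    // Assumed placeholder: the real harness would hold an Avro schema object here.
    static final String DEFAULT_SCHEMA = "default-test-schema";

    // Convenience overload: existing callers keep the one-argument call
    // and implicitly get the default schema.
    static List<String> toDataset(List<String> records) {
        return toDataset(records, DEFAULT_SCHEMA);
    }

    // Overload that takes the schema explicitly, as the reviewer asked.
    // Tagging each record with the schema name only makes the chosen
    // schema visible in the result; it stands in for the real conversion.
    static List<String> toDataset(List<String> records, String schema) {
        List<String> rows = new ArrayList<>();
        for (String record : records) {
            rows.add(schema + ":" + record);
        }
        return rows;
    }
}
```

With this shape, existing call sites stay unchanged, while a test that needs a specific schema can pass it through the two-argument variant.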
...k-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyViewRelation.scala
...k-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyViewRelation.scala
...k-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyViewRelation.scala
...ark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieDataSourceHelper.scala
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileScanRDD.scala
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieUnsafeRDD.scala
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileScanRDD.scala
...source/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
...ource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala
…lemented appropriately
Added java-docs elaborating the scope and purpose
Moving Spark version-specific logic into version-specific impls; Re-aligned components naming
37ce465 to e3902bf
@hudi-bot run azure
...i-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
Outdated
Left a couple of comments.
@hudi-bot run azure
LGTM. But I will test this patch out and report the results. Since this touches the core read path, I want to ensure things are OK.
… columns (apache#4818)
NOTE: This change is first part of the series to clean up Hudi's Spark DataSource related implementations, making sure there's minimal code duplication among them, implementations are consistent and performant
This PR is making sure that BaseFileOnlyViewRelation only reads projected columns as well as avoiding unnecessary serde from Row to InternalRow
Brief change log
- Introduced HoodieBaseRDD as a base for all custom RDD impls
- Extracted common fields/methods to HoodieBaseRelation
- Cleaned up and streamlined HoodieBaseFileViewOnlyRelation
- Fixed all of the Relations to avoid superfluous Row <> InternalRow conversions
What is the purpose of the pull request
NOTE: This change is the first part of a series to clean up Hudi's Spark DataSource related implementations, making sure there is minimal code duplication among them and that the implementations are consistent and performant.
This PR makes sure that BaseFileOnlyViewRelation only reads projected columns and avoids unnecessary serde from Row to InternalRow.
Brief change log
- Introduced HoodieBaseRDD as a base for all custom RDD impls
- Extracted common fields/methods to HoodieBaseRelation
- Cleaned up and streamlined HoodieBaseFileViewOnlyRelation
- Fixed all of the Relations to avoid superfluous Row <> InternalRow conversions
Verify this pull request
This pull request is already covered by existing tests, such as (please describe tests).
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.