Core, Spark 3.4: Write properties of PositionDeletesTable should respect ones of BaseTable
#8428
BaseMetadataTable should respect the properties of BaseTable
I think this option works for me. @aokolnychyi, any concerns?
@jerqi Sorry about this, but I was rethinking it and I'm not 100% sure it makes sense, as some table properties look odd when exposed on every metadata table. What do you think about first trying to solve it in [SparkBinPackPositionDeletesRewriter](https://github.com/apache/iceberg/blob/master/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackPositionDeletesRewriter.java) by passing DataFrame write options? I think that has less impact.
That approach requires setting every write option on the DataFrame explicitly, which may be difficult for users.
@jerqi OK, that sounds good to me; let's go with that approach. Mostly write properties, I assume? Writing position deletes through the position_deletes table is not exposed to users, as we don't support it outside rewrite_position_deletes. But I see your point that handling it inside the action makes the code harder, since there are quite a few possible properties.
Maybe we should include read and commit properties too. Let me check which properties rewrite_position_deletes uses.
Yes, you're right. We mostly need write properties.
PositionDeletesTable should respect properties of BaseTable
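```java
@Override
public Map<String, String> properties() {
  // Expose only the base table's write.* properties; these should respect the ones of BaseTable.
  return Collections.unmodifiableMap(
      table().properties().entrySet().stream()
          .filter(entry -> entry.getKey().startsWith("write."))
          .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)));
}
```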
All of the write properties are needed by PositionDeletesRewriteAction, so I chose to match on the `write.` key prefix here instead of copying specific entries.
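To illustrate the prefix-matching choice outside of Iceberg, here is a minimal standalone sketch. It uses a plain `Map` in place of `table().properties()`; the class name `WritePropertiesFilter` and the sample property values are hypothetical, chosen only to show that `write.*` keys pass through while `commit.*` and `read.*` keys are dropped.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class WritePropertiesFilter {

  // Keeps only entries whose key starts with "write.", mirroring the prefix
  // match discussed above instead of copying an explicit list of keys.
  static Map<String, String> writeProperties(Map<String, String> baseTableProperties) {
    return Collections.unmodifiableMap(
        baseTableProperties.entrySet().stream()
            .filter(entry -> entry.getKey().startsWith("write."))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)));
  }

  public static void main(String[] args) {
    // Hypothetical base-table properties; only the two write.* entries should survive.
    Map<String, String> base = new LinkedHashMap<>();
    base.put("write.format.default", "orc");
    base.put("write.target-file-size-bytes", "134217728");
    base.put("commit.retry.num-retries", "4");
    base.put("read.split.target-size", "134217728");

    Map<String, String> filtered = writeProperties(base);
    System.out.println(filtered.size()); // 2
    System.out.println(filtered.get("write.format.default")); // orc
  }
}
```

The prefix match means any write property added to Iceberg later is inherited automatically, at the cost of exposing every `write.*` key rather than a curated subset.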
PositionDeletesTable should respect ones of BaseTable
@szehon-ho Could you review this PR again when you have time?
Merged, thanks @jerqi. Can you please update the PR description to make it clearer what problem this fixes? And do we need the same fix for other Spark versions?
@szehon-ho I have opened a new PR for Spark 3.5: https://github.com/apache/iceberg/pull/8584/files
What changes were proposed in this pull request?
Make the write properties of PositionDeletesTable respect those of BaseTable.
Why are the changes needed?
PositionDeletesRewriteAction uses the properties of PositionDeletesTable, but before this PR those properties were empty, so the action always fell back to the default properties. As a result, PositionDeletesRewriteAction wrote Parquet files even when the table's write format was ORC, which is wrong. See #8313 (comment) for more details.
Does this PR introduce any user-facing change?
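The fallback described above can be modeled in a few lines. This is a simplified, hypothetical sketch, not Iceberg's actual resolution logic: it only looks at `write.format.default` (whose documented default is `parquet`), whereas the real rewrite path may consult other keys as well. The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;

public class WriteFormatFallback {

  // Property key and default value as documented for Iceberg table properties.
  static final String DEFAULT_FILE_FORMAT = "write.format.default";
  static final String DEFAULT_FILE_FORMAT_DEFAULT = "parquet";

  // Hypothetical, simplified resolution: use the configured value if present,
  // otherwise fall back to the default format.
  static String resolveWriteFormat(Map<String, String> properties) {
    return properties.getOrDefault(DEFAULT_FILE_FORMAT, DEFAULT_FILE_FORMAT_DEFAULT);
  }

  public static void main(String[] args) {
    // Before the fix: the metadata table exposed no properties at all.
    Map<String, String> before = Collections.emptyMap();
    // After the fix: write.* properties are inherited from the base table.
    Map<String, String> after = Map.of(DEFAULT_FILE_FORMAT, "orc");

    System.out.println(resolveWriteFormat(before)); // parquet
    System.out.println(resolveWriteFormat(after)); // orc
  }
}
```

With an empty properties map every lookup hits its default, which is why the table's ORC setting was silently ignored until the base table's `write.*` entries were passed through.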
No.
How was this patch tested?
Unit tests.