Apache Iceberg version
1.4.3
Query engine
Flink
Please describe the bug 🐞
Currently, the Flink sink connector logic is:
1. The writer receives the checkpoint barrier, flushes its data, and sends a WriteResult to the committer operator.
2. The committer operator finalizes all WriteResults received during the current checkpoint interval.
3. The committer commits to the table when the checkpoint is completed.

If we fail in step 2 before the finalize process [1], then data written in checkpoint i and checkpoint i+1 will be squashed into one commit, which results in the "loss" of some eq-delete data.
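As a rough illustration of the failure mode, here is a hypothetical Python sketch (not the actual IcebergFilesCommitter logic, which is Java and differs in detail): when the finalize step fails, the WriteResults for that checkpoint stay pending and are later committed together with the next checkpoint's results.

```python
# Hypothetical sketch of the committer's buffering behavior; the real logic
# lives in IcebergFilesCommitter.java and differs in detail.
class CommitterSketch:
    def __init__(self):
        self.pending = []  # WriteResults not yet committed to the table

    def on_checkpoint_complete(self, write_results, fail_before_finalize=False):
        """Returns the list of WriteResults that go into one table commit."""
        self.pending.extend(write_results)
        if fail_before_finalize:
            # Step 2 failed: nothing is finalized; the results stay pending
            # and are picked up by the next successful checkpoint.
            return None
        commit = list(self.pending)  # everything pending lands in ONE commit
        self.pending.clear()
        return commit

committer = CommitterSketch()
# Checkpoint i fails before finalize: no commit is produced.
assert committer.on_checkpoint_complete(
    ["results-ckpt-i"], fail_before_finalize=True) is None
# Checkpoint i+1 then commits both intervals' data as a single commit.
assert committer.on_checkpoint_complete(
    ["results-ckpt-i+1"]) == ["results-ckpt-i", "results-ckpt-i+1"]
```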
For example:
- An update for primary key A in checkpoint i translates to: eq-delete A + insert A.
- An update for primary key A in checkpoint i+1 translates to: eq-delete A + insert A.

After checkpoint i fails, we'll commit to the table with eq-delete A + insert A + eq-delete A + insert A, and the second eq-delete A will be "lost" when we read from the table.

[1] iceberg/flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java, line 215 in 9aa354d
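To see why squashing loses the second eq-delete: Iceberg applies an equality delete only to rows whose data sequence number is strictly lower than the delete's, and all files in one commit share a sequence number. A minimal Python model of that rule (the `read_table` helper is hypothetical, not Iceberg's reader):

```python
# Model of Iceberg's rule: an eq-delete removes a row only if the row's data
# sequence number is strictly LOWER than the delete's sequence number.
def read_table(files):
    """files: list of (kind, key, seq). Returns the surviving keys."""
    inserts = [(key, seq) for kind, key, seq in files if kind == "insert"]
    deletes = [(key, seq) for kind, key, seq in files if kind == "eq-delete"]
    return [key for key, seq in inserts
            if not any(dk == key and dseq > seq for dk, dseq in deletes)]

# Correct behavior: each checkpoint gets its own commit (own sequence number).
two_commits = [
    ("eq-delete", "A", 1), ("insert", "A", 1),  # checkpoint i   -> seq 1
    ("eq-delete", "A", 2), ("insert", "A", 2),  # checkpoint i+1 -> seq 2
]
# Buggy behavior: both checkpoints squashed into one commit (one seq number).
one_commit = [
    ("eq-delete", "A", 1), ("insert", "A", 1),
    ("eq-delete", "A", 1), ("insert", "A", 1),
]
assert read_table(two_commits) == ["A"]        # only the latest row survives
assert read_table(one_commit) == ["A", "A"]    # duplicate: 2nd eq-delete "lost"
```

In the squashed commit, the checkpoint i+1 eq-delete has the same sequence number as the checkpoint i insert, so it can no longer remove that row, and readers see a duplicate.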
Willingness to contribute