Core: Add writer for unordered position deletes #7692
Conversation
@Benchmark
@Threads(1)
public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole blackhole)
We should expect 5-15% overhead for the new buffering writer, which can still be beneficial for the job if we skip local ordering for inserts and potentially avoid spilling. This benchmark also does not take into account the cost to order records, it only tests the write performance. We will use this writer only if fanout is enabled. We should also explore Puffin delete files that would persist bitmaps directly.
Benchmark Mode Cnt Score Error Units
ParquetWritersBenchmark.writeUnpartitionedClusteredPositionDeleteWriter ss 5 6.004 ± 0.185 s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriter ss 5 6.503 ± 0.171 s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriterShuffled ss 5 6.616 ± 0.204 s/op
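For reference, the overhead implied by the benchmark numbers above works out to roughly 8-10%, which is consistent with the 5-15% estimate. A quick sketch of the arithmetic (values copied from the table; the class name is illustrative):

```java
// Sanity-check the overhead implied by the benchmark scores above.
public class OverheadCheck {
  public static void main(String[] args) {
    double clustered = 6.004; // s/op, clustered writer baseline
    double fanout = 6.503;    // s/op, fanout writer, ordered input
    double shuffled = 6.616;  // s/op, fanout writer, shuffled input

    // (new - baseline) / baseline, as a percentage
    System.out.printf("fanout overhead:   %.1f%%%n", 100 * (fanout - clustered) / clustered);
    System.out.printf("shuffled overhead: %.1f%%%n", 100 * (shuffled - clustered) / clustered);
  }
}
```

Both results fall inside the 5-15% range quoted in the discussion above.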
We should also explore Puffin delete files that would persist bitmaps directly
+1
About memory overhead (I'm not sure anything measures it right now): should it just be the additional space for the map (data_file_path => bitmap)? Will there be cases, especially with fanout, where a writer writes many deletes across many data files and starts to stress memory?
I ran this benchmark (100 data files, 50k deletes each, 5 million deletes total) with a GC profiler and did not see anything bad. Issues will arise when there are lots of unique data files. That's unlikely as we distribute by partition, and this writer will still be disabled by default, so users will have to opt in explicitly. It isn't perfect for sure, but there are reasonable use cases for it.
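To make the buffering idea discussed above concrete, here is a hypothetical, dependency-free sketch: positions are accumulated per data file and emitted clustered by path and ordered by position on flush. The actual writer in this PR uses a Roaring64Bitmap per path; a TreeSet stands in for the bitmap here, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative stand-in for a buffering position delete writer: the memory
// overhead being discussed is exactly this map of data_file_path => positions.
class BufferedPositionDeletes {
  // data_file_path => sorted set of deleted positions (bitmap stand-in)
  private final Map<String, TreeSet<Long>> positionsByPath = new TreeMap<>();

  // Accepts deletes in any order, for any data file.
  void delete(String path, long pos) {
    positionsByPath.computeIfAbsent(path, p -> new TreeSet<>()).add(pos);
  }

  // On flush, deletes come out clustered by path and ordered by position,
  // which is the ordering position delete files require.
  List<String> flushOrdered() {
    List<String> out = new ArrayList<>();
    positionsByPath.forEach(
        (path, positions) -> positions.forEach(pos -> out.add(path + ":" + pos)));
    return out;
  }
}
```

The working set grows with the number of unique data files seen by one writer, which matches the observation above that many unique paths is the stress case.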
LGTM, Thanks @aokolnychyi !
Looks good to me too; I left a few comments.
@@ -38,6 +38,13 @@ public PositionDelete<R> set(CharSequence newPath, long newPos, R newRow) {
    return this;
  }

  public PositionDelete<R> set(CharSequence newPath, long newPos) {
    this.path = newPath;
Nit question: is it cleaner to have this constructor delegate to the other one?
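For illustration, the delegation being suggested might look like the sketch below. This is a simplified stand-in class, not Iceberg's actual `PositionDelete`; the field names are assumptions for the example.

```java
// Simplified stand-in for PositionDelete, showing the two-argument setter
// delegating to the three-argument one instead of duplicating assignments.
class PositionDelete<R> {
  CharSequence path;
  long pos;
  R row;

  PositionDelete<R> set(CharSequence newPath, long newPos, R newRow) {
    this.path = newPath;
    this.pos = newPos;
    this.row = newRow;
    return this;
  }

  // Delegates with a null row, so the assignment logic lives in one place.
  PositionDelete<R> set(CharSequence newPath, long newPos) {
    return set(newPath, newPos, null);
  }
}
```

Delegating keeps the two overloads from drifting apart if a field is ever added.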
import org.roaringbitmap.longlong.Roaring64Bitmap;

/**
 * A position delete writer that is capable of handling unordered deletes without rows.
Nit: can we add javadoc to the PositionDeleteWriter, when we get a chance?
Will add in this PR.
@@ -93,14 +100,33 @@ public void setupBenchmark() {
        row -> transform.bind(Types.IntegerType.get()).apply(row.getInt(1))));
    this.rows = data;

    this.positionDeleteRows =
        RandomData.generateSpark(DeleteSchemaUtil.pathPosSchema(), NUM_ROWS, 0L);
    this.positionDeleteRows = generatePositionDeletes(false /* shuffle */);
Don't shuffle?
for (int pathIndex = 0; pathIndex < NUM_DATA_FILES_PER_POSITION_DELETE_FILE; pathIndex++) {
  UTF8String path = UTF8String.fromString("path/to/position/delete/file/" + UUID.randomUUID());
  int step = 10;
Why not just declare this outside the loop?
Thanks, @singhpk234 @szehon-ho!
This PR adds a position delete writer that can handle unordered position deletes. This writer should allow us to avoid a local sort for some MERGE operations. Specifically, consider MERGE operations where 90% of the data are inserts and the table is partitioned but no sort order is defined. Right now, we always request a local sort to order deletes. However, that sort can be useless for inserts if no sort order is defined and the fanout writer is enabled. Moreover, ordering inserts may lead to a spill, which is expensive for wide tables and large tasks.