Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Add writer for unordered position deletes #7692

Merged
merged 2 commits into from
May 25, 2023

Conversation

aokolnychyi
Copy link
Contributor

This PR adds a position delete writer that can handle unordered position deletes. This writer should allow us to avoid a local sort for some MERGE operations. Specifically, consider MERGE operations where 90% of data are inserts and the table is partitioned but no sort order is defined. Right now, we always request a local sort to order deletes. However, that sort can be useless for inserts if no sort order is defined and fanout writer is enabled. Moreover, ordering inserts may lead to a spill, which is expensive for wide tables and large tasks.


@Benchmark
@Threads(1)
public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole blackhole)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should expect 5-15% overhead for the new buffering writer, which can still be beneficial for the job if we skip local ordering for inserts and potentially avoid spilling. This benchmark also does not take into account the cost to order records, it only tests the write performance. We will use this writer only if fanout is enabled. We should also explore Puffin delete files that would persist bitmaps directly.

Benchmark                                                                                                      Mode  Cnt           Score            Error   Units
ParquetWritersBenchmark.writeUnpartitionedClusteredPositionDeleteWriter                                          ss    5           6.004 ±          0.185    s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriter                                             ss    5           6.503 ±          0.171    s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriterShuffled                                     ss    5           6.616 ±          0.204    s/op

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also explore Puffin delete files that would persist bitmaps directly

+1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About memory overhead (not sure any thing measures it now): Should be just additional space of map (data_file_path => bitmaps)? Will there be cases, esp in fanout, where a writer writes many delete of many data files, that will start to stress it?

Copy link
Contributor Author

@aokolnychyi aokolnychyi May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this benchmark (100 data files, 50k deletes each, 5 million deletes total) with a GC profiler and did not see anything bad. Issues will arise when there are lots of unique data files. That's unlikely as we distribute by partition and this writer will still be disabled by default, so users will have to opt in explicitly. It isn't perfect for sure but there would be reasonable cases for it.

@aokolnychyi
Copy link
Contributor Author

cc @singhpk234 @amogh-jahagirdar @RussellSpitzer @szehon-ho @flyrain @rdblue

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @aokolnychyi !

Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too, left few comments

@@ -38,6 +38,13 @@ public PositionDelete<R> set(CharSequence newPath, long newPos, R newRow) {
return this;
}

public PositionDelete<R> set(CharSequence newPath, long newPos) {
this.path = newPath;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit question: is it cleaner to have this constructor delegate to the other one?

import org.roaringbitmap.longlong.Roaring64Bitmap;

/**
* A position delete writer that is capable of handling unordered deletes without rows.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we add javadoc to the PositionDeleteWriter, when we get a chance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add in this PR.


@Benchmark
@Threads(1)
public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole blackhole)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About memory overhead (not sure any thing measures it now): Should be just additional space of map (data_file_path => bitmaps)? Will there be cases, esp in fanout, where a writer writes many delete of many data files, that will start to stress it?

@@ -93,14 +100,33 @@ public void setupBenchmark() {
row -> transform.bind(Types.IntegerType.get()).apply(row.getInt(1))));
this.rows = data;

this.positionDeleteRows =
RandomData.generateSpark(DeleteSchemaUtil.pathPosSchema(), NUM_ROWS, 0L);
this.positionDeleteRows = generatePositionDeletes(false /* shuffle */);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't shuffle?


for (int pathIndex = 0; pathIndex < NUM_DATA_FILES_PER_POSITION_DELETE_FILE; pathIndex++) {
UTF8String path = UTF8String.fromString("path/to/position/delete/file/" + UUID.randomUUID());
int step = 10;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just make this outside?

@aokolnychyi aokolnychyi merged commit 5eb4511 into apache:master May 25, 2023
@aokolnychyi
Copy link
Contributor Author

Thanks, @singhpk234 @szehon-ho!

rodmeneses pushed a commit to rodmeneses/iceberg that referenced this pull request Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants