Core: Add writer for unordered position deletes #7692
Conversation
@Benchmark
@Threads(1)
public void writeUnpartitionedFanoutPositionDeleteWriterShuffled(Blackhole blackhole)
We should expect 5-15% overhead for the new buffering writer, which can still be beneficial for the job if we skip local ordering for inserts and potentially avoid spilling. This benchmark also does not take into account the cost to order records, it only tests the write performance. We will use this writer only if fanout is enabled. We should also explore Puffin delete files that would persist bitmaps directly.
Benchmark Mode Cnt Score Error Units
ParquetWritersBenchmark.writeUnpartitionedClusteredPositionDeleteWriter ss 5 6.004 ± 0.185 s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriter ss 5 6.503 ± 0.171 s/op
ParquetWritersBenchmark.writeUnpartitionedFanoutPositionDeleteWriterShuffled ss 5 6.616 ± 0.204 s/op
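For reference, the overhead implied by the benchmark numbers above works out to roughly 8-10%, which is consistent with the 5-15% estimate. A quick sketch of the arithmetic (values copied from the table; the class name is illustrative):

```java
// Sanity-check the overhead implied by the benchmark scores above.
public class OverheadCheck {
  public static void main(String[] args) {
    double clustered = 6.004; // s/op, clustered writer baseline
    double fanout = 6.503;    // s/op, fanout writer, ordered input
    double shuffled = 6.616;  // s/op, fanout writer, shuffled input

    // (new - baseline) / baseline, as a percentage
    System.out.printf("fanout overhead:   %.1f%%%n", 100 * (fanout - clustered) / clustered);
    System.out.printf("shuffled overhead: %.1f%%%n", 100 * (shuffled - clustered) / clustered);
  }
}
```

Both results fall inside the 5-15% range quoted in the discussion above.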
We should also explore Puffin delete files that would persist bitmaps directly
+1
About memory overhead (I'm not sure anything measures it right now): should it just be the additional space for the map (data_file_path => bitmap)? Will there be cases, especially with fanout, where a writer writes many deletes across many data files and starts to stress memory?
I ran this benchmark (100 data files, 50k deletes each, 5 million deletes total) with a GC profiler and did not see anything bad. Issues will arise when there are lots of unique data files. That's unlikely as we distribute by partition, and this writer will still be disabled by default, so users will have to opt in explicitly. It isn't perfect for sure, but there are reasonable use cases for it.
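To make the buffering idea discussed above concrete, here is a hypothetical, dependency-free sketch: positions are accumulated per data file and emitted clustered by path and ordered by position on flush. The actual writer in this PR uses a Roaring64Bitmap per path; a TreeSet stands in for the bitmap here, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative stand-in for a buffering position delete writer: the memory
// overhead being discussed is exactly this map of data_file_path => positions.
class BufferedPositionDeletes {
  // data_file_path => sorted set of deleted positions (bitmap stand-in)
  private final Map<String, TreeSet<Long>> positionsByPath = new TreeMap<>();

  // Accepts deletes in any order, for any data file.
  void delete(String path, long pos) {
    positionsByPath.computeIfAbsent(path, p -> new TreeSet<>()).add(pos);
  }

  // On flush, deletes come out clustered by path and ordered by position,
  // which is the ordering position delete files require.
  List<String> flushOrdered() {
    List<String> out = new ArrayList<>();
    positionsByPath.forEach(
        (path, positions) -> positions.forEach(pos -> out.add(path + ":" + pos)));
    return out;
  }
}
```

The working set grows with the number of unique data files seen by one writer, which matches the observation above that many unique paths is the stress case.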
LGTM, Thanks @aokolnychyi !
Looks good to me too; I left a few comments.
@@ -38,6 +38,13 @@ public PositionDelete<R> set(CharSequence newPath, long newPos, R newRow) {
    return this;
  }

  public PositionDelete<R> set(CharSequence newPath, long newPos) {
    this.path = newPath;
Nit question: is it cleaner to have this constructor delegate to the other one?
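For illustration, the delegation being suggested might look like the sketch below. This is a simplified stand-in class, not Iceberg's actual `PositionDelete`; the field names are assumptions for the example.

```java
// Simplified stand-in for PositionDelete, showing the two-argument setter
// delegating to the three-argument one instead of duplicating assignments.
class PositionDelete<R> {
  CharSequence path;
  long pos;
  R row;

  PositionDelete<R> set(CharSequence newPath, long newPos, R newRow) {
    this.path = newPath;
    this.pos = newPos;
    this.row = newRow;
    return this;
  }

  // Delegates with a null row, so the assignment logic lives in one place.
  PositionDelete<R> set(CharSequence newPath, long newPos) {
    return set(newPath, newPos, null);
  }
}
```

Delegating keeps the two overloads from drifting apart if a field is ever added.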
import org.roaringbitmap.longlong.Roaring64Bitmap;

/**
 * A position delete writer that is capable of handling unordered deletes without rows.
Nit: can we add javadoc to the PositionDeleteWriter, when we get a chance?
Will add in this PR.
@@ -93,14 +100,33 @@ public void setupBenchmark() {
        row -> transform.bind(Types.IntegerType.get()).apply(row.getInt(1))));
    this.rows = data;

    this.positionDeleteRows =
        RandomData.generateSpark(DeleteSchemaUtil.pathPosSchema(), NUM_ROWS, 0L);
    this.positionDeleteRows = generatePositionDeletes(false /* shuffle */);
Don't shuffle?
for (int pathIndex = 0; pathIndex < NUM_DATA_FILES_PER_POSITION_DELETE_FILE; pathIndex++) {
  UTF8String path = UTF8String.fromString("path/to/position/delete/file/" + UUID.randomUUID());
  int step = 10;
Why not just declare this outside the loop?
Thanks, @singhpk234 @szehon-ho!
This PR adds a position delete writer that can handle unordered position deletes. This writer should allow us to avoid a local sort for some MERGE operations. Specifically, consider MERGE operations where 90% of the data are inserts and the table is partitioned but no sort order is defined. Right now, we always request a local sort to order deletes. However, that sort can be useless for inserts if no sort order is defined and the fanout writer is enabled. Moreover, ordering inserts may lead to a spill, which is expensive for wide tables and large tasks.