[SUPPORT] Performance degradation for listing partitions #5776

namuny · 2022-06-07T01:16:19Z

I'm noticing a steep increase in the time taken to list partitions during clustering (and potentially during other operations such as clean, which I haven't tested yet), specifically after this PR was merged. I have yet to get to the bottom of exactly why, but reverting the implementation of FileSystemBackedTableMetadata.getAllPartitionPaths to the 0.9.0 implementation gives me a performance boost.

Test results:

0.9.0 approach (but using 0.11.0 for everything else) - 50 seconds to list partitions
Pure 0.11.0 approach - over 20 minutes to list partitions

My setup:

Hudi 0.11.0
CoW + inline clustering
Metadata table is disabled
The test results above are with 10,000 partitions, using S3.

Regardless of why the metadata table is disabled, I'm curious to understand why the partition listing time for 10,000 partitions goes from under a minute to 20+ minutes.
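
For a rough sense of scale, the figures reported above can be turned into an average cost per partition. The sketch below is only back-of-the-envelope arithmetic on the numbers in this report (10,000 partitions, 50 seconds, and 20 minutes taken as a lower bound for "over 20 minutes"). The function and variable names are made up for illustration, and the calculation says nothing about the cause of the slowdown.

```python
def per_partition_ms(total_seconds: float, partitions: int) -> float:
    """Average listing cost per partition, in milliseconds."""
    return total_seconds * 1000.0 / partitions


# Figures taken from the test results in this report.
PARTITIONS = 10_000
OLD_SECONDS = 50        # 0.9.0 listing approach
NEW_SECONDS = 20 * 60   # 0.11.0 approach; "over 20 minutes", so a lower bound

old_ms = per_partition_ms(OLD_SECONDS, PARTITIONS)
new_ms = per_partition_ms(NEW_SECONDS, PARTITIONS)
slowdown = NEW_SECONDS / OLD_SECONDS

print(f"0.9.0 approach:  {old_ms:.0f} ms per partition")
print(f"0.11.0 approach: at least {new_ms:.0f} ms per partition")
print(f"slowdown:        at least {slowdown:.0f}x")
```

On these numbers the average goes from about 5 ms to at least 120 ms per partition, a slowdown of at least 24x.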

Expected behavior

There should not be a performance degradation when listing partitions for operations such as clustering.

Environment Description

Hudi version : 0.11.0
Spark version : 3.1.2
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No


nsivabalan · 2022-06-10T04:33:45Z

@namuny: Yes, it looks like this regressed. https://issues.apache.org/jira/browse/HUDI-4221
I am reverting the change and will put up a PR.

nsivabalan · 2022-06-10T04:38:40Z

#5829

nsivabalan · 2022-06-10T04:38:59Z

Thanks for pointing it out. Since we have a PR, I'm closing this out.

nsivabalan added the priority:critical label (production down; pipelines stalled; need help asap) Jun 7, 2022

nsivabalan self-assigned this Jun 10, 2022

nsivabalan closed this as completed Jun 10, 2022
