I'm noticing a steep increase in the time taken to list partitions during clustering (and potentially during other operations such as clean, which I haven't tested yet), specifically after this PR was merged. I have yet to get to the bottom of exactly why, but reverting the implementation of FileSystemBackedTableMetadata.getAllPartitionPaths to the 0.9.0 implementation makes partition listing much faster.
Test results:
0.9.0 approach (but using 0.11.0 for everything else) - 50 seconds to list partitions
Pure 0.11.0 approach - over 20 minutes to list partitions
My setup:
Hudi 0.11.0
CoW + inline clustering
Metadata table is disabled
The test results above are for 10,000 partitions on S3.
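To put those numbers in perspective, here is a quick back-of-the-envelope calculation using only the figures reported above. It is an amortized average over all partitions, so it says nothing about where the time actually goes, and since the 0.11.0 run took "over 20 minutes", the 0.11.0 figures are lower bounds. The variable names are just for illustration.

```python
# Figures taken from the test results above.
partitions = 10_000
old_seconds = 50        # 0.9.0-style listing
new_seconds = 20 * 60   # 0.11.0 listing ("over 20 minutes", so a lower bound)

# Amortized cost per partition, in milliseconds.
old_ms_per_partition = old_seconds * 1000 / partitions
new_ms_per_partition = new_seconds * 1000 / partitions

# Overall slowdown factor between the two approaches.
slowdown = new_seconds / old_seconds

print(f"0.9.0 approach:  {old_ms_per_partition:.0f} ms per partition")
print(f"0.11.0 approach: at least {new_ms_per_partition:.0f} ms per partition")
print(f"slowdown:        at least {slowdown:.0f}x")
```

In other words, the amortized listing cost goes from about 5 ms per partition to at least 120 ms per partition, a slowdown of at least 24x.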
Regardless of why the metadata table is disabled, I'd like to understand why the partition listing time for 10,000 partitions goes from under a minute to 20+ minutes.
Expected behavior
There should not be a performance degradation when listing partitions for operations such as clustering.
Environment Description
Hudi version : 0.11.0
Spark version : 3.1.2
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No