-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-2883] Refactor hive sync tool / config to use reflection and standardize configs #4175
Conversation
f74bd30
to
f6a00f3
Compare
8e0bc42
to
9c13f28
Compare
@rmahindra123 : I can help review this patch. Can you rebase w/ latest master. also, lets sync up sometime to get some context on the patch. |
da0912c
to
9202dc0
Compare
if (this.config.isHiveLocal()) { | ||
writer.getDeltaStreamerWrapper().getDeltaSyncService().getDeltaSync() | ||
.syncHive(getLocalHiveServer().getHiveConf()); | ||
hiveSyncTool = new HiveSyncTool(writer.getWriteConfig().getProps(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we removed or deprecated DeltaSync.syncHive(...) in this patch? if not, would prefer to go via deltaSync syncHive api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just a helper function right? The flow seems a bit weird to call the method from DeltaSync. I had removed the method in DeltaSync as well since this was the only place it was used. Can discuss more 1:1 if required.
...a-connect/src/main/java/org/apache/hudi/connect/writers/KafkaConnectTransactionServices.java
Outdated
Show resolved
Hide resolved
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java
Show resolved
Hide resolved
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
Outdated
Show resolved
Hide resolved
.../src/main/scala/org/apache/spark/sql/hudi/command/AlterHoodieTableDropPartitionCommand.scala
Show resolved
Hide resolved
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
Show resolved
Hide resolved
hiveSyncProps.setProperty(HiveSyncConfig.META_SYNC_ASSUME_DATE_PARTITION.key(), "true"); | ||
hiveSyncProps.setProperty(HiveSyncConfig.HIVE_USE_PRE_APACHE_INPUT_FORMAT.key(), "false"); | ||
hiveSyncProps.setProperty(HiveSyncConfig.META_SYNC_PARTITION_FIELDS.key(), "datestr"); | ||
hiveSyncProps.setProperty(HiveSyncConfig.HIVE_BATCH_SYNC_PARTITION_NUM.key(), "3"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see you are setting jdbcUrl and you are additionally setting this BATCH_SYNC_PARTITION_NUM in this patch. is that intentional ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Thanks for the thorough review.
- I haven't removed it, its set in line 123.
hiveSyncProps.setProperty(HiveSyncConfig.HIVE_URL.key(), hiveTestService.getJdbcHive2Url());
2 For batch sync partition num, literally every test was setting it manually to HiveTestUtil.hiveSyncConfig.batchSyncNum = 3;
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java
Show resolved
Hide resolved
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/util/SyncUtilHelpers.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHiveIncrementalPuller.java
Outdated
Show resolved
Hide resolved
@rmahindra123 : just to be cautious. can you try test this patch out for an existing hudi table for the hive sync functionality.
|
@xiarixiaoyao : can you help us test this patch out. Rajesh refactored hive sync quite a bit mostly around class structuring, config usages, instantiation etc. But if you can test this out and let us know if it looks good, would be nice. |
9202dc0
to
aa7efec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some comments. for some of the resolved comments, I will sync up directly with you
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java
Show resolved
Hide resolved
addressed reviewer comments |
@xiarixiaoyao : Did you get a chance to test this patch out. |
@nsivabalan @rmahindra123 will review and test this pr This weekend |
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/HiveTestUtil.java
Outdated
Show resolved
Hide resolved
hiveSyncProps.setProperty(HiveSyncConfig.META_SYNC_PARTITION_FIELDS.key(), "datestr"); | ||
hiveSyncProps.setProperty(HiveSyncConfig.HIVE_BATCH_SYNC_PARTITION_NUM.key(), "3"); | ||
|
||
hiveSyncConfig = new HiveSyncConfig(hiveSyncProps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer to have builder pattern to reliably construct HiveSyncConfig
instead of passing uncontrollable "raw" props.
|
||
public AbstractSyncTool(Properties props, FileSystem fileSystem) { | ||
public AbstractSyncTool(TypedProperties props, Configuration conf, FileSystem fs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a cleaner interface would be
public AbstractSyncTool(TypedProperties props, Configuration conf, FileSystem fs) { | |
public AbstractSyncTool(HoodieSyncConfig syncConfig, FileSystem fs) { |
hadoopConf
should be set by fs.getConf()
. And props
is not controllable; hard for user to decide what to pass in here.
HoodieSyncConfig
is the matching config class for SyncTool
as the names suggest. Internally we can still do props = syncConfig.getProps();
...ync/hudi-sync-common/src/test/java/org/apache/hudi/sync/common/util/TestSyncUtilHelpers.java
Outdated
Show resolved
Hide resolved
...ync/hudi-sync-common/src/test/java/org/apache/hudi/sync/common/util/TestSyncUtilHelpers.java
Outdated
Show resolved
Hide resolved
TypedProperties metaProps = new TypedProperties(); | ||
metaProps.putAll(props); | ||
metaProps.put(HoodieSyncConfig.META_SYNC_BASE_PATH, cfg.targetBasePath); | ||
metaProps.put(HoodieSyncConfig.META_SYNC_BASE_FILE_FORMAT, cfg.baseFileFormat); | ||
if (props.getBoolean(HiveSyncConfig.HIVE_SYNC_BUCKET_SYNC.key(), HiveSyncConfig.HIVE_SYNC_BUCKET_SYNC.defaultValue())) { | ||
metaProps.put(HiveSyncConfig.HIVE_SYNC_BUCKET_SYNC_SPEC, HiveSyncConfig.getBucketSpec(props.getString(HoodieIndexConfig.BUCKET_INDEX_HASH_FIELD.key()), | ||
props.getInteger(HoodieIndexConfig.BUCKET_INDEX_NUM_BUCKETS.key()))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with builder pattern we should be able to simplify this sort of props construction
95caf2b
to
60d9472
Compare
51aabb2
to
3520d00
Compare
this.bucketSpec = props.getString(HIVE_SYNC_BUCKET_SYNC_SPEC.key(), null); | ||
this.bucketSpec = getStringOrDefault(HIVE_SYNC_BUCKET_SYNC_SPEC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmahindra123 Not sure why this was particularly set to null
. seems like the default ""
is ok.
this.syncComment = getBooleanOrDefault(HIVE_SYNC_COMMENT); | ||
} | ||
|
||
// enhance the similar function in child class | ||
public static HiveSyncConfig copy(HiveSyncConfig cfg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not used.
...sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/replication/GlobalHiveSyncConfig.java
Show resolved
Hide resolved
this(new TypedProperties(props), new Configuration(), fileSystem); | ||
this(new TypedProperties(props), fileSystem.getConf(), fileSystem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rmahindra123 if user provides an fs, shall we always make use of its conf? Actually not sure why the new API allow passing a different conf rather than use the one from fileSystem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
d20111e
to
de57bff
Compare
de57bff
to
7481149
Compare
f102d78
to
309116a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
this(new TypedProperties(props), new Configuration(), fileSystem); | ||
this(new TypedProperties(props), fileSystem.getConf(), fileSystem); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
…n and standardize configs (apache#4175)" This reverts commit 5f570ea.
…n and standardize configs (apache#4175)" This reverts commit 5f570ea.
…n and standardize configs (apache#4175)" This reverts commit 5f570ea.
…andardize configs (apache#4175) - Refactor hive sync tool / config to use reflection and standardize configs Co-authored-by: sivabalan <[email protected]> Co-authored-by: Rajesh Mahindra <[email protected]> Co-authored-by: Raymond Xu <[email protected]>
…andardize configs (apache#4175) - Refactor hive sync tool / config to use reflection and standardize configs Co-authored-by: sivabalan <[email protected]> Co-authored-by: Rajesh Mahindra <[email protected]> Co-authored-by: Raymond Xu <[email protected]>
Refactor hive sync tool / config to use reflection and standardize configs