Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-6079] Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor. #8460

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

slfan1989
Copy link
Contributor

@slfan1989 slfan1989 commented Apr 14, 2023

Change Logs

JIRA: [HUDI-6079] Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor.

Improve the log format of hudi-hive-sync.

Case1: Use non-standard logging.

Before:

LOG.info(String.format("Time taken to getTableSchema: %s ms", (end - start)));

After:

LOG.info("Time taken to getTableSchema: {} ms.", (end - start));

Case2: Our logs have been changed to slf4j.
This log format recommends using placeholders to record logs

Before:

LOG.error(databaseName + "." + tableName + " add partition failed", e);

After:

LOG.error("{}.{} add partition failed.", databaseName, tableName);

Case3: Part of the log prints the tableName , and part of the log prints the databaseName.tableName
uniformly modified to databaseName.tableName.

Before:

HMSDDLExecutor#updateTableDefinition
LOG.error("Failed to update table for " + tableName, e);

HMSDDLExecutor#addPartitionsToTable
LOG.error(databaseName + "." + tableName + " add partition failed", e);

After:

HMSDDLExecutor#updateTableDefinition
LOG.error("{}.{} update table failed.", databaseName, tableName, e);

HMSDDLExecutor#addPartitionsToTable
LOG.error("{}.{} add partition failed.", databaseName, tableName);
  1. Remove redundant code
  2. Increase code readability

There are a lot of append when generating SQL, it is very difficult to read.

Case1:
Before:

String partitionsStr = String.join(",", partitionFields);
    StringBuilder sb = new StringBuilder();
    if (config.getBoolean(HIVE_CREATE_MANAGED_TABLE)) {
      sb.append("CREATE TABLE IF NOT EXISTS ");
    } else {
      sb.append("CREATE EXTERNAL TABLE IF NOT EXISTS ");
    }
    sb.append(HIVE_ESCAPE_CHARACTER).append(config.getStringOrDefault(META_SYNC_DATABASE_NAME)).append(HIVE_ESCAPE_CHARACTER)
            .append(".").append(HIVE_ESCAPE_CHARACTER).append(tableName).append(HIVE_ESCAPE_CHARACTER);
    sb.append("( ").append(columns).append(")");
    if (!config.getSplitStrings(META_SYNC_PARTITION_FIELDS).isEmpty()) {
      sb.append(" PARTITIONED BY (").append(partitionsStr).append(")");
    }
    if (config.getString(HIVE_SYNC_BUCKET_SYNC_SPEC) != null) {
      sb.append(' ' + config.getString(HIVE_SYNC_BUCKET_SYNC_SPEC) + ' ');
    }
    sb.append(" ROW FORMAT SERDE '").append(serdeClass).append("'");
    if (serdeProperties != null && !serdeProperties.isEmpty()) {
      sb.append(" WITH SERDEPROPERTIES (").append(propertyToString(serdeProperties)).append(")");
    }
   .....


after:
ST command = new ST(CREATE_TABLE_TEMPLATE);
command.add(DATABASE_NAME, dataBaseName);
command.add(TABLE_NAME, tableName);
command.add(EXTERNAL, getExternal(config));
command.add(LIST_COLUMNS, getColumns(storageSchema, config));
if (!config.getSplitStrings(META_SYNC_PARTITION_FIELDS).isEmpty()) {
  command.add(PARTITIONS, getPartitions(storageSchema, config));
}
command.add(ROW_FORMAT, getRowFormat(serdeClass, serdeProperties,
    inputFormatClass, outputFormatClass));
if (config.getString(HIVE_SYNC_BUCKET_SYNC_SPEC) != null) {
  command.add(BUCKETS, config.getBuckets());
}
command.add(LOCATION_BLOCK, getLocationBlock(config.getAbsoluteBasePath()));
command.add(PROPERTIES, propertyToString(tableProperties));
return command.render();

When using field types, strings are used extensively. We have a type definition system that can be used directly.

Impact

none

Risk level (write none, low medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@slfan1989 slfan1989 marked this pull request as draft April 14, 2023 09:49
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Feb 26, 2024
Copy link
Contributor

@yihua yihua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slfan1989 Thanks for putting up this improvement! Since it's been a while, do you still plan to revise it and make it ready for review?

@slfan1989
Copy link
Contributor Author

@slfan1989 Thanks for putting up this improvement! Since it's been a while, do you still plan to revise it and make it ready for review?

@yihua Thank you for your message! I will continue to improve this PR and submit the new code as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L PR with lines of changes in (300, 1000]
Projects
Status: 🆕 New
Development

Successfully merging this pull request may close these issues.

3 participants