ORC-1055: [C++] Add the timezone option for the csv-import tool #975

Merged: 8 commits, Jan 4, 2022

Contributor

@coderex2522 coderex2522 commented Dec 23, 2021

What changes were proposed in this pull request?

This pull request adds a timezone option to the csv-import tool.

Why are the changes needed?

The new option mitigates the situation described in ORC-1055.

How was this patch tested?

The unit test is TestCSVFileImport.testTimezoneOption in TestCSVFileImport.cc.

@github-actions github-actions bot added the CPP label Dec 23, 2021
@coderex2522 coderex2522 changed the title from "ORC-1055: [C++] Add timezone options for csv-import tool" to "ORC-1055: [C++] Set timezone writer options for csv-import tool" Dec 24, 2021
@coderex2522
Contributor Author

@wgtmac Hi, please take a look!

@coderex2522
Contributor Author

The timezone alias information is taken from https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/util/calendar/ZoneInfoFile.java (lines 220-244).

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for making a PR, @coderex2522 .

  • In general, we need a test case which failed before your PR. Do you think you can add test coverage for that?
  • This PR proposes to introduce aliases borrowed from Java's oldMappings. Does this PR handle both JVM cases, sun.timezone.ids.oldmapping=true and sun.timezone.ids.oldmapping=false?

@wgtmac
Member

wgtmac commented Dec 27, 2021

  • In general, we need a test case which failed before your PR.

+1 for adding a test case. We can directly use the csv file from the JIRA.

After some investigation, I think the Java oldMappings may introduce new issues. For example, CST can be either Central Standard Time (America/Chicago) or China Standard Time (Asia/Shanghai).
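
To illustrate the ambiguity, here is a minimal sketch of an abbreviation lookup that refuses to resolve a name with more than one candidate zone. The table contents and the names `candidateZones` and `resolveAbbreviation` are hypothetical; this is not the alias list from the PR or from the JDK.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative table only: each abbreviation maps to every zone it may denote.
const std::map<std::string, std::vector<std::string>>& candidateZones() {
  static const std::map<std::string, std::vector<std::string>> table = {
      {"CST", {"America/Chicago", "Asia/Shanghai"}},  // ambiguous
      {"PST", {"America/Los_Angeles"}},               // single candidate
  };
  return table;
}

// Returns true and fills `zone` only when the abbreviation has exactly one
// candidate; unknown and ambiguous abbreviations are both rejected.
bool resolveAbbreviation(const std::string& abbr, std::string& zone) {
  const auto& table = candidateZones();
  auto it = table.find(abbr);
  if (it == table.end() || it->second.size() != 1) {
    return false;
  }
  zone = it->second.front();
  return true;
}
```

Under this assumption, a caller would have to pass a full zone name such as America/Chicago whenever the abbreviation is rejected.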

@dongjoon-hyun
Member

dongjoon-hyun commented Dec 28, 2021

Thank you for updating, @coderex2522 . Could you revise the PR title and description according to the new code?

@coderex2522 coderex2522 changed the title from "ORC-1055: [C++] Set timezone writer options for csv-import tool" to "ORC-1055: [C++] Add the timezone option for the csv-import tool" Dec 28, 2021
@coderex2522
Contributor Author

The PR title and description have been updated. @dongjoon-hyun, please take a look again!

<< " <schema> <input> <output>\n"
<< "Import CSV file into an Orc file using the specified schema.\n"
<< "The timezone can be viewed in the directory /usr/share/zoneinfo\n"
Member

  • It should be stated clearly that the timezone is the writer timezone for timestamp types.
  • You have specified it as a required_argument, which conflicts with the comment here.
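
For context on the second point, the sketch below shows how an option declared with `required_argument` behaves under `getopt_long`: the value must always be supplied. The helper name `parseTimezone`, the short flag `-t`, and the "GMT" fallback are assumptions made for illustration and are not taken from the PR's code.

```cpp
#include <getopt.h>
#include <string>

// Hypothetical helper: returns the value given via -t/--timezone,
// or "GMT" when the option is absent (assumed default).
std::string parseTimezone(int argc, char* argv[]) {
  static struct option longOptions[] = {
      // required_argument: "--timezone" without a value is a usage error.
      {"timezone", required_argument, nullptr, 't'},
      {nullptr, 0, nullptr, 0}};

  std::string timezone = "GMT";
  optind = 1;  // restart scanning so the helper can be called repeatedly
  int opt;
  while ((opt = getopt_long(argc, argv, "t:", longOptions, nullptr)) != -1) {
    if (opt == 't') {
      timezone = optarg;  // optarg points at the text after '=' or the next argv entry
    }
  }
  return timezone;
}
```

With this declaration, the usage text would need to show the option as taking a value, for example `--timezone=<zone>`.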

std::string output;
std::string error;

std::string option = "--timezone=America/Los_Angeles";
Member

Maybe this test case always succeeds in the America/Los_Angeles timezone? If so, can we choose a less popular timezone?

Contributor Author

If we don't set the timezone in the WriterOptions, the writer writes timestamps in the GMT timezone, and the orc-contents tool scans the content in the GMT timezone. In that situation, this test case should not succeed, should it?
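
The arithmetic behind this argument can be sketched in a simplified model: the same wall-clock value denotes a different instant depending on the UTC offset it is interpreted in, so output rendered in GMT changes once the writer timezone is not GMT. The function names and the fixed-offset parameter are assumptions for illustration; this is not ORC's conversion code, and real zones such as Europe/Paris also need daylight-saving rules.

```cpp
// Days between 1970-01-01 and the given proleptic Gregorian date
// (standard civil-date algorithm; month is 1-12, day is 1-31).
long long daysFromCivil(long long year, unsigned month, unsigned day) {
  year -= month <= 2;
  const long long era = (year >= 0 ? year : year - 399) / 400;
  const unsigned yearOfEra = static_cast<unsigned>(year - era * 400);
  const unsigned dayOfYear =
      (153 * (month > 2 ? month - 3 : month + 9) + 2) / 5 + day - 1;
  const unsigned dayOfEra =
      yearOfEra * 365 + yearOfEra / 4 - yearOfEra / 100 + dayOfYear;
  return era * 146097 + static_cast<long long>(dayOfEra) - 719468;
}

// Interprets a wall-clock time in a zone with a fixed offset
// (seconds east of UTC) and returns seconds since the Unix epoch.
long long toEpochSeconds(int year, unsigned month, unsigned day, int hour,
                         int minute, int second, int utcOffsetSeconds) {
  const long long days = daysFromCivil(year, month, day);
  return days * 86400 + hour * 3600LL + minute * 60LL + second -
         utcOffsetSeconds;
}
```

For example, midnight interpreted at UTC+1 is an instant 3600 seconds earlier than midnight interpreted in GMT, so a test that writes with a non-GMT zone produces output that differs from the GMT default.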

Member

We can add another test to use timezone as Europe/Paris or Asia/Shanghai.

Member

@dongjoon-hyun dongjoon-hyun Dec 30, 2021

@coderex2522 . Yes, the test case should fail without your patch and should pass with your patch. That's the purpose of verification of your code's contribution.

And, +1 for @wgtmac 's suggestion.

Contributor Author

We can add another test to use timezone as Europe/Paris or Asia/Shanghai.

Added a test case for the 'Europe/Paris' timezone.

@coderex2522
Contributor Author

@williamhyun Hi, can you help investigate this problem in continuous-integration/appveyor/pr?
The error message shows "The build phase is set to "MSBuild" mode (default), but no Visual Studio project or solution files were found in the root directory. If you are not building Visual Studio project switch build mode to "Script" and provide your custom build command."

@guiyanakuang
Member

guiyanakuang commented Jan 4, 2022

@coderex2522 Don't worry, it has nothing to do with your PR.
AppVeyor's webhook is being removed.
https://issues.apache.org/jira/browse/INFRA-22692

Member

@wgtmac wgtmac left a comment

LGTM +1

@dongjoon-hyun dongjoon-hyun added this to the 1.7.3 milestone Jan 4, 2022
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @coderex2522 , @wgtmac , @guiyanakuang .

@dongjoon-hyun
Member

Merged to main/branch-1.7 for Apache ORC 1.7.3.

cc @williamhyun

@dongjoon-hyun dongjoon-hyun merged commit 9a66348 into apache:main Jan 4, 2022
dongjoon-hyun pushed a commit that referenced this pull request Jan 4, 2022
(cherry picked from commit 9a66348)
Signed-off-by: Dongjoon Hyun <[email protected]>
cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
…he#975)