-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC-1055: [C++] Add the timezone option for the csv-import tool #975
Conversation
@wgtmac Hi, please take a look! |
These timezones alias information is a reference to https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/util/calendar/ZoneInfoFile.java line:220-244 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making a PR, @coderex2522 .
- In general, we need a test case which failed before your PR. Do you think you can add a test coverage for that?
- This PR proposes to introduce
alias
which is borrowed from JavaoldMappings
. Does this PR handle both two JVM casessun.timezone.ids.oldmapping=true/false
?
+1 for adding a test case. We can directly use the csv file from the JIRA. After some investigation, I think the Java odlMappings may introduce new issues. For example, CST can either be Central Standard Time (America/Chicago) or China Standard Time (Asia/Shanghai). |
Thank you for updating, @coderex2522 . Could you revise the PR title and description according to the new code? |
The PR title and descrition has been modified, @dongjoon-hyun please take a look again! |
tools/src/CSVFileImport.cc
Outdated
<< " <schema> <input> <output>\n" | ||
<< "Import CSV file into an Orc file using the specified schema.\n" | ||
<< "The timezone can be viewed in the directory /usr/share/zoneinfo\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It should be stated clearly that the timezone is writer timezone of timestamp types.
- You have specified it as a required_argument which conflicts with the comment here.
tools/test/TestCSVFileImport.cc
Outdated
std::string output; | ||
std::string error; | ||
|
||
std::string option = "--timezone=America/Los_Angeles"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, this test case always succeeds at America/Los_Angeles
timezone? If then, can we choose less popular timezone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't set timezone for the WriterOptions, the writer will write the timestamp with the GMT timezone. And the orc-contents tool will scan the content in the GMT timezone.This case should not be successful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add another test to use timezone as Europe/Paris or Asia/Shanghai.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderex2522 . Yes, the test case should fail without your patch and should pass with your patch. That's the purpose of verification of your code's contribution.
And, +1 for @wgtmac 's suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add another test to use timezone as Europe/Paris or Asia/Shanghai.
Add the timezone 'Europe/Paris' test case done.
@williamhyun Hi, can you help investigate this problem in continuous-integration/appveyor/pr? |
@coderex2522 Don't worry, it has nothing to do with your pr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @coderex2522 , @wgtmac , @guiyanakuang .
Merged to main/branch-1.7 for Apache ORC 1.7.3. cc @williamhyun |
### What changes were proposed in this pull request? The pull request provides the csv-import tool with support for timezone settings ### Why are the changes needed? This is a new option to mitigate ORC-1055 situation. ### How was this patch tested? The unit case is TestCSVFileImport.testTimezoneOption in TestCSVFileImport.cc (cherry picked from commit 9a66348) Signed-off-by: Dongjoon Hyun <[email protected]>
…he#975) ### What changes were proposed in this pull request? The pull request provides the csv-import tool with support for timezone settings ### Why are the changes needed? This is a new option to mitigate ORC-1055 situation. ### How was this patch tested? The unit case is TestCSVFileImport.testTimezoneOption in TestCSVFileImport.cc
What changes were proposed in this pull request?
The pull request provides the csv-import tool with support for timezone settings
Why are the changes needed?
This is a new option to mitigate ORC-1055 situation.
How was this patch tested?
The unit case is TestCSVFileImport.testTimezoneOption in TestCSVFileImport.cc