-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC-1151: [C++] Fix ColumnWriter
for non-UTC Timestamp columns
#1088
Conversation
@@ -1837,7 +1837,7 @@ namespace orc { | |||
// TimestampVectorBatch already stores data in UTC | |||
int64_t millsUTC = secs[i] * 1000 + nanos[i] / 1000000; | |||
if (!isUTC) { | |||
millsUTC = timezone.convertToUTC(millsUTC); | |||
millsUTC = timezone.convertToUTC(secs[i]) * 1000 + nanos[i] / 1000000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @noirello . Could you make a test case for this fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added a UTC and a non UTC test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, double check the timestamps in the test cases. Time zones always can be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a little counter-intuitive because we only convert secs[i]
only. So, Timezone.convertToUTC
doesn't care nanos
part properly and we should not put it here?
Lines 604 to 606 in f4c7cc1
int64_t convertToUTC(int64_t clk) const override { | |
return clk + getVariant(clk).gmtOffset; | |
} |
If then, can we fix it in Timezone.convertToUTC
instead? Is there a side-effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @wgtmac , @stiga-huang too because this is a correctness issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have set nanosecond in the test case WriterTest.writeTimestampWithTimezone, this issue may be fixed earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a little counter-intuitive because we only convert
secs[i]
only. So,Timezone.convertToUTC
doesn't carenanos
part properly and we should not put it here?Lines 604 to 606 in f4c7cc1
int64_t convertToUTC(int64_t clk) const override { return clk + getVariant(clk).gmtOffset; } If then, can we fix it in
Timezone.convertToUTC
instead? Is there a side-effect?
I think the current fix is good enough. As time_t uses second internally, we'd better keep the contract of Timezone.convertToUTC. We may add some comment above the fix to help understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. +1 for @wgtmac 's final decision.
ColumnWriter
for non-UTC Timestamp columns
EXPECT_EQ(stripeColStats->getUpperBound(), expectedMaxMillis + 1); | ||
} | ||
|
||
TEST(TestTimestampStatistics, testTimezoneNonUTC) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM from my side. It would be great if we can get @wgtmac or @stiga-huang 's final sign-off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @noirello @dongjoon-hyun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for fixing this, @noirello !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
Do we need this at ORC 1.7.5?
Yes, I think this should be included in the 1.7 branch. |
+1 for backporting. |
### What changes were proposed in this pull request? Fix converting non UTC timestamps for statistics. ### Why are the changes needed? Currently, the statistics for timestamp columns are incorrect, when the writer's time zone is not UTC. ### How was this patch tested? Ran the existing test cases. (cherry picked from commit 9042421) Signed-off-by: Dongjoon Hyun <[email protected]>
Hi All. I backported this to branch-1.7. |
…e#1088) ### What changes were proposed in this pull request? Fix converting non UTC timestamps for statistics. ### Why are the changes needed? Currently, the statistics for timestamp columns are incorrect, when the writer's time zone is not UTC. ### How was this patch tested? Ran the existing test cases.
What changes were proposed in this pull request?
Fix converting non UTC timestamps for statistics.
Why are the changes needed?
Currently, the statistics for timestamp columns are incorrect, when the writer's time zone is not UTC.
How was this patch tested?
Ran the existing test cases.