-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap timestamps in ROWTIME expressions with STRINGTOTIMESTAMP #3160
Conversation
AND ROWTIME <= '2017-11-17 04:53:48'; | ||
|
||
If the datestring is inexact, the rest of the timestamp is assumed to be 0. | ||
For example, `ROWTIME = `2019-07-30 11:00` is equivalent to `ROWTIME = `2019-07-30 11:00:00`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, `ROWTIME = `2019-07-30 11:00` is equivalent to `ROWTIME = `2019-07-30 11:00:00`. | |
For example, ``ROWTIME = 2019-07-30 11:00`` is equivalent to ``ROWTIME = 2019-07-30 11:00:00``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i might explicitly call out here that all unspecified elements of the full pattern are right-padded with zeros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhere in here i would expect to see a mention of timezone handling too, as this always trips people up.... i.e. in what TZ are your iso-8601 strings interpreted ? that of teh running jvm or utc or ....?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with a couple of suggestions.
can we feature-flag this somewhere, so that we can easily turn it off once we have proper datetime data-type support ? i assume that we will eventually change the datatype of the ROWTIME pseudo-column to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice feature @jzaralim - suggestion below about improving it to support timezones and also I think one of the use cases is a bit risky.
Can you also add a few JSON based QueryTranslationTests to cover this implicit conversion, please?
Please add me again as a reviewer if you want a second pass from me.
|
||
@Test | ||
public void shouldHandleNestedExpressions() { | ||
final String simpleQuery = "SELECT * FROM orders where FOO(ROWTIME + 6000) > '2017-01-01 00:00:00.000' + 35;"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a bit dangerous? We're assuming FOO is returning a long
and that represents a datetime, but this might not be the case, e.g. SELECT * FROM orders where IS_LEAP_YEAR(ROWTIME + 6000) > '2017-01-01 00:00:00.000' + 35;
wouldn't make sense!
I don't think we can make any such assumption on the return value from a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts were that this is purely a query rewriter, so those kinds of cases would get caught later on. So for example, IS_LEAP_YEAR(ROWTIME + 6000) > 35
passes this step, but would still eventually get caught during evaluation. I did change the overall logic of the rewriter as per @hjafarpour's comment though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They may get caught later on, but the UX for the user is not great. There's going to get some message that Foo can not be compared to Long
or something, which is very confusing for the user.
ksql-parser/src/test/java/io/confluent/ksql/parser/rewrite/StatementRewriteForRowtimeTest.java
Outdated
Show resolved
Hide resolved
This is all hidden from the user anyway. So once we have a datetime type if/when we are able to flip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jzaralim . Left a comment.
} | ||
|
||
public static boolean requiresRewrite(final Expression expression) { | ||
return expression.toString().contains("ROWTIME"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will match any expression with ROWTIME
in it. For instance, it will match the following expression where you should not indeed rewrite the expression:
TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss') = '2019-08-08 12:12:12'
.
You need to change this such that it only rewrites the cases you have mentioned in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular statement remained the same, but the rewriter itself has been updated to only rewrite timestrings when one side of the comparison is exactly ROWTIME
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
this is kind of my point - what's really happening here is that we are introducing some limited form of "auto-casting" from one datatype (long-of-a-timestamp) to another (string). |
Hey all, I realized that I've hit "merge" prematurely. If anyone has any more feedback, I can submit another PR. Sorry about this! |
@blueedgenick ah, that's what you were getting at... then I whole heartily agree! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jzaralim,
Couple of concerns about this remain:
1. Rewriting function calls.
I still feel handling functions is too risky. There are too many edge cases and bad UX.
SELECT * FROM orders where FuncThatReturnsLong(ROWTIME) > '2017-01-01 00:00:00.000';
-- Or
SELECT * FROM orders where ROWTIME > FuncThatReturnsLong('2017-01-01 00:00:00.000');
The PR will rewrite it to:
SELECT * FROM orders where FuncThatReturnsLong(ROWTIME) > STRINGTOTIMESTAMP('2018-01-01T00:00:00.000', 'yyyy-MM-dd''T''HH:mm:ss.SSS');
-- OR
SELECT * FROM orders where ROWTIME > FuncThatReturnsLong(STRINGTOTIMESTAMP('2018-01-01T00:00:00.000', 'yyyy-MM-dd''T''HH:mm:ss.SSS'));
Which is valid SQL, but may not make any sense. ROWTIME is a BIGINT
, but should actually be a datetime, which makes the auto-conversion from date string to long OK. You can't safely assume a BIGINT
return value from any function to also be a pseudo timestamp.
As noted inline, the PR is, I think, not rewriting function calls due to a bug in the code, but if it where, it would incorrectly rewrite this:
SELECT * FROM orders where ROWTIME > FuncThatReturnsDateTime('not a date');
to
-- You can't assume a string parameter to a function is a date string!
SELECT * FROM orders where ROWTIME > FuncThatReturnsDateTime(STRINGTOTIMESTAMP('not a date01T00:00:00.000', 'yyyy-MM-dd''T''HH:mm:ss.SSS'));
2. Error handling needs improving
The following should generate an error message but doesn't, it just silently filters all rows out:
-- should generate error:
SELECT source FROM TEST WHERE ROWTIME < 'not a date';
3. (MINOR) Does handle casts
e.g.
SELECT * FROM orders where cast(ROWTIME as BIGING) > '2017-01-01 00:00:00.000';
The above doesn't work, which is a minor thing, but is an indication there are probably a lot of other similar edge cases to this 'magic'.
What worries me is that there may be many more edge cases that we're not thought of. I would urge you to drop the (broken?) function call handling. I'd even go as far as saying only rewrite if one side is a pure ROWTIME
and the other is string. This minimises the surface area for edge cases.
I'd then really think how this can be misused and add tests. What are the edge cases? e.g. string that is not a date, and then ensure the UX is good.
Good luck :D
|
||
@Override | ||
public Expression visitFunctionCall(final FunctionCall node, final Object context) { | ||
return (Expression) new StatementRewriter().process(node, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this line do anything other than return a duplicate of the function call node? I think StatementRewriter
should be abstract
as its meant as a base class for rewrites. In itself it doesn't actually change the query, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It stops the rewriter from further processing anything within a function. So, if the expression looks like ROWTIME > FOO('2017-01-01 ' + 100) + '2019-01-01'
, then '2019-01-01'
will get wrapped but '2017-01-01'
doesn't. This actually also addresses the first point above, since function calls don't get rewritten at all.
@big-andy-coates The decision flow for a rewrite are as follows:
So, the following statements do not get rewritten: As for error handling, ideally
EDIT: After thinking about it more, I agree that downscaling the rewrite conditions to only rewriting if one side is |
Description
Because the
ROWTIME
is stored as a Long, the user either has to pass in Long values or useSTRINGTOTIMESTAMP
function to make queries usingROWTIME
, for example:ROWTIME > STRINGTOTIMESTAMP('1970-01-01', 'yyyy-mm-dd')
ROWTIME BETWEEN 156469222000 AND 1564692253275
This change wraps datestrings with
STRINGTOTIMESTAMP
so the user can directly pass datestrings for querying:ROWTIME > '1970-01-01'
ROWTIME BETWEEN '1970-01-01 08:00:00' AND ''1970-01-01 10:00:00
Testing done
Wrote tests for several types of expressions.
Reviewer checklist