Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-26474][hive] Fold exprNode to fix the issue of failing to call some hive udf required constant parameters with implicit constant passed #18975

Merged
merged 4 commits into from
Aug 31, 2022

Conversation

luoyuxia
Copy link
Contributor

@luoyuxia luoyuxia commented Mar 4, 2022

What is the purpose of the change

The change is to enable ExprNode folding and to fold UDFOPNegative.
It's mainly to fix the issue that it'll throw the exception BROUND second argument only takes constant when call some hive udf requiring constant parameter with implicit constant like -1 or cast(null as int) .

Brief change log

  • enable ExprNode fold
  • when the ExprNodeDesc is negative operator, try to fold it to ExprNodeConstantDesc
  • modify the logic for convertConstant in HiveParserRexNodeConverter for the value for the data type of IntervalYearMonth/IntervalDayTimeType will be class of HiveIntervalYearMonth, HiveIntervalDayTime.

Verifying this change

Added udf.q contains the sql statements that'll fail before this fix.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? no

@flinkbot
Copy link
Collaborator

flinkbot commented Mar 4, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@luoyuxia luoyuxia force-pushed the FLINK-26474-1 branch 2 times, most recently from 00eda42 to ab66505 Compare March 4, 2022 14:04
@luoyuxia luoyuxia changed the title [FLINK-26474][hive] fix the issue of failing to call some hive udf re… [FLINK-26474][hive] Fold exprNode to fix the issue of failing to call some hive udf required constant parameters with implicit constant passed Jun 22, 2022
@luoyuxia luoyuxia force-pushed the FLINK-26474-1 branch 2 times, most recently from d2abd7a to 809ed66 Compare June 26, 2022 06:34
@luoyuxia
Copy link
Contributor Author

luoyuxia commented Jul 1, 2022

@flinkbot run azure

@@ -0,0 +1,13 @@
-- SORT_QUERY_RESULTS

select bround(55.0, -1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any test which the second parameter is positive?

Copy link
Contributor Author

@luoyuxia luoyuxia Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a query select bround(55.0, +1) to cover the positive number with + sign.
For the positive number without + sign, I think the query select percentile_approx(x, 0.5) from foo in udaf.q has covered it.

@luoyuxia luoyuxia force-pushed the FLINK-26474-1 branch 2 times, most recently from 3241d87 to c081421 Compare August 22, 2022 02:03
@luoyuxia
Copy link
Contributor Author

@flinkbot run azure

2 similar comments
@luoyuxia
Copy link
Contributor Author

@flinkbot run azure

@luoyuxia
Copy link
Contributor Author

@flinkbot run azure

@wuchong
Copy link
Member

wuchong commented Aug 26, 2022

HiveDialectQueryITCase.testCastTimeStampToDecimal:804 is failed, please take a look.

@wuchong
Copy link
Member

wuchong commented Aug 26, 2022

The HiveDialectQueryITCase.testCastTimeStampToDecimal is still failed.

@wuchong
Copy link
Member

wuchong commented Aug 28, 2022

@flinkbot run azure

@luoyuxia
Copy link
Contributor Author

@flinkbot run azure

2 similar comments
@luoyuxia
Copy link
Contributor Author

@flinkbot run azure

@luoyuxia
Copy link
Contributor Author

@flinkbot run azure

@@ -788,7 +788,7 @@ public void testCastTimeStampToDecimal() throws Exception {
timestamp))
.collect());
assertThat(results.toString())
.isEqualTo(String.format("[+I[%s]]", expectTimeStampDecimal.toFormatString(8)));
.isEqualTo(String.format("[+I[%s]]", expectTimeStampDecimal));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for this change? IIUC, the toFormatString(8) is on purpose because it is cast to decimal(30,8).

Copy link
Contributor Author

@luoyuxia luoyuxia Aug 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's. But there's a special case when comes to cast constant and constant fold is enabled. Actually, the current behavior is same to Hive.
I try with the following sql in Hive:

hive> select cast(cast('2012-12-19 11:12:19.1234567' as timestamp) as decimal(30,8));
1355915539.1234567

hive> insert into t2 values('2012-12-19 11:12:19.1234567')

hive> select  cast(c2 as decimal(30, 8)) from t2;
1355915539.12345670

hive > insert into t1 select * from t2;

hive > select * from t1;
1355915539.12345670

The plan for the sql in hive select cast(cast('2012-12-19 11:12:19.1234567' as timestamp) as decimal(30,8)) is:

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: _dummy_table
          Row Limit Per Split: 1
          Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: 1355915539.1234567 (type: decimal(30,8))
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: COMPLETE
            ListSink

The plan for select cast(c1 as decimal(30, 8)) from t1 is :

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t1
          Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: CAST( c1 AS decimal(30,8)) (type: decimal(30,8))
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
            ListSink

The reason I found is the HiveDecimalConverter used to convert data in Hive's GenericUDFToDecimal function actually won't padding zero for 2012-12-19 11:12:19.1234567, althogh the type is decimal(30,8).
Then, the first sql will select a constant 1355915539.1234567.
But for the second sql, a further padding will be done which will result 1355915539.12345670.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation.

@wuchong wuchong merged commit fc5730a into apache:master Aug 31, 2022
huangxiaofeng10047 pushed a commit to huangxiaofeng10047/flink that referenced this pull request Nov 3, 2022
… some hive udf required constant parameters with implicit constant passed

This closes apache#18975
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants