[FLINK-26474][hive] Fold exprNode to fix the issue of failing to call some hive udf required constant parameters with implicit constant passed #18975

Merged
merged 4 commits into from
Aug 31, 2022
@@ -788,7 +788,7 @@ public void testCastTimeStampToDecimal() throws Exception {
                                 timestamp))
                 .collect());
         assertThat(results.toString())
-                .isEqualTo(String.format("[+I[%s]]", expectTimeStampDecimal.toFormatString(8)));
+                .isEqualTo(String.format("[+I[%s]]", expectTimeStampDecimal));
Member

Is there any reason for this change? IIUC, the toFormatString(8) is on purpose because it is cast to decimal(30,8).

Contributor Author

@luoyuxia, Aug 31, 2022

Yes, it is. But there is a special case when a constant is cast and constant folding is enabled. The current behavior is actually the same as Hive's.
I tried the following SQL in Hive:

hive> select cast(cast('2012-12-19 11:12:19.1234567' as timestamp) as decimal(30,8));
1355915539.1234567

hive> insert into t2 values('2012-12-19 11:12:19.1234567');

hive> select cast(c2 as decimal(30, 8)) from t2;
1355915539.12345670

hive> insert into t1 select * from t2;

hive> select * from t1;
1355915539.12345670

The Hive plan for select cast(cast('2012-12-19 11:12:19.1234567' as timestamp) as decimal(30,8)) is:

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: _dummy_table
          Row Limit Per Split: 1
          Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: 1355915539.1234567 (type: decimal(30,8))
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: COMPLETE
            ListSink

The plan for select cast(c1 as decimal(30, 8)) from t1 is:

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t1
          Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: CAST( c1 AS decimal(30,8)) (type: decimal(30,8))
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 112 Basic stats: COMPLETE Column stats: NONE
            ListSink

The reason I found is that the HiveDecimalConverter used to convert data in Hive's GenericUDFToDecimal function does not pad zeros for 2012-12-19 11:12:19.1234567, although the type is decimal(30,8).
So the first SQL selects the constant 1355915539.1234567.
For the second SQL, a further padding step is applied, which results in 1355915539.12345670.
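
The difference between the two outputs can be illustrated with plain java.math.BigDecimal. This is only an analogy, not Hive's or Flink's actual code path: the class name DecimalPaddingSketch and its methods are made up for illustration, and BigDecimal stands in for Hive's decimal type. It shows that the same numeric value prints with or without the trailing zero depending on whether scale 8 is enforced.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration class; not part of Hive or Flink.
public class DecimalPaddingSketch {
    // The folded constant from the first query: 7 fractional digits, scale 7.
    static final BigDecimal FOLDED = new BigDecimal("1355915539.1234567");

    // Printed as-is, without enforcing the declared scale of decimal(30,8).
    public static String unpadded() {
        return FOLDED.toPlainString();
    }

    // Printed after enforcing scale 8, which pads one trailing zero.
    // RoundingMode.UNNECESSARY is safe here because no digits are dropped.
    public static String padded() {
        return FOLDED.setScale(8, RoundingMode.UNNECESSARY).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(unpadded());
        System.out.println(padded());
        // The two representations are numerically equal; only the scale differs.
        System.out.println(FOLDED.compareTo(FOLDED.setScale(8, RoundingMode.UNNECESSARY)) == 0);
    }
}
```

Under this analogy, the test's expected string depends only on whether the scale is applied before formatting, which is why the assertion changes once the cast is folded into a constant.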

Member

Thank you for the explanation.


// test insert timestamp type to decimal type directly
tableEnv.executeSql("create table t1 (c1 DECIMAL(38,6))");