Fix attribute rewiring #23

marmbrus · 2016-01-07T03:49:33Z

No description provided.

AmplabJenkins · 2016-01-07T04:03:16Z

Merged build finished. Test FAILed.

AmplabJenkins · 2016-01-07T04:03:16Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/spark-streaming-df-test/16/
Test FAILed.

AmplabJenkins · 2016-01-07T06:52:14Z

Merged build finished. Test FAILed.

AmplabJenkins · 2016-01-07T06:52:14Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/spark-streaming-df-test/18/
Test FAILed.

AmplabJenkins · 2016-01-07T07:16:02Z

Merged build finished. Test PASSed.

AmplabJenkins · 2016-01-07T07:16:02Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/spark-streaming-df-test/19/
Test PASSed.

AmplabJenkins · 2016-01-07T07:39:16Z

Merged build finished. Test PASSed.

AmplabJenkins · 2016-01-07T07:39:16Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/spark-streaming-df-test/20/
Test PASSed.

Fix attribute rewiring

…w queries

## What changes were proposed in this pull request?

This PR aims to implement decimal aggregation optimization for window queries by improving the existing `DecimalAggregates`. Historically, the `DecimalAggregates` optimizer was designed to transform general `sum/avg(decimal)`, but it breaks recently added window queries like the following. These queries work well without the current `DecimalAggregates` optimizer.

**Sum**
```scala
scala> sql("select sum(a) over () from (select explode(array(1.0,2.0)) a) t").head
java.lang.RuntimeException: Unsupported window function: MakeDecimal((sum(UnscaledValue(a#31)),mode=Complete,isDistinct=false),12,1)
scala> sql("select sum(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [sum(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#23]
:     +- INPUT
+- Window [MakeDecimal((sum(UnscaledValue(a#21)),mode=Complete,isDistinct=false),12,1) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS sum(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#23]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#21]
         +- Scan OneRowRelation[]
```

**Average**
```scala
scala> sql("select avg(a) over () from (select explode(array(1.0,2.0)) a) t").head
java.lang.RuntimeException: Unsupported window function: cast(((avg(UnscaledValue(a#40)),mode=Complete,isDistinct=false) / 10.0) as decimal(6,5))
scala> sql("select avg(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [avg(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#44]
:     +- INPUT
+- Window [cast(((avg(UnscaledValue(a#42)),mode=Complete,isDistinct=false) / 10.0) as decimal(6,5)) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS avg(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#44]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#42]
         +- Scan OneRowRelation[]
```

After this PR, those queries work fine, and the new optimized physical plans look like the following.

**Sum**
```scala
scala> sql("select sum(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [sum(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#35]
:     +- INPUT
+- Window [MakeDecimal((sum(UnscaledValue(a#33)),mode=Complete,isDistinct=false) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),12,1) AS sum(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#35]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#33]
         +- Scan OneRowRelation[]
```

**Average**
```scala
scala> sql("select avg(a) over () from (select explode(array(1.0,2.0)) a) t").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [avg(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#47]
:     +- INPUT
+- Window [cast(((avg(UnscaledValue(a#45)),mode=Complete,isDistinct=false) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) / 10.0) as decimal(6,5)) AS avg(a) OVER ( ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#47]
   +- Exchange SinglePartition, None
      +- Generate explode([1.0,2.0]), false, false, [a#45]
         +- Scan OneRowRelation[]
```

In this PR, the *SUM over window* pattern matching is based on the code of hvanhovell; he should be credited for the work he did.

## How was this patch tested?

Pass the Jenkins tests (with newly added test cases).

Author: Dongjoon Hyun <[email protected]>

Closes apache#12421 from dongjoon-hyun/SPARK-14664.

Fix attribute rewiring

4b6a56a

marmbrus force-pushed the streaming-attributes branch from ed0ea40 to a31d4a5 on January 7, 2016 06:54

fix test

764aac9

marmbrus force-pushed the streaming-attributes branch from a31d4a5 to 764aac9 on January 7, 2016 07:21

marmbrus added a commit that referenced this pull request Jan 7, 2016

Merge pull request #23 from marmbrus/streaming-attributes

addb3ab

Fix attribute rewiring

marmbrus merged commit addb3ab into streaming-df Jan 7, 2016

marmbrus deleted the streaming-attributes branch March 8, 2016 00:06
