SHS-NG M4.0: Initial UI hook up. #6

vanzin · 2017-04-17T20:10:35Z

This change adds some building blocks for hooking up the new data store
to the UI. This is achieved by returning a new SparkUI implementation when
the new KVStoreProvider is used. This new UI does not yet contain any data
for the old UI pages or API endpoints; that will be implemented in M4.

The interaction between the UI and the underlying store was isolated
in a new AppStateStore class. The M4 code will call into this class to
retrieve data to populate the UI and API.
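One way to picture this isolation is a thin facade: UI code asks `AppStateStore` for typed records and never touches the store directly, and closing the facade closes the store. This is a sketch under loud assumptions, not the branch's code: the `KVStore` trait, the `InMemoryKVStore` stand-in, the string keys, and the `AppInfo`/`JobData` types are all hypothetical.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical record types; the real stored types in this branch differ.
final case class AppInfo(id: String, name: String)
final case class JobData(jobId: Int, status: String)

// Hypothetical key-value store interface (not the actual KVStore API).
trait KVStore extends AutoCloseable {
  def write(key: String, value: Any): Unit
  def read[T: ClassTag](key: String): Option[T]
  def keysWithPrefix(prefix: String): Seq[String]
}

// In-memory stand-in; the real store keeps its state on disk.
final class InMemoryKVStore extends KVStore {
  private val data = mutable.HashMap.empty[String, Any]
  @volatile private var closed = false

  private def ensureOpen(): Unit = require(!closed, "store is closed")

  def write(key: String, value: Any): Unit = { ensureOpen(); data(key) = value }

  // Returns the value only if it exists and has the requested type.
  def read[T: ClassTag](key: String): Option[T] = {
    ensureOpen()
    data.get(key).collect { case v: T => v }
  }

  def keysWithPrefix(prefix: String): Seq[String] = {
    ensureOpen()
    data.keys.filter(_.startsWith(prefix)).toList
  }

  def close(): Unit = closed = true
  def isClosed: Boolean = closed
}

// Facade the UI / API code would call instead of touching the store.
final class AppStateStore(store: KVStore) extends AutoCloseable {
  def applicationInfo(): Option[AppInfo] = store.read[AppInfo]("app")

  def jobsList(): Seq[JobData] =
    store.keysWithPrefix("job/").flatMap(k => store.read[JobData](k)).sortBy(_.jobId)

  // Unloading a UI must release the underlying store's resources.
  def close(): Unit = store.close()
}
```

Keeping every read behind one class means the later UI and API work only has to grow `AppStateStore`'s methods, not learn the store's key layout.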

Some new indexed fields had to be added to the stored types so that the
code could efficiently process the API requests.
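To see why indexed fields help, consider a request that filters jobs by status (the Spark REST API's jobs endpoint accepts a `status` filter). With only a primary key, every job must be read and filtered; with a secondary index on `status`, the store can go straight to the matching entries. This is a hypothetical in-memory sketch of that idea (`JobData`, `IndexedJobStore` and `jobsWithStatus` are illustrative names), not the indexing mechanism the SHS-NG store actually uses.

```scala
import scala.collection.mutable

// Hypothetical stored type; `status` is the field we want to query by.
final case class JobData(jobId: Int, status: String)

// Primary index (jobId -> job) plus a secondary index (status -> job ids).
// Illustrative only; this is not the SHS-NG store's indexing API.
final class IndexedJobStore {
  private val byId = mutable.HashMap.empty[Int, JobData]
  private val byStatus = mutable.HashMap.empty[String, mutable.TreeSet[Int]]

  def write(job: JobData): Unit = {
    // If the job was stored before, drop its old secondary-index entry,
    // since the indexed field (status) may have changed.
    byId.get(job.jobId).foreach { old =>
      byStatus.get(old.status).foreach(ids => ids -= old.jobId)
    }
    byId(job.jobId) = job
    byStatus.getOrElseUpdate(job.status, mutable.TreeSet.empty[Int]) += job.jobId
  }

  // Touches only the ids stored under `status`, instead of scanning every job.
  def jobsWithStatus(status: String): Seq[JobData] =
    byStatus.get(status).map(ids => ids.toList.map(id => byId(id))).getOrElse(Nil)
}
```

The cost is extra bookkeeping on every write, which is why only the fields the API actually filters on would be worth indexing.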

On the history server side, some changes were made in how the UI is used.
Because state is kept on disk, the code needs to be more careful about
closing those resources when UIs are unloaded, and because of that some
locking is needed to make sure it's safe to move files around. The app
cache was also simplified a bit: it just checks a flag in the UI instance
to decide whether the UI can still be used, and tries to reload it when the
FS listing code invalidates a loaded UI.
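The locking and cache behavior described above can be sketched with a `ReentrantReadWriteLock`: requests hold the read lock while they use a UI, and replacing or closing a UI (the point where its on-disk files could be moved) requires the write lock, so it never happens under an active reader. Invalidation only flips a flag; the next request sees it and reloads. `LoadedUI`, `AppCache`, `withUI` and `invalidate` are hypothetical names, and the real history server code is likely organized differently.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable

// Hypothetical loaded UI; real code would own an on-disk store here.
final class LoadedUI(val appId: String, val generation: Int) extends AutoCloseable {
  @volatile var valid: Boolean = true   // cleared by the FS listing code
  @volatile var closed: Boolean = false
  def close(): Unit = closed = true     // real code would close the disk store
}

final class AppCache(load: String => LoadedUI) {
  private val uis = mutable.HashMap.empty[String, LoadedUI]
  // Read lock: a UI is in use. Write lock: UIs may be closed or replaced.
  private val lock = new ReentrantReadWriteLock()

  def withUI[T](appId: String)(fn: LoadedUI => T): T = {
    lock.readLock().lock()
    var ui = uis.get(appId).filter(_.valid)
    if (ui.isEmpty) {
      // A read lock cannot be upgraded, so release it first.
      lock.readLock().unlock()
      lock.writeLock().lock()
      try {
        // Re-check: another thread may have reloaded the UI meanwhile.
        ui = uis.get(appId).filter(_.valid)
        if (ui.isEmpty) {
          // Safe to close: holding the write lock means no reader uses it.
          uis.remove(appId).foreach(_.close())
          val fresh = load(appId)
          uis(appId) = fresh
          ui = Some(fresh)
        }
        // Downgrade: take the read lock before releasing the write lock.
        lock.readLock().lock()
      } finally lock.writeLock().unlock()
    }
    try fn(ui.get) finally lock.readLock().unlock()
  }

  // Called when the listing code notices the app's log changed.
  def invalidate(appId: String): Unit = {
    lock.readLock().lock()
    try uis.get(appId).foreach(_.valid = false) finally lock.readLock().unlock()
  }
}
```

Taking the write lock for a reload serializes reloads, which seems acceptable if they are rare compared with page views; the common path only takes the read lock.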


## What changes were proposed in this pull request?

This PR aims to optimize GroupExpressions by removing repeated expressions. `RemoveRepetitionFromGroupExpressions` is added.

**Before**

```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

**After**

```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

## How was this patch tested?

Passes the Jenkins tests (with a new test case).

Author: Dongjoon Hyun

Closes apache#12590 from dongjoon-hyun/SPARK-14830.

(cherry picked from commit 6e63201)
Signed-off-by: Michael Armbrust
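The deduplication in the Before/After plans boils down to keeping only the first of several grouping expressions that are semantically equal. The sketch below uses a toy expression tree, lower-cases attribute names to stand in for case-insensitive resolution (in the plans, `a` and `A` both resolve to the same attribute, `a#0`/`A#0`), and orders the operands of `+` so that `a+1` and `1+a` compare equal. `Expr`, `Attr`, `Lit`, `Add` and `GroupingDedup` are hypothetical; this is not the implementation of `RemoveRepetitionFromGroupExpressions`.

```scala
import scala.collection.mutable

// Hypothetical, heavily simplified expression tree (not Catalyst's).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object GroupingDedup {
  // Canonical form: lower-cased attribute names, and the operands of the
  // commutative Add put in a deterministic order.
  def canonicalize(e: Expr): Expr = e match {
    case Attr(n) => Attr(n.toLowerCase)
    case l: Lit  => l
    case Add(a, b) =>
      val (ca, cb) = (canonicalize(a), canonicalize(b))
      if (ca.toString.compareTo(cb.toString) <= 0) Add(ca, cb) else Add(cb, ca)
  }

  // Keeps the first occurrence of each semantically equal expression,
  // preserving the original order of the GROUP BY list.
  def removeRepetition(groupBy: Seq[Expr]): Seq[Expr] = {
    val seen = mutable.HashSet.empty[Expr]
    val kept = Seq.newBuilder[Expr]
    groupBy.foreach { e => if (seen.add(canonicalize(e))) kept += e }
    kept.result()
  }
}
```

Applied to `a+1, 1+a, A+1, 1+A`, only `a+1` survives, mirroring the single aggregation key in the After plan.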

…enkins's test results

### What changes were proposed in this pull request?

See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109834/testReport/junit/org.apache.spark.sql/SQLQueryTestSuite/

![Screen Shot 2019-08-28 at 4 08 58 PM](https://user-images.githubusercontent.com/6477701/63833484-2a23ea00-c9ae-11e9-91a1-0859cb183fea.png)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuite hostname="C02Y52ZLJGH5" name="org.apache.spark.sql.SQLQueryTestSuite" tests="3" errors="0" failures="0" skipped="0" time="14.475">
  ...
  <testcase classname="org.apache.spark.sql.SQLQueryTestSuite" name="sql - Scala UDF" time="6.703">
  </testcase>
  <testcase classname="org.apache.spark.sql.SQLQueryTestSuite" name="sql - Regular Python UDF" time="4.442">
  </testcase>
  <testcase classname="org.apache.spark.sql.SQLQueryTestSuite" name="sql - Scalar Pandas UDF" time="3.33">
  </testcase>
  <system-out/>
  <system-err/>
</testsuite>
```

The root cause seems to be a bug in SBT: it truncates the test name based on the last dot.

sbt/sbt#2949
https://github.com/sbt/sbt/blob/v0.13.18/testing/src/main/scala/sbt/JUnitXmlTestsListener.scala#L71-L79

I tried to find a better way but couldn't find one. Therefore, this PR proposes a workaround that appends the test file name to the assert log:

```diff
  [info] - inner-join.sql *** FAILED *** (4 seconds, 306 milliseconds)
+ [info] inner-join.sql
  [info] Expected "1 a
  [info] 1 a
  [info] 1 b
  [info] 1[]", but got "1 a
  [info] 1 a
  [info] 1 b
  [info] 1[ b]" Result did not match for query #6
  [info] SELECT tb.* FROM ta INNER JOIN tb ON ta.a = tb.a AND ta.tag = tb.tag (SQLQueryTestSuite.scala:377)
  [info] org.scalatest.exceptions.TestFailedException:
  [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
```

This at least saves us from searching the full logs to identify which test file failed when clicking a failed test. Note that this PR does not fully fix the issue; it only fixes the logs of failed tests.

### Why are the changes needed?

To make Jenkins logs easier to debug. Otherwise, we have to open the full logs and search for the test that failed.

### Does this PR introduce any user-facing change?

It will print out the file name of failed tests in Jenkins' test reports.

### How was this patch tested?

Manually tested, but Jenkins tests are required in this PR. Now it at least shows which file it is:

![Screen Shot 2019-08-30 at 10 16 32 PM](https://user-images.githubusercontent.com/6477701/64023705-de22a200-cb73-11e9-8806-2e98ad35adef.png)

Closes apache#25630 from HyukjinKwon/SPARK-28894-1.

Authored-by: HyukjinKwon
Signed-off-by: Dongjoon Hyun
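The symptom in the XML report (only the part after the last dot of each test name survives, e.g. `sql - Scala UDF`) and the workaround can be mimicked as follows. `reportedName` only reproduces the observed effect and is not sbt's `JUnitXmlTestsListener` code; `failureMessage` is a hypothetical helper in the spirit of this PR, not the actual `SQLQueryTestSuite` change.

```scala
object TestNameTruncation {
  // Mimics the observed effect: everything up to and including the last '.'
  // is dropped. If there is no dot, lastIndexOf returns -1 and the whole
  // name is kept.
  def reportedName(testName: String): String =
    testName.substring(testName.lastIndexOf('.') + 1)

  // Workaround idea: repeat the test file name at the start of the failure
  // message, so it is visible even when the reported test name is truncated.
  def failureMessage(testFileName: String, details: String): String =
    s"$testFileName\n$details"
}
```

For a made-up test name `inner-join.sql - Scala UDF`, `reportedName` yields `sql - Scala UDF`, which matches the shape of the names in the report; only the message still carries `inner-join.sql`.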

vanzin force-pushed the shs-ng/M4.0 branch from dcac650 to e237ecb (April 17, 2017 21:41)

vanzin force-pushed the shs-ng/M3 branch from 817f794 to 8df04b8 (April 17, 2017 21:41)

vanzin force-pushed the shs-ng/M4.0 branch from e237ecb to dd39001 (April 25, 2017 17:43)

vanzin force-pushed the shs-ng/M3 branch from 8df04b8 to f9ff270 (April 25, 2017 17:43)

vanzin force-pushed the shs-ng/M4.0 branch from dd39001 to 8d339a9 (April 26, 2017 18:11)

vanzin force-pushed the shs-ng/M3 branch from f9ff270 to df55451 (April 26, 2017 18:11)

vanzin force-pushed the shs-ng/M4.0 branch from 8d339a9 to 381f42e (April 26, 2017 23:58)

vanzin force-pushed the shs-ng/M3 branch from df55451 to 2211471 (April 26, 2017 23:58)

vanzin force-pushed the shs-ng/M4.0 branch from 381f42e to 24e9079 (April 27, 2017 18:14)

vanzin force-pushed the shs-ng/M3 branch from 2211471 to 47acd34 (April 27, 2017 18:14)

vanzin force-pushed the shs-ng/M4.0 branch from 24e9079 to 70def42 (April 27, 2017 21:31)

vanzin force-pushed the shs-ng/M3 branch from 47acd34 to 1e61647 (April 27, 2017 21:31)

vanzin force-pushed the shs-ng/M4.0 branch 2 times, most recently from 2c3d967 to 28ffe8b (May 1, 2017 22:58)

vanzin force-pushed the shs-ng/M3 branch from 1e61647 to cf10c0e (May 1, 2017 22:58)

vanzin force-pushed the shs-ng/M4.0 branch from 28ffe8b to ae8ad1b (May 5, 2017 21:19)

vanzin force-pushed the shs-ng/M3 branch from cf10c0e to 6fcba31 (May 5, 2017 21:19)

vanzin force-pushed the shs-ng/M4.0 branch from ae8ad1b to 67bf76e (May 5, 2017 22:57)

vanzin force-pushed the shs-ng/M3 branch from 6fcba31 to ceb833d (May 5, 2017 22:57)

vanzin force-pushed the shs-ng/M4.0 branch from 67bf76e to d088e00 (May 8, 2017 17:25)

vanzin force-pushed the shs-ng/M3 branch from ceb833d to e204193 (May 8, 2017 17:25)

vanzin force-pushed the shs-ng/M4.0 branch from d088e00 to 65fc0d2 (May 9, 2017 01:09)

vanzin force-pushed the shs-ng/M3 branch from e204193 to 9e6b754 (May 9, 2017 01:09)

vanzin force-pushed the shs-ng/M4.0 branch from 65fc0d2 to f8e91cb (May 15, 2017 20:45)

vanzin force-pushed the shs-ng/M3 branch from 9e6b754 to 0cd985a (May 15, 2017 20:45)

vanzin force-pushed the shs-ng/M4.0 branch from f8e91cb to b3e02d3 (May 26, 2017 18:53)

vanzin force-pushed the shs-ng/M3 branch from 0cd985a to 22af29f (May 26, 2017 18:53)

vanzin force-pushed the shs-ng/M4.0 branch from b3e02d3 to ebf6fef (May 30, 2017 23:04)

vanzin force-pushed the shs-ng/M3 branch from 22af29f to 19217ff (May 30, 2017 23:04)

vanzin closed this May 30, 2017
