SPARK-1336 Reducing the output of run-tests script. #262

ScrapCodes · 2014-03-28T12:43:01Z

No description provided.

AmplabJenkins · 2014-03-28T12:47:22Z

Merged build triggered. Build is starting -or- tests failed to complete.

AmplabJenkins · 2014-03-28T12:47:31Z

Merged build started. Build is starting -or- tests failed to complete.

AmplabJenkins · 2014-03-28T13:43:26Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-03-28T13:43:26Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13547/

pwendell · 2014-03-29T06:11:58Z

dev/run-tests


 echo "========================================================================="
 echo "Running Spark unit tests"
 echo "========================================================================="
-sbt/sbt assembly test
+sbt/sbt assembly test | grep -v "info.*Resolving"


Could you at this part to MIMA also?

Also maybe we could get rid of some other annoying stuff here too:

+sbt/sbt assembly test | grep -v -e "info.*Resolving" -e "warn.*Merging" -e "info.*Including"

pwendell · 2014-03-29T07:24:24Z

Looking good. Also for Python, could we only print the unit test output if there are failures? I think we already tee it too a file so it should be pretty easy.

ScrapCodes · 2014-03-29T15:15:54Z

There was one more alternative i.e. we could have added log4j.properties with loglevel at ERROR. But this should also be fine.

AmplabJenkins · 2014-03-29T15:17:22Z

Merged build triggered. Build is starting -or- tests failed to complete.

AmplabJenkins · 2014-03-29T15:17:32Z

Merged build started. Build is starting -or- tests failed to complete.

AmplabJenkins · 2014-03-29T16:28:35Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-03-29T16:28:36Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13574/

pwendell · 2014-03-30T06:03:19Z

Looks great! I've merged this.

marmbrus · 2014-04-03T23:37:59Z

Hey, so I think this actually breaks our CI. Adding | grep swallows the non-zero return code from sbt failures. See here for a Jenkins run that fails to compile, but still gets a green light.

Author: Prashant Sharma <[email protected]> Author: Prashant Sharma <[email protected]> Closes apache#262 from ScrapCodes/SPARK-1336/ReduceVerbosity and squashes the following commits: 87dfa54 [Prashant Sharma] Further reduction in noise and made pyspark tests to fail fast. 811170f [Prashant Sharma] Reducing the ouput of run-tests script.

Add job bosh-openstack-cpi-release-acceptance-test

…ache#262)

…runing ### What changes were proposed in this pull request? Remove `OptimizeSubqueries` from batch of `PartitionPruning` to make DPP support more cases. For example: ```sql SELECT date_id, product_id FROM fact_sk f JOIN (select store_id + 3 as new_store_id from dim_store where country = 'US') s ON f.store_id = s.new_store_id ``` Before this PR: ``` == Physical Plan == *(2) Project [date_id#3998, product_id#3999] +- *(2) BroadcastHashJoin [store_id#4001], [new_store_id#3997], Inner, BuildRight, false :- *(2) ColumnarToRow : +- FileScan parquet default.fact_sk[date_id#3998,product_id#3999,store_id#4001] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [isnotnull(store_id#4001), dynamicpruningexpression(true)], PushedFilters: [], ReadSchema: struct<date_id:int,product_id:int> +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#274] +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] +- *(1) Filter ((isnotnull(country#4004) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) +- *(1) ColumnarToRow +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002 + 3))], Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> ``` After this PR: ``` == Physical Plan == *(2) Project [date_id#3998, product_id#3999] +- *(2) BroadcastHashJoin [store_id#4001], [new_store_id#3997], Inner, BuildRight, false :- *(2) ColumnarToRow : +- FileScan parquet default.fact_sk[date_id#3998,product_id#3999,store_id#4001] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [isnotnull(store_id#4001), dynamicpruningexpression(store_id#4001 IN dynamicpruning#4007)], PushedFilters: [], ReadSchema: struct<date_id:int,product_id:int> : +- SubqueryBroadcast dynamicpruning#4007, 0, [new_store_id#3997], [id=#263] : +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#262] : +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] : +- *(1) Filter ((isnotnull(country#4004) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) : +- *(1) ColumnarToRow : +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002 + 3))], Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> +- ReusedExchange [new_store_id#3997], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#262] ``` This is because `OptimizeSubqueries` will infer more filters, so we cannot reuse broadcasts. The following is the plan if disable `spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly`: ``` == Physical Plan == *(2) Project [date_id#3998, product_id#3999] +- *(2) BroadcastHashJoin [store_id#4001], [new_store_id#3997], Inner, BuildRight, false :- *(2) ColumnarToRow : +- FileScan parquet default.fact_sk[date_id#3998,product_id#3999,store_id#4001] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [isnotnull(store_id#4001), dynamicpruningexpression(store_id#4001 IN subquery#4009)], PushedFilters: [], ReadSchema: struct<date_id:int,product_id:int> : +- Subquery subquery#4009, [id=#284] : +- *(2) HashAggregate(keys=[new_store_id#3997#4008], functions=[]) : +- Exchange hashpartitioning(new_store_id#3997#4008, 5), ENSURE_REQUIREMENTS, [id=#280] : +- *(1) HashAggregate(keys=[new_store_id#3997 AS new_store_id#3997#4008], functions=[]) : +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] : +- *(1) Filter (((isnotnull(store_id#4002) AND isnotnull(country#4004)) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) : +- *(1) ColumnarToRow : +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(store_id#4002), isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002..., Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(store_id), IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#305] +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] +- *(1) Filter ((isnotnull(country#4004) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) +- *(1) ColumnarToRow +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002 + 3))], Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> ``` ### Why are the changes needed? Improve DPP to support more cases. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test and benchmark test: SQL | Before this PR(Seconds) | After this PR(Seconds) -- | -- | -- TPC-DS q58 | 40 | 20 TPC-DS q83 | 18 | 14 Closes #33664 from wangyum/SPARK-36444. Authored-by: Yuming Wang <[email protected]> Signed-off-by: Yuming Wang <[email protected]>

…runing ### What changes were proposed in this pull request? Remove `OptimizeSubqueries` from batch of `PartitionPruning` to make DPP support more cases. For example: ```sql SELECT date_id, product_id FROM fact_sk f JOIN (select store_id + 3 as new_store_id from dim_store where country = 'US') s ON f.store_id = s.new_store_id ``` Before this PR: ``` == Physical Plan == *(2) Project [date_id#3998, product_id#3999] +- *(2) BroadcastHashJoin [store_id#4001], [new_store_id#3997], Inner, BuildRight, false :- *(2) ColumnarToRow : +- FileScan parquet default.fact_sk[date_id#3998,product_id#3999,store_id#4001] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [isnotnull(store_id#4001), dynamicpruningexpression(true)], PushedFilters: [], ReadSchema: struct<date_id:int,product_id:int> +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#274] +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] +- *(1) Filter ((isnotnull(country#4004) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) +- *(1) ColumnarToRow +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002 + 3))], Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> ``` After this PR: ``` == Physical Plan == *(2) Project [date_id#3998, product_id#3999] +- *(2) BroadcastHashJoin [store_id#4001], [new_store_id#3997], Inner, BuildRight, false :- *(2) ColumnarToRow : +- FileScan parquet default.fact_sk[date_id#3998,product_id#3999,store_id#4001] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [isnotnull(store_id#4001), dynamicpruningexpression(store_id#4001 IN dynamicpruning#4007)], PushedFilters: [], ReadSchema: struct<date_id:int,product_id:int> : +- SubqueryBroadcast dynamicpruning#4007, 0, [new_store_id#3997], [id=#263] : +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#262] : +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] : +- *(1) Filter ((isnotnull(country#4004) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) : +- *(1) ColumnarToRow : +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002 + 3))], Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> +- ReusedExchange [new_store_id#3997], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#262] ``` This is because `OptimizeSubqueries` will infer more filters, so we cannot reuse broadcasts. The following is the plan if disable `spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly`: ``` == Physical Plan == *(2) Project [date_id#3998, product_id#3999] +- *(2) BroadcastHashJoin [store_id#4001], [new_store_id#3997], Inner, BuildRight, false :- *(2) ColumnarToRow : +- FileScan parquet default.fact_sk[date_id#3998,product_id#3999,store_id#4001] Batched: true, DataFilters: [], Format: Parquet, PartitionFilters: [isnotnull(store_id#4001), dynamicpruningexpression(store_id#4001 IN subquery#4009)], PushedFilters: [], ReadSchema: struct<date_id:int,product_id:int> : +- Subquery subquery#4009, [id=#284] : +- *(2) HashAggregate(keys=[new_store_id#3997#4008], functions=[]) : +- Exchange hashpartitioning(new_store_id#3997#4008, 5), ENSURE_REQUIREMENTS, [id=#280] : +- *(1) HashAggregate(keys=[new_store_id#3997 AS new_store_id#3997#4008], functions=[]) : +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] : +- *(1) Filter (((isnotnull(store_id#4002) AND isnotnull(country#4004)) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) : +- *(1) ColumnarToRow : +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(store_id#4002), isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002..., Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(store_id), IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#305] +- *(1) Project [(store_id#4002 + 3) AS new_store_id#3997] +- *(1) Filter ((isnotnull(country#4004) AND (country#4004 = US)) AND isnotnull((store_id#4002 + 3))) +- *(1) ColumnarToRow +- FileScan parquet default.dim_store[store_id#4002,country#4004] Batched: true, DataFilters: [isnotnull(country#4004), (country#4004 = US), isnotnull((store_id#4002 + 3))], Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(country), EqualTo(country,US)], ReadSchema: struct<store_id:int,country:string> ``` ### Why are the changes needed? Improve DPP to support more cases. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test and benchmark test: SQL | Before this PR(Seconds) | After this PR(Seconds) -- | -- | -- TPC-DS q58 | 40 | 20 TPC-DS q83 | 18 | 14 Closes #33664 from wangyum/SPARK-36444. Authored-by: Yuming Wang <[email protected]> Signed-off-by: Yuming Wang <[email protected]> (cherry picked from commit 2310b99) Signed-off-by: Yuming Wang <[email protected]>

Reducing the ouput of run-tests script.

811170f

pwendell reviewed Mar 29, 2014
View reviewed changes

Further reduction in noise and made pyspark tests to fail fast.

87dfa54

asfgit closed this in df1b9f7 Mar 30, 2014

marmbrus mentioned this pull request Apr 3, 2014

Fix jenkins from giving the green light to builds that don't compile. #317

Closed

ScrapCodes deleted the SPARK-1336/ReduceVerbosity branch June 3, 2015 06:01

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Merge pull request apache#262 from animationzl/add-bosh-cpi-job

84a9ac1

Add job bosh-openstack-cpi-release-acceptance-test

peter-toth mentioned this pull request Jun 21, 2020

[SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse #28885

Closed

arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020

MapR [SPARK-179] Spark Integration for Kafka 0.9 unit tests fixed (ap…

8caa9da

…ache#262)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPARK-1336 Reducing the output of run-tests script. #262

SPARK-1336 Reducing the output of run-tests script. #262

ScrapCodes commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

pwendell Mar 29, 2014

pwendell Mar 29, 2014

pwendell commented Mar 29, 2014

ScrapCodes commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

pwendell commented Mar 30, 2014

marmbrus commented Apr 3, 2014

SPARK-1336 Reducing the output of run-tests script. #262

SPARK-1336 Reducing the output of run-tests script. #262

Conversation

ScrapCodes commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

AmplabJenkins commented Mar 28, 2014

pwendell Mar 29, 2014

Choose a reason for hiding this comment

pwendell Mar 29, 2014

Choose a reason for hiding this comment

pwendell commented Mar 29, 2014

ScrapCodes commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

AmplabJenkins commented Mar 29, 2014

pwendell commented Mar 30, 2014

marmbrus commented Apr 3, 2014