
[SPARK-17992][SQL] Return all partitions from HiveShim when Hive throws a metastore exception when attempting to fetch partitions by filter #15673

Conversation

mallman
Contributor

@mallman mallman commented Oct 28, 2016

(Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17992)

What changes were proposed in this pull request?

We recently added table partition pruning for partitioned Hive tables converted to using TableFileCatalog. When the Hive configuration option hive.metastore.try.direct.sql is set to false, Hive will throw an exception for unsupported filter expressions. For example, attempting to filter on an integer partition column will throw a org.apache.hadoop.hive.metastore.api.MetaException.

I discovered this behavior because VideoAmp uses the CDH version of Hive with a Postgresql metastore DB. In this configuration, CDH sets hive.metastore.try.direct.sql to false by default, and queries that filter on a non-string partition column will fail.

Rather than throw an exception in query planning, this patch catches this exception, logs a warning and returns all table partitions instead. Clients of this method are already expected to handle the possibility that the filters will not be honored.
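
In outline, the change wraps the filtered partition lookup in a catch for the MetaException case. The sketch below follows the shape of the HiveShim code but abbreviates the surrounding reflection setup; the method handles, logWarning, and the hive/table/filter values are assumed context rather than definitions introduced by this patch.

// Sketch only: getPartitionsByFilterMethod and getAllPartitionsMethod stand for the
// reflective Method handles HiveShim already holds; logWarning comes from Spark's Logging trait.
import java.lang.reflect.InvocationTargetException
import java.util.{ArrayList => JArrayList, Set => JSet}
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.metastore.api.MetaException
import org.apache.hadoop.hive.ql.metadata.Partition

val parts: Seq[Partition] =
  try {
    // Ask the metastore to prune partitions server-side using the filter string.
    getPartitionsByFilterMethod.invoke(hive, table, filter)
      .asInstanceOf[JArrayList[Partition]].asScala.toSeq
  } catch {
    case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
      // The metastore rejected the filter (for example, direct SQL is disabled), so log
      // a warning and fall back to fetching every partition's metadata instead.
      logWarning("Caught MetaException attempting to get partitions by filter from Hive; " +
        "falling back to fetching all partitions", ex)
      getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]].asScala.toSeq
  }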

How was this patch tested?

A unit test was added.

@SparkQA

SparkQA commented Oct 28, 2016

Test build #67713 has finished for PR 15673 at commit c62beda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    .asInstanceOf[JArrayList[Partition]]
} catch {
  case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
    logWarning("Caught MetaException attempting to get partitions by filter from Hive", ex)
Contributor

Change the msg to say we are falling back to fetching all partitions' metadata?

@rxin
Contributor

rxin commented Oct 29, 2016

cc @ericl

@ericl
Contributor

ericl commented Oct 29, 2016

Could we enable this fallback only when the conf is set to false? Otherwise, it might mask legitimate bugs.

I also wonder if some of our flaky tests around this issue are due to the conf being leaked by some suites...

@mallman
Contributor Author

mallman commented Oct 29, 2016

Could we enable this fallback only when the conf is set to false? Otherwise, it might mask legitimate bugs.

Certainly, but my intent with this PR is to prevent a (painful and confusing) regression that some Hive users of Spark 2.1 can hit because Spark 2.1 enables our new partition pruning implementation by default. I mentioned one case where this will happen, but we can't be sure it's the only one. If we make the conditions under which we fall back too narrow, we are assuming that other configurations of Hive are compatible with partition pruning outside of the specific conditions we check. I think that's a bit too risky. In fact, before submitting this PR I had written the catch block to catch and fall back for all types of Exception. What I ended up with here is a middle ground.

@mallman
Contributor Author

mallman commented Oct 29, 2016

The current merge conflict is from d2d438d, which touches the same code. I'll wait for that to be settled before rebasing.

@SparkQA

SparkQA commented Oct 29, 2016

Test build #67772 has finished for PR 15673 at commit 887e9b1.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@ericl
Contributor

ericl commented Oct 29, 2016

For large tables, the degraded performance should be considered a bug as well.

How about this.

  1. If direct sql is disabled, log a warning about degraded performance with this flag and fall back to fetching all partitions.
  2. If direct sql is enabled, crash with a message suggesting to disable filesource partition management and report a bug.

That way, we will know if there are cases where metastore pruning fails with direct sql enabled.
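
Concretely, that splits the catch clause on the value of hive.metastore.try.direct.sql, along the lines of the sketch below (same assumed HiveShim context as the excerpt earlier in this thread; tryDirectSql stands for the flag read from the metastore configuration, and the exact error-message wording is illustrative):

case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] && !tryDirectSql =>
  // Direct SQL is off, so a pruning failure is expected: warn about the degraded
  // performance and fall back to fetching all partition metadata.
  logWarning("Caught Hive MetaException attempting to get partition metadata by filter " +
    "from Hive; falling back to fetching all partitions, which can degrade performance " +
    "for large tables", ex)
  getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] && tryDirectSql =>
  // Direct SQL is on, so metastore pruning was expected to work: fail loudly with a
  // pointer to disabling filesource partition management, so the underlying bug gets
  // reported rather than silently masked.
  throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
    "metadata by filter from Hive; disabling filesource partition management is a " +
    "possible workaround", ex)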

@mallman
Contributor Author

mallman commented Oct 31, 2016

@ericl I've pushed a commit with the changes you recommended.

@SparkQA

SparkQA commented Oct 31, 2016

Test build #67834 has finished for PR 15673 at commit 4c438c8.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@mallman
Contributor Author

mallman commented Oct 31, 2016

It looks like all the unit tests passed; however, one of the forked test Java processes exited with a nonzero status for an unknown reason.

case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
    tryDirectSql =>
  throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
    "metadata by filter from Hive. Set the Spark configuration setting " +
Contributor

You probably want to word it to suggest disabling partition management as a workaround only.

Contributor Author

Good point.

Contributor Author

I made some revisions. LMK what you think.

@ericl
Contributor

ericl commented Nov 1, 2016

This looks good to me. cc @cloud-fan

@SparkQA

SparkQA commented Nov 1, 2016

Test build #3382 has finished for PR 15673 at commit 4c438c8.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented Nov 1, 2016

Test build #67859 has finished for PR 15673 at commit 1ed3301.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 1, 2016

@mallman can you bring this up-to-date?

@mallman
Contributor Author

mallman commented Nov 1, 2016

@rxin I believe https://issues.apache.org/jira/browse/SPARK-18168 will need to be resolved before I can rebase this PR.

@ericl
Contributor

ericl commented Nov 1, 2016

@mallman shall we go ahead and revert that in this PR? It didn't help with debugging the flaky test much.

@mallman
Contributor Author

mallman commented Nov 1, 2016

@ericl I can do that, yes. I'm currently tied down. I will push a new commit later today or tonight.

@mallman mallman force-pushed the spark-17992-catch_hive_partition_filter_exception branch from 1ed3301 to 8d468ac on November 2, 2016 03:40
@mallman
Contributor Author

mallman commented Nov 2, 2016

Rebased.

@SparkQA

SparkQA commented Nov 2, 2016

Test build #67949 has finished for PR 15673 at commit 8d468ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 2, 2016

Merging in master. Thanks.

@asfgit asfgit closed this in 1bbf9ff Nov 2, 2016
@mallman
Contributor Author

mallman commented Nov 2, 2016

Happy to help.

@mallman mallman deleted the spark-17992-catch_hive_partition_filter_exception branch November 2, 2016 15:42
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

[SPARK-17992][SQL] Return all partitions from HiveShim when Hive throws a metastore exception when attempting to fetch partitions by filter

Author: Michael Allman <[email protected]>

Closes apache#15673 from mallman/spark-17992-catch_hive_partition_filter_exception.
      getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
    case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
        tryDirectSql =>
      throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
Contributor

@mallman Sorry to disturb you here, but what is the reason that only a warning is logged when direct sql isn't set? And why is a runtime exception raised when direct sql is set, instead of just a warning like in the no-direct-sql case?

Contributor Author

Hi @rezasafi

I believe the reasoning is that if the user has disabled direct sql, we will try to fetch the partitions for the requested partition predicate anyway. However, since we don't expect that call to succeed, we just log a warning and fall back to the legacy behavior.

On the other hand, if the user has enabled direct sql, then we expect the call to Hive to succeed. If it fails, we consider that an error and throw an exception.

I hope that helps clarify things.
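
For reference, the workaround the error-message discussion above points at is to disable filesource partition management. In Spark 2.1 that is a session setting along these lines; the configuration key is assumed from the Spark 2.1 SQL configuration (check it against your version), and spark here is a SparkSession:

// Assumed configuration key for Spark 2.1's filesource partition management; setting it
// to false avoids metastore-side partition pruning at query planning time.
spark.conf.set("spark.sql.hive.manageFilesourcePartitions", "false")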

Contributor

Thank you very much for the explanation @mallman. I appreciate it.


@mallman Your assumption is incorrect. If Hive's direct sql fails, it will retry with ORM. In this case, I am able to reproduce an issue with Postgres where direct sql fails, and when it retries with ORM, Spark fails! Hive has fallback behavior for direct sql.

Filed SPARK-25561
