
[SPARK-17992][SQL] Return all partitions from HiveShim when Hive throws a metastore exception when attempting to fetch partitions by filter #15673

Conversation

mallman
Contributor

@mallman mallman commented Oct 28, 2016

(Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-17992)

What changes were proposed in this pull request?

We recently added table partition pruning for partitioned Hive tables converted to using TableFileCatalog. When the Hive configuration option hive.metastore.try.direct.sql is set to false, Hive will throw an exception for unsupported filter expressions. For example, attempting to filter on an integer partition column will throw a org.apache.hadoop.hive.metastore.api.MetaException.

I discovered this behavior because VideoAmp uses the CDH version of Hive with a Postgresql metastore DB. In this configuration, CDH sets hive.metastore.try.direct.sql to false by default, and queries that filter on a non-string partition column will fail.

Rather than throw an exception in query planning, this patch catches this exception, logs a warning and returns all table partitions instead. Clients of this method are already expected to handle the possibility that the filters will not be honored.
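
In outline, the change wraps the filtered partition lookup in a catch for the MetaException case. The sketch below follows the shape of the HiveShim code but abbreviates the surrounding reflection setup; the method handles, logWarning, and the hive/table/filter values are assumed context rather than definitions introduced by this patch.

// Sketch only: getPartitionsByFilterMethod and getAllPartitionsMethod stand for the
// reflective Method handles HiveShim already holds; logWarning comes from Spark's Logging trait.
import java.lang.reflect.InvocationTargetException
import java.util.{ArrayList => JArrayList, Set => JSet}
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.metastore.api.MetaException
import org.apache.hadoop.hive.ql.metadata.Partition

val parts: Seq[Partition] =
  try {
    // Ask the metastore to prune partitions server-side using the filter string.
    getPartitionsByFilterMethod.invoke(hive, table, filter)
      .asInstanceOf[JArrayList[Partition]].asScala.toSeq
  } catch {
    case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
      // The metastore rejected the filter (for example, direct SQL is disabled), so log
      // a warning and fall back to fetching every partition's metadata instead.
      logWarning("Caught MetaException attempting to get partitions by filter from Hive; " +
        "falling back to fetching all partitions", ex)
      getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]].asScala.toSeq
  }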

How was this patch tested?

A unit test was added.

@SparkQA

SparkQA commented Oct 28, 2016

Test build #67713 has finished for PR 15673 at commit c62beda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    .asInstanceOf[JArrayList[Partition]]
} catch {
  case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
    logWarning("Caught MetaException attempting to get partitions by filter from Hive", ex)
Contributor

Change the msg to say we are falling back to fetching all partitions' metadata?

@rxin
Contributor

rxin commented Oct 29, 2016

cc @ericl

@ericl
Contributor

ericl commented Oct 29, 2016

Could we enable this fallback only when the conf is set to false? Otherwise, it might mask legitimate bugs.

I also wonder if some of our flaky tests around this issue are due to the conf being leaked by some suites...

@mallman
Contributor Author

mallman commented Oct 29, 2016

Could we enable this fallback only when the conf is set to false? Otherwise, it might mask legitimate bugs.

Certainly, but my intent with this PR is to prevent a (painful and confusing) regression that some Hive users of Spark 2.1 can hit because Spark 2.1 enables our new partition pruning implementation by default. I mentioned one case where this will happen, but we can't be sure it's the only one. If we make the conditions under which we fall back too narrow, we are assuming that other configurations of Hive are compatible with partition pruning outside of the specific conditions we check. I think that's a bit too risky. In fact, before submitting this PR I had written the catch block to catch and fall back for all types of Exception. What I ended up with here is a middle ground.

@mallman
Contributor Author

mallman commented Oct 29, 2016

The current merge conflict is from d2d438d, which touches the same code. I'll wait for that to be settled before rebasing.

@SparkQA

SparkQA commented Oct 29, 2016

Test build #67772 has finished for PR 15673 at commit 887e9b1.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@ericl
Contributor

ericl commented Oct 29, 2016

For large tables, the degraded performance should be considered a bug as well.

How about this.

  1. If direct sql is disabled, log a warning about degraded performance with this flag and fall back to fetching all partitions.
  2. If direct sql is enabled, crash with a message suggesting to disable filesource partition management and report a bug.

That way, we will know if there are cases where metastore pruning fails with direct sql enabled.
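
Concretely, that splits the catch clause on the value of hive.metastore.try.direct.sql, along the lines of the sketch below (same assumed HiveShim context as the excerpt earlier in this thread; tryDirectSql stands for the flag read from the metastore configuration, and the exact error-message wording is illustrative):

case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] && !tryDirectSql =>
  // Direct SQL is off, so a pruning failure is expected: warn about the degraded
  // performance and fall back to fetching all partition metadata.
  logWarning("Caught Hive MetaException attempting to get partition metadata by filter " +
    "from Hive; falling back to fetching all partitions, which can degrade performance " +
    "for large tables", ex)
  getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] && tryDirectSql =>
  // Direct SQL is on, so metastore pruning was expected to work: fail loudly with a
  // pointer to disabling filesource partition management, so the underlying bug gets
  // reported rather than silently masked.
  throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
    "metadata by filter from Hive; disabling filesource partition management is a " +
    "possible workaround", ex)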

@mallman
Contributor Author

mallman commented Oct 31, 2016

@ericl I've pushed a commit with the changes you recommended.

@SparkQA

SparkQA commented Oct 31, 2016

Test build #67834 has finished for PR 15673 at commit 4c438c8.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@mallman
Contributor Author

mallman commented Oct 31, 2016

It looks like all the unit tests passed; however, one of the forked test Java processes exited with a nonzero status for an unknown reason.

case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
    tryDirectSql =>
  throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
    "metadata by filter from Hive. Set the Spark configuration setting " +
Contributor

You probably want to word it to suggest disabling partition management as a workaround only.

Contributor Author

Good point.

Contributor Author

I made some revisions. LMK what you think.

@ericl
Contributor

ericl commented Nov 1, 2016

This looks good to me. cc @cloud-fan

@SparkQA

SparkQA commented Nov 1, 2016

Test build #3382 has finished for PR 15673 at commit 4c438c8.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented Nov 1, 2016

Test build #67859 has finished for PR 15673 at commit 1ed3301.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 1, 2016

@mallman can you bring this up-to-date?

@mallman
Contributor Author

mallman commented Nov 1, 2016

@rxin I believe https://issues.apache.org/jira/browse/SPARK-18168 will need to be resolved before I can rebase this PR.

@ericl
Contributor

ericl commented Nov 1, 2016

@mallman shall we go ahead and revert that in this PR? It didn't help with debugging the flaky test much.

@mallman
Contributor Author

mallman commented Nov 1, 2016

@ericl I can do that, yes. I'm currently tied down. I will push a new commit later today or tonight.

@mallman mallman force-pushed the spark-17992-catch_hive_partition_filter_exception branch from 1ed3301 to 8d468ac on November 2, 2016 03:40
@mallman
Contributor Author

mallman commented Nov 2, 2016

Rebased.

@SparkQA

SparkQA commented Nov 2, 2016

Test build #67949 has finished for PR 15673 at commit 8d468ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Nov 2, 2016

Merging in master. Thanks.

@asfgit asfgit closed this in 1bbf9ff Nov 2, 2016
@mallman
Contributor Author

mallman commented Nov 2, 2016

Happy to help.

@mallman mallman deleted the spark-17992-catch_hive_partition_filter_exception branch November 2, 2016 15:42
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

[SPARK-17992][SQL] Return all partitions from HiveShim when Hive throws a metastore exception when attempting to fetch partitions by filter

Author: Michael Allman <[email protected]>

Closes apache#15673 from mallman/spark-17992-catch_hive_partition_filter_exception.
      getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
    case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
        tryDirectSql =>
      throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
Contributor

@mallman Sorry to disturb you here, but what is the reason that only a warning is logged when direct sql isn't set? And why is a runtime exception raised when direct sql is set, instead of just a warning like in the no-direct-sql case?

Contributor Author

Hi @rezasafi

I believe the reasoning is that if the user has disabled direct sql, we will try to fetch the partitions for the requested partition predicate anyway. However, since we don't expect that call to succeed, we just log a warning and fall back to the legacy behavior.

On the other hand, if the user has enabled direct sql, then we expect the call to Hive to succeed. If it fails, we consider that an error and throw an exception.

I hope that helps clarify things.
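
For reference, the workaround the error-message discussion above points at is to disable filesource partition management. In Spark 2.1 that is a session setting along these lines; the configuration key is assumed from the Spark 2.1 SQL configuration (check it against your version), and spark here is a SparkSession:

// Assumed configuration key for Spark 2.1's filesource partition management; setting it
// to false avoids metastore-side partition pruning at query planning time.
spark.conf.set("spark.sql.hive.manageFilesourcePartitions", "false")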

Contributor

Thank you very much for the explanation @mallman. I appreciate it.


@mallman Your assumption is incorrect. If Hive's direct sql fails, it will retry with ORM. In this case, I am able to reproduce an issue with Postgres where direct sql fails, and when it retries with ORM, Spark fails! Hive has fallback behavior for direct sql.

Filed SPARK-25561
