Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache #31652

Closed
wants to merge 11 commits into from

Conversation

imback82
Copy link
Contributor

@imback82 imback82 commented Feb 25, 2021

What changes were proposed in this pull request?

This PR proposes the following:

Why are the changes needed?

Analyzing a plan should be done in the analysis phase if possible.

Not uncaching the view (existing behavior) seems like a bug since the cache may not be used again.

Does this PR introduce any user-facing change?

Yes, now the view can be uncached if it's already cached.

How was this patch tested?

Added new tests around uncaching.

The existing tests such as SQLViewSuite should cover the analysis changes.

@github-actions github-actions bot added the SQL label Feb 25, 2021
@imback82 imback82 changed the title [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase [WIP][SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase Feb 25, 2021
@imback82 imback82 marked this pull request as draft February 25, 2021 19:12
@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40055/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40055/

@SparkQA
Copy link

SparkQA commented Feb 25, 2021

Test build #135475 has finished for PR 31652 at commit ba766f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40421/

@SparkQA
Copy link

SparkQA commented Mar 7, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40421/

@imback82 imback82 changed the title [WIP][SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterView should invalidate the cache Mar 8, 2021
@imback82 imback82 marked this pull request as ready for review March 8, 2021 00:25
@@ -879,7 +879,7 @@ class Analyzer(override val catalogManager: CatalogManager)

private def unwrapRelationPlan(plan: LogicalPlan): LogicalPlan = {
EliminateSubqueryAliases(plan) match {
case v: View if v.isDataFrameTempView => v.child
case v: View if v.isTempViewStoringAnalyzedPlan => v.child
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed because this type of view can also be created not from a dataframe - e.g., ALTER VIEW ... AS with spark.sql.legacy.storeAnalyzedPlanForView set to true.

// If the plan cannot be analyzed, throw an exception and don't proceed.
val qe = session.sessionState.executePlan(query)
qe.assertAnalyzed()
val analyzedPlan = qe.analyzed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query will be an analyzed plan since AlterViewAs is converted to AlterViewAsCommand in Analyzer phase (ResolveSessionCatalog).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we remove the similar code in CreateViewCommand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot yet because of the following:

override lazy val planToCache: LogicalPlan = {
Dataset.ofRows(sparkSession,
CreateViewCommand(
name = TableIdentifier(tempViewName),
userSpecifiedColumns = Nil,
comment = None,
properties = Map.empty,
originalText = Some(originalText),
child = query,
allowExisting = false,

Here, query is not resolved yet. I will get to this though.

@@ -116,7 +116,7 @@ case class CreateViewCommand(
if (viewType == LocalTempView) {
val aliasedPlan = aliasPlan(sparkSession, analyzedPlan)
if (replace && needsToUncache(catalog.getRawTempView(name.table), aliasedPlan)) {
logInfo(s"Try to uncache ${name.quotedString} before replacing.")
logDebug(s"Try to uncache ${name.quotedString} before replacing.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to logDebug because the permanent view was using it:

logDebug(s"Try to uncache ${viewIdent.quotedString} before replacing.")

If logInfo is better, I can change these to logInfo instead.

checkCyclicViewReference(analyzedPlan, Seq(name), name)

logDebug(s"Try to uncache ${name.quotedString} before altering.")
CommandUtils.uncacheTableOrView(session, name.quotedString)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Uncaching seems like a right thing to do when you alter a view?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I think so, since the cached data is stale. It's similar to CREATE OR REPLACE TEMP VIEW, seems we have an extra check needsToUncache. Shall we do it here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the logic is very similar, I refactored this part to share code with CreateViewCommand.

@SparkQA
Copy link

SparkQA commented Mar 8, 2021

Test build #135849 has finished for PR 31652 at commit 51c66e8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82 imback82 changed the title [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterView should invalidate the cache [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache Mar 8, 2021
checkCyclicViewReference(analyzedPlan, Seq(name), name)

logDebug(s"Try to uncache ${name.quotedString} before altering.")
CommandUtils.uncacheTableOrView(session, name.quotedString)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the logic is very similar, I refactored this part to share code with CreateViewCommand.

rawTempView: LogicalPlan,
aliasedPlan: LogicalPlan): Boolean = rawTempView match {
// If TemporaryViewRelation doesn't store the analyzed view, always uncache.
case TemporaryViewRelation(_, None) => true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary because case p => !p.sameResult(aliasedPlan) should cover, but I added this case explicitly to make the intention clear.

needsToUncache(r, aliasedPlan)
}.getOrElse(false)
if (replace && uncache) {
logInfo(s"Try to uncache ${name.quotedString} before replacing.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logDebug?

* If `userSpecifiedColumns` is defined, alias the analyzed plan to the user specified columns,
* else return the analyzed plan directly.
*/
def aliasPlan(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we still keep this in CreateViewCommand and let it pass the aliasedPlan to createTemporaryViewRelation? This logic is not needed for AlterViewAsCommand.

// Do not need to uncache if the to-be-replaced temp view plan and the new plan are the
// same-result plans.
case TemporaryViewRelation(_, Some(p)) => !p.sameResult(aliasedPlan)
case p => !p.sameResult(aliasedPlan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need it? rawTempView should always be TemporaryViewRelation right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to first update CREATE TEMP VIEW USING:

val viewDefinition = Dataset.ofRows(
sparkSession, LogicalRelation(dataSource.resolveRelation())).logicalPlan
if (global) {
catalog.createGlobalTempView(tableIdent.table, viewDefinition, replace)
} else {
catalog.createTempView(tableIdent.table, viewDefinition, replace)
}

, which I will get to right after this PR.

Once the PR is done, we can update createTempView to take in TemporaryViewRelation (https://github.com/apache/spark/pull/31273/files#r580757641), then I can safely remove this line.

Let me actually create a JIRA to capture these as subtasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SparkQA
Copy link

SparkQA commented Mar 10, 2021

Test build #135921 has finished for PR 31652 at commit 9302be9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40530/

@SparkQA
Copy link

SparkQA commented Mar 10, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40530/

@SparkQA
Copy link

SparkQA commented Mar 11, 2021

Test build #135946 has finished for PR 31652 at commit 7e4815a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 2a6e68e Mar 11, 2021
cloud-fan pushed a commit that referenced this pull request Mar 22, 2021
…d take/return more concrete types

### What changes were proposed in this pull request?

Now that all the temporary views are wrapped with `TemporaryViewRelation`(#31273, #31652, and #31825), this PR proposes to update `SessionCatalog`'s APIs for temporary views to take or return more concrete types.

APIs that will take `TemporaryViewRelation` instead of `LogicalPlan`:
```
createTempView, createGlobalTempView, alterTempViewDefinition
```

APIs that will return `TemporaryViewRelation` instead of `LogicalPlan`:
```
getRawTempView, getRawGlobalTempView
```

APIs that will return `View` instead of `LogicalPlan`:
```
getTempView, getGlobalTempView, lookupTempView
```

### Why are the changes needed?

Internal refactoring to work with more concrete types.

### Does this PR introduce _any_ user-facing change?

No, this is internal refactoring.

### How was this patch tested?

Updated existing tests affected by the refactoring.

Closes #31906 from imback82/use_temporary_view_relation.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants