Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE #23905

Closed
wants to merge 1 commit into from

Conversation

Udbhav30
Copy link
Contributor

What changes were proposed in this pull request?

Before dropping database refresh the tables of that database, so as to refresh all cached entries associated with those tables.
We follow the same when dropping a table.

How was this patch tested?

UT is added

Copy link
Contributor

@dilipbiswal dilipbiswal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the call to drop the database, we can get an error. For example, its not allowed to drop the default database. In that case, even though we are not going to go ahead with the drop database action, we will end up refreshing all the tables inside it ? Is that expected ?

@Udbhav30
Copy link
Contributor Author

On the call to drop the database, we can get an error. For example, its not allowed to drop the default database. In that case, even though we are not going to go ahead with the drop database action, we will end up refreshing all the tables inside it ? Is that expected ?

modified

@Udbhav30
Copy link
Contributor Author

CC @HyukjinKwon

catalog.listTables(databaseName).foreach { t =>
catalog.refreshTable(t)
}
}
sparkSession.sessionState.catalog.dropDatabase(databaseName, ifExists, cascade)
Copy link
Contributor

@dilipbiswal dilipbiswal Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this logic move under catalog.dropDatabase function ? The reason is, we have the following code in dropDatabase.

 val dbName = formatDatabaseName(db)
    if (dbName == DEFAULT_DATABASE) {
      throw new AnalysisException(s"Can not drop default database")
    }

If we keep it here, then we should have identical check. In this case formatDatabase().. handles case sensitivity in the database name. Also we should use DEFAULT_DATABASE as opposed to hard coding it here.

2ndly, can any of the calls 1) listTables 2)refreshTable throw exception ? In this case, do we want to fail the drop operation or allow the drop with a warning ? I don't know the answer. Just list the questions here so others can help answer.

cc @cloud-fan @gatorsmile for their input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i have moved that code to catalog.dropDatabase functions, i also think that make more sense

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd a bit worry about this approach. shouldn't this be done up to the user and not automatically?

@Udbhav30
Copy link
Contributor Author

I'd a bit worry about this approach. shouldn't this be done up to the user and not automatically?

Can we do it on the behalf of user as we are following this approach while using dropTable and refresh the table so these changes will also make it inline with dropTable behaviour.

@Udbhav30
Copy link
Contributor Author

Udbhav30 commented Mar 1, 2019

cc @gatorsmile @dongjoon-hyun can you please review this.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your first contribution, @Udbhav30 .

I left a few comment. Also, could you move this test case to DDLSuite.scala? If you put it there, InMemoryCatalogedDDLSuite and HiveCatalogedDDLSuite will test it.

@Udbhav30 Udbhav30 force-pushed the SPARK-24669 branch 2 times, most recently from 6accf35 to 188ae92 Compare March 1, 2019 07:38
@Udbhav30
Copy link
Contributor Author

Udbhav30 commented Mar 1, 2019

@dongjoon-hyun I have done the suggested changes

@dongjoon-hyun
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Mar 1, 2019

Test build #102917 has finished for PR 23905 at commit 188ae92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 1, 2019

@Udbhav30 . Two test case failures look relevant to this PR. Could you check them by running locally? Please update this PR after you pass them locally. Thanks.

@Udbhav30
Copy link
Contributor Author

Udbhav30 commented Mar 4, 2019

yes i will check and update

@SparkQA
Copy link

SparkQA commented Mar 4, 2019

Test build #102995 has finished for PR 23905 at commit 2c8c2c5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 4, 2019

Test build #103001 has finished for PR 23905 at commit 3eb82de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Udbhav30
Copy link
Contributor Author

Udbhav30 commented Mar 5, 2019

@dongjoon-hyun i have handled all suggestions please let me know if anymore suggestions are there
thanks

@@ -218,6 +218,11 @@ class SessionCatalog(
if (dbName == DEFAULT_DATABASE) {
throw new AnalysisException(s"Can not drop default database")
}
if (cascade && databaseExists(dbName)) {
listTables(dbName).foreach { t =>
refreshTable(t)
Copy link
Member

@dongjoon-hyun dongjoon-hyun Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is invalidating the tables. So, let's use more narrow API here instead of refreshTable.

-        refreshTable(t)
+        invalidateCachedTable(QualifiedTableName(dbName, t.table))

@@ -218,6 +218,11 @@ class SessionCatalog(
if (dbName == DEFAULT_DATABASE) {
throw new AnalysisException(s"Can not drop default database")
}
if (cascade && databaseExists(dbName)) {
listTables(dbName).foreach { t =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun Do we need to worry about any recurring exceptions from this call like following ?
http://discuss.itversity.com/t/unable-to-perform-listtables-on-spark-catalog-class/15888
Should we fail the drop database or warn and proceed ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best behavior is compatible with the old Spark behavior. So, that will be warn and proceed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dongjoon-hyun

@Udbhav30 Udbhav30 force-pushed the SPARK-24669 branch 2 times, most recently from c4e3214 to eab40d6 Compare March 6, 2019 06:49
@SparkQA
Copy link

SparkQA commented Mar 6, 2019

Test build #103082 has finished for PR 23905 at commit eab40d6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 6, 2019

Test build #103083 has finished for PR 23905 at commit e653040.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 6, 2019

Test build #103081 has finished for PR 23905 at commit c4e3214.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Mar 6, 2019

Test build #103093 has finished for PR 23905 at commit e653040.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-24669][SQL] Refresh table before drop database cascade [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE Mar 6, 2019
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your first contribution, @Udbhav30 . And thank you, @SongYadong , @dilipbiswal , @felixcheung , @cloud-fan .

+1, LGTM, too. Merged to master/2.4/2.3.

For the MetaException, we can ignore for now since it seems that Spark don't handle that except partitioning handling cases. We can revisit that if needed later.

dongjoon-hyun pushed a commit that referenced this pull request Mar 6, 2019
  ## What changes were proposed in this pull request?
Before dropping database refresh the tables of that database, so as to refresh all cached entries associated with those tables.
We follow the same when dropping a table.

UT is added

Closes #23905 from Udbhav30/SPARK-24669.

Authored-by: Udbhav30 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9bddf71)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Mar 6, 2019
  ## What changes were proposed in this pull request?
Before dropping database refresh the tables of that database, so as to refresh all cached entries associated with those tables.
We follow the same when dropping a table.

UT is added

Closes #23905 from Udbhav30/SPARK-24669.

Authored-by: Udbhav30 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9bddf71)
Signed-off-by: Dongjoon Hyun <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
  ## What changes were proposed in this pull request?
Before dropping database refresh the tables of that database, so as to refresh all cached entries associated with those tables.
We follow the same when dropping a table.

UT is added

Closes apache#23905 from Udbhav30/SPARK-24669.

Authored-by: Udbhav30 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9bddf71)
Signed-off-by: Dongjoon Hyun <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
  ## What changes were proposed in this pull request?
Before dropping database refresh the tables of that database, so as to refresh all cached entries associated with those tables.
We follow the same when dropping a table.

UT is added

Closes apache#23905 from Udbhav30/SPARK-24669.

Authored-by: Udbhav30 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9bddf71)
Signed-off-by: Dongjoon Hyun <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
  ## What changes were proposed in this pull request?
Before dropping database refresh the tables of that database, so as to refresh all cached entries associated with those tables.
We follow the same when dropping a table.

UT is added

Closes apache#23905 from Udbhav30/SPARK-24669.

Authored-by: Udbhav30 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9bddf71)
Signed-off-by: Dongjoon Hyun <[email protected]>
@Udbhav30 Udbhav30 deleted the SPARK-24669 branch January 23, 2020 06:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants