[SPARK-33770][SQL][TESTS] Fix the ALTER TABLE .. DROP PARTITION tests that delete files out of partition path

Modify the tests that add partitions with `LOCATION`, where the number of nested folders in `LOCATION` doesn't match the number of partition columns. In that case, `ALTER TABLE .. DROP PARTITION` tries to access (delete) a folder outside the "base" path in `LOCATION`.

The problem lies in Hive MetaStore's method `drop_partition_common`:
https://github.com/apache/hive/blob/8696c82d07d303b6dbb69b4d443ab6f2b241b251/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4876
which deletes empty partition sub-folders recursively, starting from the deepest partition sub-folder and moving up to the base folder. When the number of sub-folders is not equal to the number of partition columns (`part_vals.size()`), the method will try to list and delete folders outside the base path.
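
To illustrate the failure mode, here is a minimal Scala sketch (not Hive's actual code; `deleteEmptyParents`, the paths, and the loop bound are assumptions made for this example) of the recursive clean-up that `drop_partition_common` performs. If the partition location is nested one folder deep but the table has two partition columns, the second step already operates outside the partition path:
```scala
import java.io.File

// Hypothetical simplification of the clean-up done by drop_partition_common:
// starting from the deepest partition sub-folder, delete each empty folder and
// step up one level per partition column.
def deleteEmptyParents(partitionPath: File, numPartCols: Int): Unit = {
  var current = partitionPath
  for (_ <- 0 until numPartCols) {
    // list() returns null if the folder is unreadable or does not exist
    if (Option(current.list()).exists(_.isEmpty)) {
      current.delete()
    }
    current = current.getParentFile
  }
}

// The partition location is one folder deep, but the table has two partition
// columns (part0, part1); the second iteration already lists
// '/tmp/part-location', a folder outside the partition path '/tmp/part-location/tbl'.
deleteEmptyParents(new File("/tmp/part-location/tbl"), numPartCols = 2)
```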

To fix test failures like apache#30643 (comment):
```
org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER TABLE .. ADD PARTITION Hive V1: SPARK-33521: universal type conversions of partition values
sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist;
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
	at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014)
...
Caused by: sbt.ForkMain$ForkError: org.apache.hadoop.hive.metastore.api.MetaException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partition_with_environment_context(HiveMetaStore.java:3381)
	at sun.reflect.GeneratedMethodAccessor304.invoke(Unknown Source)
```

The issue can be reproduced by the following steps:
1. Create a base folder, for example: `/Users/maximgekk/tmp/part-location`
2. Create a sub-folder in the base folder and drop permissions for it:
```
$ mkdir /Users/maximgekk/tmp/part-location/aaa
$ chmod a-rwx /Users/maximgekk/tmp/part-location/aaa
$ ls -al /Users/maximgekk/tmp/part-location
total 0
drwxr-xr-x   3 maximgekk  staff    96 Dec 13 18:42 .
drwxr-xr-x  33 maximgekk  staff  1056 Dec 13 18:32 ..
d---------   2 maximgekk  staff    64 Dec 13 18:42 aaa
```
3. Create a table and add a partition whose `LOCATION` is a sub-folder of the base folder:
```sql
spark-sql> create table tbl (id int) partitioned by (part0 int, part1 int);
spark-sql> alter table tbl add partition (part0=1,part1=2) location '/Users/maximgekk/tmp/part-location/tbl';
```
4. Try to drop this partition:
```
spark-sql> alter table tbl drop partition (part0=1,part1=2);
20/12/13 18:46:07 ERROR HiveClientImpl:
======================
Attempt to drop the partition specs in table 'tbl' database 'default':
Map(part0 -> 1, part1 -> 2)
In this attempt, the following partitions have been dropped successfully:

The remaining partitions have not been dropped:
[1, 2]
======================

Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa;
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa;
```
The command fails because it tries to access the sub-folder `aaa`, which is outside the partition path `/Users/maximgekk/tmp/part-location/tbl`.
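
To make the mismatch concrete (a sketch using the paths from the repro above; the variable names are illustrative): the partition location is a single sub-folder of the base path, while the partition spec carries two values, so walking one level up per partition value already lands in the base folder, whose listing includes the unreadable `aaa`:
```scala
import java.io.File

val partitionLocation = new File("/Users/maximgekk/tmp/part-location/tbl")
// One step up per partition value (part0=1, part1=2):
val afterFirstStep  = partitionLocation.getParentFile  // .../part-location (the base folder)
val afterSecondStep = afterFirstStep.getParentFile     // .../tmp (outside the base folder)
// Listing the base folder touches the permission-less 'aaa' sub-folder,
// which produces the "Error accessing .../aaa" failure shown above.
println(Seq(afterFirstStep, afterSecondStep).mkString("\n"))
```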

This change does not introduce any user-facing change.

Tested by running the affected tests from a local IDEA setup that does not have access to folders outside the partition paths.

Closes apache#30752 from MaxGekk/fix-drop-partition-location.

Lead-authored-by: Max Gekk <[email protected]>
Co-authored-by: Maxim Gekk <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 9160d59)
Signed-off-by: Max Gekk <[email protected]>
MaxGekk committed Dec 14, 2020
1 parent fee06ca commit 1ad11ae
Showing 3 changed files with 17 additions and 8 deletions.
ExternalCatalogSuite.scala

@@ -408,8 +408,8 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEach
       partitionColumnNames = Seq("partCol1", "partCol2"))
     catalog.createTable(table, ignoreIfExists = false)
 
-    val newLocationPart1 = newUriForDatabase()
-    val newLocationPart2 = newUriForDatabase()
+    val newLocationPart1 = newUriForPartition(Seq("p1=1", "p2=2"))
+    val newLocationPart2 = newUriForPartition(Seq("p1=3", "p2=4"))
 
     val partition1 =
       CatalogTablePartition(Map("partCol1" -> "1", "partCol2" -> "2"),

@@ -991,6 +991,11 @@ abstract class CatalogTestUtils {
 
   def newUriForDatabase(): URI = new URI(Utils.createTempDir().toURI.toString.stripSuffix("/"))
 
+  def newUriForPartition(parts: Seq[String]): URI = {
+    val path = parts.foldLeft(Utils.createTempDir())(new java.io.File(_, _))
+    new URI(path.toURI.toString.stripSuffix("/"))
+  }
+
   def newDb(name: String): CatalogDatabase = {
     CatalogDatabase(name, name + " description", newUriForDatabase(), Map.empty)
   }
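
For reference, a standalone usage sketch of the new test helper (repeated here outside `CatalogTestUtils`; the partition spec values are illustrative): it creates a temporary base directory and nests one folder per `key=value` entry, so the location depth always matches the number of partition columns and the recursive clean-up never steps above the temporary directory.
```scala
import java.io.File
import java.net.URI

import org.apache.spark.util.Utils

// The helper added in the diff above, shown standalone:
def newUriForPartition(parts: Seq[String]): URI = {
  val path = parts.foldLeft(Utils.createTempDir())(new File(_, _))
  new URI(path.toURI.toString.stripSuffix("/"))
}

// Illustrative usage: two nested folders for two partition columns, e.g.
// file:/.../spark-<random>/partCol1=1/partCol2=2
val location = newUriForPartition(Seq("partCol1=1", "partCol2=2"))
```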
StatisticsSuite.scala

@@ -983,12 +983,16 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleton
     assert(fetched1.get.colStats.size == 2)
 
     withTempPaths(numPaths = 2) { case Seq(dir1, dir2) =>
-      val file1 = new File(dir1 + "/data")
+      val partDir1 = new File(new File(dir1, "ds=2008-04-09"), "hr=11")
+      val file1 = new File(partDir1, "data")
+      file1.getParentFile.mkdirs()
       Utils.tryWithResource(new PrintWriter(file1)) { writer =>
         writer.write("1,a")
       }
 
-      val file2 = new File(dir2 + "/data")
+      val partDir2 = new File(new File(dir2, "ds=2008-04-09"), "hr=12")
+      val file2 = new File(partDir2, "data")
+      file2.getParentFile.mkdirs()
       Utils.tryWithResource(new PrintWriter(file2)) { writer =>
         writer.write("1,a")
       }

@@ -997,8 +1001,8 @@
       sql(
         s"""
           |ALTER TABLE $table ADD
-          |PARTITION (ds='2008-04-09', hr='11') LOCATION '${dir1.toURI.toString}'
-          |PARTITION (ds='2008-04-09', hr='12') LOCATION '${dir2.toURI.toString}'
+          |PARTITION (ds='2008-04-09', hr='11') LOCATION '${partDir1.toURI.toString}'
+          |PARTITION (ds='2008-04-09', hr='12') LOCATION '${partDir2.toURI.toString}'
         """.stripMargin)
       if (autoUpdate) {
         val fetched2 = checkTableStats(table, hasSizeInBytes = true, expectedRowCounts = None)
HiveDDLSuite.scala

@@ -605,8 +605,8 @@ class HiveDDLSuite
       val tab = "tab_with_partitions"
       withTempDir { tmpDir =>
         val basePath = new File(tmpDir.getCanonicalPath)
-        val part1Path = new File(basePath + "/part1")
-        val part2Path = new File(basePath + "/part2")
+        val part1Path = new File(new File(basePath, "part10"), "part11")
+        val part2Path = new File(new File(basePath, "part20"), "part21")
         val dirSet = part1Path :: part2Path :: Nil
 
         // Before data insertion, all the directory are empty
