Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-48236][BUILD] Add
commons-lang:commons-lang:2.6
back to supp…
…ort legacy Hive UDF jars ### What changes were proposed in this pull request? This PR aims to add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars . This is a partial revert of SPARK-47018 . ### Why are the changes needed? Recently, we dropped `commons-lang:commons-lang` during Hive upgrade. - #46468 However, only Apache Hive 2.3.10 or 4.0.0 dropped it. In other words, Hive 2.0.0 ~ 2.3.9 and Hive 3.0.0 ~ 3.1.3 requires it. As a result, all existing UDF jars built against those versions requires `commons-lang:commons-lang` still. - apache/hive#4892 For example, Apache Hive 3.1.3 code: - https://github.com/apache/hive/blob/af7059e2bdc8b18af42e0b7f7163b923a0bfd424/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java#L21 ``` import org.apache.commons.lang.StringUtils; ``` - https://github.com/apache/hive/blob/af7059e2bdc8b18af42e0b7f7163b923a0bfd424/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java#L42 ``` return StringUtils.strip(val, " "); ``` As a result, Maven CIs are broken. - https://github.com/apache/spark/actions/runs/9032639456/job/24825599546 (Maven / Java 17) - https://github.com/apache/spark/actions/runs/9033374547/job/24835284769 (Maven / Java 21) The root cause is that the existing test UDF jar `hive-test-udfs.jar` was built from old Hive (before 2.3.10) libraries which requires `commons-lang:commons-lang:2.6`. ``` HiveUDFDynamicLoadSuite: - Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDF 20:21:25.129 WARN org.apache.spark.SparkContext: The JAR file:///home/runner/work/spark/spark/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:33327/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version. *** RUN ABORTED *** A needed class was not found. This could be due to an error in your runpath. Missing class: org/apache/commons/lang/StringUtils java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils at org.apache.hadoop.hive.contrib.udf.example.GenericUDFTrim2.performOp(GenericUDFTrim2.java:43) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseTrim.evaluate(GenericUDFBaseTrim.java:75) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:170) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117) at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132) at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132) at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:184) at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:164) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:185) ... Cause: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526) at org.apache.hadoop.hive.contrib.udf.example.GenericUDFTrim2.performOp(GenericUDFTrim2.java:43) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseTrim.evaluate(GenericUDFBaseTrim.java:75) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:170) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117) at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132) at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132) ... ``` ### Does this PR introduce _any_ user-facing change? To support the existing customer UDF jars. ### How was this patch tested? Manually. ``` $ build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveUDFDynamicLoadSuite test ... HiveUDFDynamicLoadSuite: 14:21:56.034 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0 14:21:56.035 WARN org.apache.hadoop.hive.metastore.ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore dongjoon127.0.0.1 14:21:56.041 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException - Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDF 14:21:57.576 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version. - Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDF 14:21:58.314 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version. - Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDAF 14:21:58.943 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version. - Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDAF 14:21:59.333 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 14:21:59.364 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist 14:21:59.370 WARN org.apache.hadoop.hive.metastore.HiveMetaStore: Location: file:/Users/dongjoon/APACHE/spark-merge/sql/hive/target/tmp/warehouse-49291492-9d48-4360-a354-ace73a2c76ce/src specified for non-external table:src 14:21:59.718 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException 14:21:59.770 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version. - Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDTF 14:22:00.403 WARN org.apache.hadoop.hive.common.FileUtils: File file:/Users/dongjoon/APACHE/spark-merge/sql/hive/target/tmp/warehouse-49291492-9d48-4360-a354-ace73a2c76ce/src does not exist; Force to delete it. 14:22:00.404 ERROR org.apache.hadoop.hive.common.FileUtils: Failed to delete file:/Users/dongjoon/APACHE/spark-merge/sql/hive/target/tmp/warehouse-49291492-9d48-4360-a354-ace73a2c76ce/src 14:22:00.441 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist 14:22:00.453 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory. 14:22:00.537 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist Run completed in 8 seconds, 612 milliseconds. Total number of tests run: 5 Suites: completed 2, aborted 0 Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0 All tests passed. ``` ### Was this patch authored or co-authored using generative AI tooling? Closes #46528 from dongjoon-hyun/SPARK-48236. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
- Loading branch information