[BUG] Spark Excel is Incompatible with AWS EMR v6.13 and higher #802

johnboyer · 2023-10-30T22:40:32Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

Problem

The latest release, spark-excel_2.12:3.4.1_0.20.1, is incompatible with AWS EMR v6.13 and higher because those EMR releases are built on Scala 2.12.15.

We use JDK 17 with Spark 3.4.1 and Scala 2.12.15 so our Java applications can run in the latest EMR versions. However, with the spark-excel_2.12:3.4.1_0.20.1 dependency included we consistently get the following runtime error in the EMR logs:

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.Map$Map2 to field org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$AddWebUIFilter.filterParams of type scala.collection.immutable.Map in instance of org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$AddWebUIFilter

We've tried every workaround we could think of, including getting support from AWS, to no avail. Can you provide support for Scala 2.12.15 so the library stays compatible with Spark 3.4.1 on EMR?
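A minimal sketch of one way to check whether more than one scala-library version ends up on the application classpath, which is a common source of this kind of binary incompatibility. The class `ScalaClasspathCheck` and its method are hypothetical helpers written for illustration, not part of Spark or spark-excel. The sketch only inspects jar file names listed in the `java.class.path` system property, so Scala classes shaded into a fat jar or loaded by another class loader will not show up.

```java
import java.io.File;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper; not part of Spark or spark-excel. */
final class ScalaClasspathCheck {

    // Matches jar names such as scala-library-2.12.15.jar and captures the version.
    private static final Pattern SCALA_LIBRARY_JAR =
            Pattern.compile("scala-library-(\\d+\\.\\d+\\.\\d+)\\.jar$");

    /** Returns the distinct scala-library versions named in a classpath string. */
    static Set<String> scalaLibraryVersions(String classpath, String separator) {
        Set<String> versions = new TreeSet<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            Matcher m = SCALA_LIBRARY_JAR.matcher(entry.trim());
            if (m.find()) {
                versions.add(m.group(1));
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        // Inspect the classpath of the running JVM.
        String classpath = System.getProperty("java.class.path", "");
        Set<String> versions = scalaLibraryVersions(classpath, File.pathSeparator);
        System.out.println("scala-library versions on classpath: " + versions);
        if (versions.size() > 1) {
            System.out.println("WARNING: more than one scala-library version found");
        }
    }
}
```

Running `main` from inside the deployed application (or calling `scalaLibraryVersions` from the driver at startup) prints the versions it finds. Seeing anything other than a single 2.12.15 entry would suggest that a transitive dependency is pulling in a different Scala runtime.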

References:

Expected Behavior

We'd prefer backward compatibility with Scala 2.12.15 so we can continue using Spark Excel in the AWS EMR v6.13+.

Steps To Reproduce

1. Create a simple Java application that uses Spark 3.4.1 and pin its Scala dependencies to 2.12.15.
2. Add the spark-excel_2.12:3.4.1_0.20.1 dependency.
3. Write some code that reads an Excel file into a Dataset.
4. Deploy the app to EMR and run it.

Environment

- Spark version: 3.4.1
- Spark-Excel version: spark-excel_2.12:3.4.1_0.20.1
- OS: Amazon Linux release 2.0.20231012.1
- Cluster environment: EMR v6.13.0

Anything else?

No response


nightscape · 2023-11-14T10:15:25Z

There was a similar issue in Hudi.
@johnboyer have you tried building spark-excel with 2.12.15? Does it fix the problem?

johnboyer · 2023-11-16T18:50:29Z

Hi @nightscape: We tried excluding scala libraries and other dependency machinations, but it never solved the binary incompatibility problem. We're a Java shop with no Scala experience, so we could not figure out how to build the library. If you give us step-by-step instructions, we can try it. In the meantime, we migrated our code to fastexcel, a non-spark pure Java library. Thank you.
