[BUG] Spark Excel is Incompatible with AWS EMR v6.13 and higher #802

Open
1 task done
johnboyer opened this issue Oct 30, 2023 · 2 comments

Comments

@johnboyer

johnboyer commented Oct 30, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Problem

The latest release, spark-excel_2.12:3.4.1_0.20.1, is incompatible with AWS EMR v6.13 and higher because those EMR releases are built on Scala 2.12.15.

We use JDK 17 with Spark 3.4.1 and Scala 2.12.15 so our Java applications can run in the latest EMR versions. However, with the spark-excel_2.12:3.4.1_0.20.1 dependency included we consistently get the following runtime error in the EMR logs:

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.Map$Map2 to field org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$AddWebUIFilter.filterParams of type scala.collection.immutable.Map in instance of org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$AddWebUIFilter
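One way to narrow down an exception like this is to check which jar and class loader each of the classes named in the message was loaded from. The following is a minimal sketch using only the JDK; `ClassOrigin` and `locate` are hypothetical names, not part of Spark or spark-excel, and the output only reflects the JVM in which it runs (driver or executor).

```java
import java.security.CodeSource;

final class ClassOrigin {
    // Returns a one-line description of where the named class was loaded from.
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            // initialize=false: we only want to locate the class, not run its static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                    ? "JDK runtime (no code source)"
                    : src.getLocation().toString();
            return className + " <- " + where + " via " + cls.getClassLoader();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        // The two classes named in the ClassCastException message.
        System.out.println(locate("scala.collection.immutable.Map"));
        System.out.println(locate(
                "org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$AddWebUIFilter"));
    }
}
```

If the Scala class resolves to a jar bundled with the application rather than the one shipped with the cluster, that may point to a conflicting scala-library on the classpath.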

We've tried every workaround we could think of, including getting support from AWS, to no avail. Could you provide a build that supports Scala 2.12.15, for backward compatibility with Spark 3.4.1 on EMR?

References:

Expected Behavior

We'd prefer backward compatibility with Scala 2.12.15 so we can continue using Spark Excel on AWS EMR v6.13+.

Steps To Reproduce

  1. Create a simple Java application that uses Spark 3.4.1 and pin its Scala dependencies to 2.12.15.
  2. Add the spark-excel_2.12:3.4.1_0.20.1 dependency.
  3. Write some code that reads an Excel file into a Dataset.
  4. Deploy the app to EMR and run it.
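For step 1, a quick way to confirm that the Scala pin took effect is to list the scala-library versions that appear on the application's classpath. This is a rough sketch under stated assumptions: `ScalaLibraryCheck` and `scalaLibraryVersions` are hypothetical names, and the check only inspects jar file names in the classpath string it is given, so jars supplied separately by the cluster or shaded into a fat jar will not show up.

```java
import java.io.File;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScalaLibraryCheck {
    // Matches jar names such as scala-library-2.12.15.jar and captures the version.
    private static final Pattern SCALA_JAR =
            Pattern.compile("scala-library-(\\d+\\.\\d+\\.\\d+)\\.jar$");

    // Returns the distinct scala-library versions found among the classpath entries.
    static Set<String> scalaLibraryVersions(String classpath, String separator) {
        Set<String> versions = new TreeSet<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            Matcher m = SCALA_JAR.matcher(entry);
            if (m.find()) {
                versions.add(m.group(1));
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        Set<String> found = scalaLibraryVersions(
                System.getProperty("java.class.path"), File.pathSeparator);
        System.out.println("scala-library versions on classpath: " + found);
    }
}
```

More than one version in the result suggests the pin is not being applied to every transitive dependency.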

Environment

- Spark version: 3.4.1
- Spark-Excel version: spark-excel_2.12:3.4.1_0.20.1
- OS: Amazon Linux release 2.0.20231012.1
- Cluster environment: EMR v6.13.0

Anything else?

No response

@nightscape
Owner

There was a similar issue in Hudi.
@johnboyer have you tried building spark-excel with 2.12.15? Does it fix the problem?

@johnboyer
Author

Hi @nightscape: We tried excluding the Scala libraries and other dependency workarounds, but none of them solved the binary incompatibility problem. We're a Java shop with no Scala experience, so we could not figure out how to build the library. If you give us step-by-step instructions, we can try it. In the meantime, we migrated our code to fastexcel, a pure-Java library that does not depend on Spark. Thank you.
