
incompatible with spark 3.4.3 and 3.5.0 #409

Open

wuxiaocheng0506 opened this issue Jun 4, 2024 · 4 comments

wuxiaocheng0506 commented Jun 4, 2024

I am using the latest nightly version, 1.7.0b20240501.dev0.

Initializing Spark and reading data into a Spark DataFrame works, but when running ray.data.from_spark(df):

it blocks when using Spark 3.4.3,
and when using Spark 3.5.0 it raises the following exception:

Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.util.ArrowUtils$.toArrowSchema(Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)Lorg/apache/arrow/vector/types/pojo/Schema;
at org.apache.spark.sql.raydp.ObjectStoreWriter.$anonfun$save$1(ObjectStoreWriter.scala:108)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
... 1 more
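
For reference, here is a minimal repro sketch of the failing path; the raydp.init_spark sizing arguments and the generated DataFrame are placeholders, not taken from the original report:

import ray
import raydp

ray.init()

# Start a Spark session on the Ray cluster (executor counts and memory are illustrative).
spark = raydp.init_spark(
    app_name="raydp-repro",
    num_executors=2,
    executor_cores=2,
    executor_memory="2GB",
)

# Initializing Spark and building a DataFrame works fine.
df = spark.range(0, 1000)

# With Spark 3.4.3 this call blocks; with Spark 3.5.0 it raises the
# NoSuchMethodError on ArrowUtils.toArrowSchema shown above.
ds = ray.data.from_spark(df)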

pang-wu commented Jun 16, 2024

PR #411 should fix your problem.

pang-wu commented Jun 19, 2024

@wuxiaocheng0506 mind trying the nightly build again?

wuxiaocheng0506 (Author) commented:

Tests passed with PySpark 3.5.0 and 3.4.3, thanks @pang-wu.

pang-wu commented Jul 3, 2024

@wuxiaocheng0506 mind closing this issue then?
