Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scala 2.13 support. #9099

Merged
merged 2 commits into from
May 27, 2023
Merged

Scala 2.13 support. #9099

merged 2 commits into from
May 27, 2023

Conversation

dotbg
Copy link
Contributor

@dotbg dotbg commented Apr 26, 2023

xgboost4j support of scala 2.13 #8650 and #6596

@dotbg dotbg marked this pull request as ready for review April 26, 2023 14:30
@trivialfis
Copy link
Member

Thank you for the work on updating the scala version. I will defer the review to @wbo4958 as I'm not familiar with the ecosystem.

Copy link
Contributor

@eejbyfeldt eejbyfeldt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a review since I also looked into adding support for scala 2.13. But since I not a maintainer for this project I guess you have to take them for what they are.

Also if I run a compile on this branch

$ mvn compile -Pdefault,scala-2.13
...
[ERROR] Failed to execute goal on project xgboost4j-example_2.13: Could not resolve dependencies for project ml.dmlc:xgboost4j-example_2.13:jar:2.0.0-SNAPSHOT: Could not find artifact ml.dmlc:xgboost4j_2.12:jar:2.0.0-SNAPSHOT in central_maven (https://repo1.maven.org/maven2) -> [Help 1]

it fails looking for some 2.12 version of the xgboost4j jar. Not sure if this just for me or if there is something more that was missed.

</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.12</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<artifactId>xgboost4j-spark_2.12</artifactId>
<artifactId>xgboost4j-spark_2.13</artifactId>

?

Copy link
Contributor Author

@dotbg dotbg May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eejbyfeldt unfortunately, you have to do mvn install -Pdefault,scala-2.13 instead of simply mvn compile :(

Thanks a lot for your comments. I totally missed few important things :(

@@ -47,7 +60,7 @@ libraryDependencies ++= Seq(

For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).

To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12`(`xgboost4j-gpu_2.13`) and `xgboost4j-spark-gpu_2.12`(`xgboost4j-spark-gpu_2.13`) instead.
Copy link
Contributor

@eejbyfeldt eejbyfeldt Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since xgboost4j-spark-gpu_2.13 depends on rapids-4-spark which only has a 2.12 version https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark I do not think it possible to build the xgboost4j-spark-gpu package for 2.13 yet.

Copy link
Contributor Author

@dotbg dotbg May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. NVIDIA/spark-rapids#1525
I will fix the docs, thanks a lot!

jvm-packages/pom.xml Outdated Show resolved Hide resolved
jvm-packages/pom.xml Outdated Show resolved Hide resolved
jvm-packages/xgboost4j-gpu/pom.xml Outdated Show resolved Hide resolved
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-spark_2.12</artifactId>
<name>xgboost4j-spark</name>
<artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a property here leads to maven producing this warning:

[WARNING] Some problems were encountered while building the effective model for ml.dmlc:xgboost4j-spark_2.13:jar:2.0.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ ml.dmlc:xgboost4j-spark_${scala.binary.version}:2.0.0-SNAPSHOT, /home/eejbyfeldt/dev/dmlc/xgboost/jvm-packages/xgboost4j-spark/pom.xml, line 12, column 17

is that really save to ignore?

Spark that also uses maven as build system has a dedicated script (https://github.com/apache/spark/blob/master/dev/change-scala-version.sh) for changing the scala version. I am not very familiar with maven but maybe there are some good reasons they chose that approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is safe for now. Sadly, making the script as spark adds more complexity than just using profiles.
I've reused the approach discussed in the davidB/scala-maven-plugin#177
Spark approach originates in 2015 and maven changed since. I can make a script in place of maven profiles, if necessary.

jvm-packages/xgboost4j-tester/generate_pom.py Outdated Show resolved Hide resolved
jvm-packages/xgboost4j-tester/generate_pom.py Outdated Show resolved Hide resolved
@trivialfis
Copy link
Member

But since I not a maintainer for this project I guess you have to take them for what they are.

You are more than welcome to contribute through any means. Also, we are looking for new maintainers for jvm packages and R.

@dotbg dotbg force-pushed the scala-2.13-support branch 4 times, most recently from 8004451 to 3cbd533 Compare May 2, 2023 14:13
@dotbg dotbg requested a review from eejbyfeldt May 2, 2023 14:13
@dotbg
Copy link
Contributor Author

dotbg commented May 2, 2023

Also, we are looking for new maintainers for jvm packages and R.

@trivialfis I have few things I'd like to do in the jvm-packages and have some spare time for that, so I can lend a hand for a while.

@trivialfis
Copy link
Member

Thank you for sharing your interest! Let's work through those items first.

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions, still trying to make time to learn about the spark ecosystem. ;-)

Otherwise looks fine to me. cc @wbo4958 .

@@ -96,7 +121,9 @@ libraryDependencies ++= Seq(

For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).

### GPU algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbo4958 Could you please help confirm this is still the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sadly, the NVIDIA/spark-rapids#1525 is a blocker for the GPU-enabled modules. I do not think this has to be a showstopper for 2.13.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems yes.

@@ -5,7 +5,7 @@
<modelVersion>4.0.0</modelVersion>

<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the newbie question. Is this changing the jar name? If so, is it considered a breaking change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not changing the jar name, as the relevant project is not published as jar.

    <packaging>pom</packaging>

Pom-packaging is a way to specify the container of submodules. It does not lead to a jar.
jvm-packages only produce jars for xgboost4j-spark, xgboost4j, xgboost4j-flink, xgboost4j-example and the gpu-enabled modules

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the information!

@dotbg dotbg requested a review from trivialfis May 8, 2023 08:01
@wbo4958
Copy link
Contributor

wbo4958 commented May 8, 2023

great job. I will review it. Sorry for that

@wbo4958
Copy link
Contributor

wbo4958 commented May 8, 2023

Really great job. LGTM assuming the CI can pass

@dotbg
Copy link
Contributor Author

dotbg commented May 8, 2023

Thanks a lot @wbo4958 I've updated the xgboost4j-tester so that it checks both 2.12 and 2.13.

@dotbg dotbg requested a review from wbo4958 May 8, 2023 13:55
@dotbg
Copy link
Contributor Author

dotbg commented May 9, 2023

I am not sure whether I can do anything for docs and python tests.
@trivialfis , @wbo4958 are these known issues?

@dotbg
Copy link
Contributor Author

dotbg commented May 10, 2023

Resolved conflicts.

@trivialfis
Copy link
Member

Will merge once all tests are green. Thank you for the work on supporting the newer Scala!

@dotbg dotbg force-pushed the scala-2.13-support branch 2 times, most recently from 1fd2b3e to 4fa4417 Compare May 10, 2023 21:18
@dotbg
Copy link
Contributor Author

dotbg commented May 10, 2023

@trivialfis had to modify the docker image used for the integration tests. Should be green now.

@dotbg dotbg force-pushed the scala-2.13-support branch 3 times, most recently from a595a85 to 29f2380 Compare May 12, 2023 10:26
@dotbg dotbg force-pushed the scala-2.13-support branch 3 times, most recently from d8e1088 to 6b3c971 Compare May 15, 2023 14:34
@dotbg
Copy link
Contributor Author

dotbg commented May 15, 2023

updated the logic, used to build images for the integration tests + rebased

@trivialfis
Copy link
Member

Thank you for the update! Are you able to reproduce this error: https://buildkite.com/xgboost/xgboost-ci-multi-gpu/builds/2184#01881fd5-fa29-4540-92f4-eb8861dab13b ?

Feel free to let me know if there's anything I can help.

@wbo4958
Copy link
Contributor

wbo4958 commented May 16, 2023

Thank you for the update! Are you able to reproduce this error: https://buildkite.com/xgboost/xgboost-ci-multi-gpu/builds/2184#01881fd5-fa29-4540-92f4-eb8861dab13b ?

Feel free to let me know if there's anything I can help.

Oops, spark-rapids 23.04.0 doesn't support spark3.4. I will fix it.

@dotbg
Copy link
Contributor Author

dotbg commented May 16, 2023

@wbo4958 thanks a lot! Reproducing some things locally requires patching, so it takes more time than just writing code.
I am not sure whether some of the changes in CI scripts are worth making a PR with these.

@dotbg
Copy link
Contributor Author

dotbg commented May 17, 2023

@trivialfis, @wbo4958 I have rebased the branch again, so all changes should be included. Do think, I should do anything else before this MR can be merged?

@trivialfis
Copy link
Member

Restarted the CI due to a network error. Maven seems quite unstable on the CI recently.

@dotbg dotbg requested a review from trivialfis May 22, 2023 10:05
@dotbg dotbg force-pushed the scala-2.13-support branch 2 times, most recently from 47d809a to 821b39d Compare May 25, 2023 11:40
Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, one minor issue in the comments.

Thank you for the nice work on not only updating the Scala version but also handling the tests!

dotbg added 2 commits May 26, 2023 22:55
1. Updated the test logic
2. Added smoke tests for Spark examples.
3. Added integration tests for Spark with Scala 2.13
@trivialfis trivialfis merged commit a01df10 into dmlc:master May 27, 2023
@dotbg
Copy link
Contributor Author

dotbg commented May 31, 2023

@trivialfis Thanks a lot !

Are there any release plans?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants