Skip to content

Commit

Permalink
Scala 2.13 support.
Browse files Browse the repository at this point in the history
  • Loading branch information
dotbg committed May 2, 2023
1 parent 08ce495 commit 3cbd533
Show file tree
Hide file tree
Showing 15 changed files with 113 additions and 42 deletions.
10 changes: 10 additions & 0 deletions .github/workflows/jvm_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -75,3 +75,13 @@ jobs:
if: matrix.os == 'ubuntu-latest' # Distributed training doesn't work on Windows
env:
RABIT_MOCK: ON


- name: Build and Test XGBoost4J with scala 2.13
run: |
rm -rfv build/
cd jvm-packages
mvn -B clean install test -Pdefault,scala-2.13
if: matrix.os == 'ubuntu-latest' # Distributed training doesn't work on Windows
env:
RABIT_MOCK: ON
29 changes: 28 additions & 1 deletion jvm-packages/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,19 @@ XGBoost4J, XGBoost4J-Spark, etc. in maven repository is compiled with g++-4.8.5.
<version>latest_version_num</version>
</dependency>
```
or
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.13</artifactId>
<version>latest_version_num</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.13</artifactId>
<version>latest_version_num</version>
</dependency>
```

<b>sbt</b>
```sbt
Expand All @@ -47,7 +60,6 @@ libraryDependencies ++= Seq(

For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).

To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.

### Access SNAPSHOT version

Expand Down Expand Up @@ -85,6 +97,19 @@ Then add XGBoost4J as a dependency:
<version>latest_version_num-SNAPSHOT</version>
</dependency>
```
or with scala 2.13
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.13</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.13</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
```

<b>sbt</b>
```sbt
Expand All @@ -96,7 +121,9 @@ libraryDependencies ++= Seq(

For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).

### GPU algorithm
To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
Note that scala 2.13 is not supported by the [NVIDIA/spark-rapids#1525](https://github.com/NVIDIA/spark-rapids/issues/1525) yet, so the GPU algorithm can only be used with scala 2.12.

## Examples

Expand Down
25 changes: 21 additions & 4 deletions jvm-packages/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
<modelVersion>4.0.0</modelVersion>

<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>XGBoost JVM Package</name>
Expand Down Expand Up @@ -34,6 +34,7 @@
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<flink.version>1.17.0</flink.version>
<junit.version>4.13.2</junit.version>
<spark.version>3.4.0</spark.version>
<scala.version>2.12.17</scala.version>
<scala.binary.version>2.12</scala.binary.version>
Expand All @@ -44,7 +45,9 @@
<cudf.version>23.04.0</cudf.version>
<spark.rapids.version>23.04.0</spark.rapids.version>
<cudf.classifier>cuda11</cudf.classifier>
</properties>
<scalatest.version>3.2.15</scalatest.version>
<scala-collection-compat.version>2.9.0</scala-collection-compat.version>
</properties>
<repositories>
<repository>
<id>central_maven</id>
Expand All @@ -70,6 +73,14 @@
</modules>
</profile>

<profile>
<id>scala-2.13</id>
<properties>
<scala.binary.version>2.13</scala.binary.version>
<scala.version>2.13.10</scala.version>
</properties>
</profile>

<!-- gpu profile with both cpu and gpu test suites -->
<profile>
<id>gpu</id>
Expand Down Expand Up @@ -466,6 +477,7 @@
</plugins>
</reporting>
<dependencies>

<dependency>
<groupId>com.esotericsoftware</groupId>
<artifactId>kryo</artifactId>
Expand All @@ -482,6 +494,11 @@
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-collection-compat_${scala.binary.version}</artifactId>
<version>${scala-collection-compat.version}</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
Expand All @@ -490,13 +507,13 @@
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<version>3.2.15</version>
<version>${scalatest.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalactic</groupId>
<artifactId>scalactic_${scala.binary.version}</artifactId>
<version>3.2.15</version>
<version>${scalatest.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
Expand Down
5 changes: 3 additions & 2 deletions jvm-packages/xgboost4j-example/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,11 @@
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-example_2.12</artifactId>
<name>xgboost4j-example</name>
<artifactId>xgboost4j-example_${scala.binary.version}</artifactId>
<version>2.0.0-SNAPSHOT</version>
<packaging>jar</packaging>
<build>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,12 +73,13 @@ object DistTrainWithFlink {
.map(_.f1.f0)
.returns(testDataTypeHint)

val paramMap = mapAsJavaMap(Map(
("eta", "0.1".asInstanceOf[AnyRef]),
("max_depth", "2"),
("objective", "binary:logistic"),
("verbosity", "1")
))
val paramMap = Map(
("eta", "0.1".asInstanceOf[AnyRef]),
("max_depth", "2"),
("objective", "binary:logistic"),
("verbosity", "1")
)
.asJava

// number of iterations
val round = 2
Expand Down
4 changes: 3 additions & 1 deletion jvm-packages/xgboost4j-flink/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,11 @@
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>

<name>xgboost4j-flink</name>
<artifactId>xgboost4j-flink_${scala.binary.version}</artifactId>
<version>2.0.0-SNAPSHOT</version>
<properties>
Expand Down
9 changes: 5 additions & 4 deletions jvm-packages/xgboost4j-gpu/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,11 @@
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-gpu_2.12</artifactId>
<artifactId>xgboost4j-gpu_${scala.binary.version}</artifactId>
<name>xgboost4j-gpu</name>
<version>2.0.0-SNAPSHOT</version>
<packaging>jar</packaging>

Expand All @@ -35,13 +36,13 @@
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.2</version>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<version>3.2.15</version>
<version>${scalatest.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
Expand Down
7 changes: 4 additions & 3 deletions jvm-packages/xgboost4j-spark-gpu/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,11 @@
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-spark-gpu_2.12</artifactId>
<name>xgboost4j-spark-gpu</name>
<artifactId>xgboost4j-spark-gpu_${scala.binary.version}</artifactId>
<build>
<plugins>
<plugin>
Expand All @@ -24,7 +25,7 @@
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-gpu_${scala.binary.version}</artifactId>
<version>2.0.0-SNAPSHOT</version>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
Expand Down
7 changes: 4 additions & 3 deletions jvm-packages/xgboost4j-spark/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,11 @@
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-spark_2.12</artifactId>
<name>xgboost4j-spark</name>
<artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
<build>
<plugins>
<plugin>
Expand All @@ -24,7 +25,7 @@
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_${scala.binary.version}</artifactId>
<version>2.0.0-SNAPSHOT</version>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
Expand Down
4 changes: 2 additions & 2 deletions jvm-packages/xgboost4j-tester/generate_pom.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,10 @@
<modelVersion>4.0.0</modelVersion>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-tester_2.12</artifactId>
<artifactId>xgboost4j-tester_${scala.binary.version}</artifactId>
<version>1.0-SNAPSHOT</version>
<name>xgboost4j-tester_2.12</name>
<name>xgboost4j-tester</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
Expand Down
9 changes: 5 additions & 4 deletions jvm-packages/xgboost4j/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,11 @@
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm_2.12</artifactId>
<artifactId>xgboost-jvm</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j_2.12</artifactId>
<name>xgboost4j</name>
<artifactId>xgboost4j_${scala.binary.version}</artifactId>
<version>2.0.0-SNAPSHOT</version>
<packaging>jar</packaging>

Expand All @@ -28,13 +29,13 @@
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.2</version>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<version>3.2.15</version>
<version>${scalatest.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ trait EvalTrait extends IEvaluation {
*/
def eval(predicts: Array[Array[Float]], dmat: DMatrix): Float

private[scala] def eval(predicts: Array[Array[Float]], jdmat: java.DMatrix): Float = {
def eval(predicts: Array[Array[Float]], jdmat: java.DMatrix): Float = {
require(predicts.length == jdmat.getLabel.length, "predicts size and label size must match " +
s" predicts size: ${predicts.length}, label size: ${jdmat.getLabel.length}")
eval(predicts, new DMatrix(jdmat))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ trait ObjectiveTrait extends IObjective {
*/
def getGradient(predicts: Array[Array[Float]], dtrain: DMatrix): List[Array[Float]]

private[scala] def getGradient(predicts: Array[Array[Float]], dtrain: JDMatrix):
def getGradient(predicts: Array[Array[Float]], dtrain: JDMatrix):
java.util.List[Array[Float]] = {
getGradient(predicts, new DMatrix(dtrain)).asJava
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,12 +17,11 @@
package ml.dmlc.xgboost4j.scala

import java.io.InputStream
import ml.dmlc.xgboost4j.java.{XGBoostError, XGBoost => JXGBoost}

import ml.dmlc.xgboost4j.java.{XGBoostError, Booster => JBooster, XGBoost => JXGBoost}
import scala.collection.JavaConverters._

import scala.jdk.CollectionConverters._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.Path

/**
* XGBoost Scala Training function.
Expand All @@ -40,7 +39,12 @@ object XGBoost {
earlyStoppingRound: Int = 0,
prevBooster: Booster,
checkpointParams: Option[ExternalCheckpointParams]): Booster = {
val jWatches = watches.mapValues(_.jDMatrix).asJava

// we have to filter null value for customized obj and eval
val jParams: java.util.Map[String, AnyRef] =
params.filter(_._2 != null).mapValues(_.toString.asInstanceOf[AnyRef]).toMap.asJava

val jWatches = watches.mapValues(_.jDMatrix).toMap.asJava
val jBooster = if (prevBooster == null) {
null
} else {
Expand All @@ -51,8 +55,7 @@ object XGBoost {
map(cp => {
JXGBoost.trainAndSaveCheckpoint(
dtrain.jDMatrix,
// we have to filter null value for customized obj and eval
params.filter(_._2 != null).mapValues(_.toString.asInstanceOf[AnyRef]).asJava,
jParams,
numRounds, jWatches, metrics, obj, eval, earlyStoppingRound, jBooster,
cp.checkpointInterval,
cp.checkpointPath,
Expand All @@ -61,8 +64,7 @@ object XGBoost {
getOrElse(
JXGBoost.train(
dtrain.jDMatrix,
// we have to filter null value for customized obj and eval
params.filter(_._2 != null).mapValues(_.toString.asInstanceOf[AnyRef]).asJava,
jParams,
numRounds, jWatches, metrics, obj, eval, earlyStoppingRound, jBooster)
)
if (prevBooster == null) {
Expand Down
Loading

0 comments on commit 3cbd533

Please sign in to comment.