Add Initial Java Support for GDS to KvikIO #396

Open
wants to merge 32 commits into
base: branch-24.12
Changes from 15 commits
83044ff
Initial commit
aslobodaNV Jun 25, 2024
62649e6
Update documentation to better flash out how to compile and run the e…
aslobodaNV Jun 25, 2024
161b260
code touchups and Readme update
aslobodaNV Jun 25, 2024
30aa3dc
Update README with markdown formatting. Improve instructions and linkage
aslobodaNV Jun 26, 2024
299bdb6
Add initial maven setup, needs to be debugged
aslobodaNV Jul 12, 2024
1162efc
Update maven build to properly generate shared library
aslobodaNV Jul 12, 2024
3f29546
Merge branch 'branch-24.10' into add_initial_java_support
jakirkham Jul 24, 2024
b3dd38f
Merge branch 'rapidsai:branch-24.10' into add_initial_java_support
aslobodaNV Sep 4, 2024
fe91427
Fix pre-commit issues
aslobodaNV Sep 4, 2024
4ad08c6
move example to be a test, update pom and dependencies to support CI …
aslobodaNV Sep 4, 2024
83875b6
pre-commit fixes
aslobodaNV Sep 4, 2024
4120e4c
add github workflow items
aslobodaNV Sep 4, 2024
888d6a7
Updating workflows
aslobodaNV Sep 18, 2024
d337a16
Merge branch 'branch-24.10' into add_initial_java_support
aslobodaNV Sep 18, 2024
833ad32
Formatting inconsistencies
aslobodaNV Sep 18, 2024
93598a8
Merge remote-tracking branch 'upstream/branch-24.12' into add_initial…
aslobodaNV Oct 1, 2024
661fcce
Update yaml files based on CR feedback, update versions to 24.12 from…
aslobodaNV Oct 1, 2024
7610edb
Fix needs and permissions.
bdice Oct 1, 2024
3eb3daa
Remove test_python_legate.
bdice Oct 1, 2024
13b9a05
Merge branch 'branch-24.12' into add_initial_java_support
bdice Oct 1, 2024
e3f8448
Merge branch 'add_initial_java_support' of github.com:aslobodaNV/kvik…
bdice Oct 1, 2024
fa749a3
Update java/pom.xml
aslobodaNV Oct 2, 2024
222426c
Update package for java bindings.
aslobodaNV Oct 11, 2024
c704108
Use cmake instead of explicit nvcc command
aslobodaNV Oct 11, 2024
69b6d39
Merge remote-tracking branch 'upstream/branch-24.12' into add_initial…
aslobodaNV Oct 11, 2024
40a588f
Merge branch 'branch-24.12' into add_initial_java_support
aslobodaNV Oct 18, 2024
fabd3c1
Fix style issues and missing license
aslobodaNV Oct 18, 2024
c29ac7b
Fix CMakeLists style issues
aslobodaNV Oct 18, 2024
157867f
Update dependencies to try and fix container build
aslobodaNV Oct 18, 2024
acc6777
Just add make not ninja
aslobodaNV Oct 18, 2024
cfa183a
Fix some of the build and container issues, linking issues remain
aslobodaNV Oct 22, 2024
2d41f10
Merge remote-tracking branch 'upstream/branch-24.12' into add_initial…
aslobodaNV Oct 22, 2024
2 changes: 1 addition & 1 deletion .github/workflows/build.yaml
@@ -44,7 +44,7 @@ jobs:
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
upload-conda:
needs: [cpp-build, python-build]
needs: [cpp-build, python-build, java-build]
Copy link
Contributor

Revert, since no java-build job exists anymore.

Suggested change
needs: [cpp-build, python-build, java-build]
needs: [cpp-build, python-build]

secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
14 changes: 14 additions & 0 deletions .github/workflows/pr.yaml
@@ -15,6 +15,8 @@ jobs:
- checks
- conda-cpp-build
- conda-cpp-tests
- conda-java-build
- conda-java-tests
- conda-python-build
- conda-python-tests
- docs-build
@@ -39,6 +41,18 @@ jobs:
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
build_type: pull-request
conda-java-build:
needs: checks
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
Copy link
Contributor

Since we aren't building a conda package of this, we do not need a build job. The conda-java-build job can be deleted, leaving only conda-java-tests below.

The test job will:

  • Generate a conda environment with all build dependencies
  • Install libkvikio
  • Run maven to build the Java bindings and tests
  • Run the tests

with:
build_type: pull-request
conda-java-tests:
needs: conda-java-build
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
Copy link
Contributor

This is the other place we need to use a custom job. There is no shared-workflows workflow named conda-java-tests. Make this look like the code in test.yaml except with different parameters (different build_type:, no branch:, etc.). Look at cuDF for inspiration here.

Also, let's retarget this work to branch-24.12. You'll need to:

  • Merge the latest branch-24.12 into your PR
  • Update any lingering references to branch-24.10 (this line will be one)
  • Change the target branch on your PR to branch-24.12

with:
build_type: pull-request
conda-python-build:
needs: conda-cpp-build
secrets: inherit
12 changes: 12 additions & 0 deletions .github/workflows/test.yaml
@@ -30,3 +30,15 @@ jobs:
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
conda-java-tests:
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
Copy link
Contributor

Use branch-24.12 after you merge in the upstream and retarget this PR:

Suggested change
uses: rapidsai/shared-workflows/.github/workflows/custom-job.yaml@python-3.12
uses: rapidsai/shared-workflows/.github/workflows/custom-job.yaml@branch-24.12

with:
build_type: nightly
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
node_type: "gpu-v100-latest-1"
arch: "amd64"
container_image: "rapidsai/ci-conda:latest"
run_script: "ci/test_java.sh"
43 changes: 43 additions & 0 deletions ci/test_java.sh
@@ -0,0 +1,43 @@
#!/bin/bash
# Copyright (c) 2024, NVIDIA CORPORATION.

set -euo pipefail

. /opt/conda/etc/profile.d/conda.sh

rapids-logger "Generate java testing dependencies"
rapids-dependency-file-generator \
--output conda \
--file-key test_java \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch)" | tee env.yaml

rapids-mamba-retry env create --yes -f env.yaml -n test

# Temporarily allow unbound variables for conda activation.
set +u
conda activate test
set -u

rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)

rapids-print-env

rapids-mamba-retry install \
--channel "${CPP_CHANNEL}" \
libkvikio libkvikio-tests

rapids-logger "Check GPU usage"
nvidia-smi

EXITCODE=0
trap "EXITCODE=1" ERR
set +e

rapids-logger "Run Java tests"
pushd java
mvn test -B
popd

rapids-logger "Test script exiting with value: $EXITCODE"
exit ${EXITCODE}
11 changes: 11 additions & 0 deletions dependencies.yaml
@@ -88,6 +88,11 @@ files:
key: test
includes:
- test_python
test_java:
output: none
includes:
- cuda
- test_java
Copy link
Contributor

I'm guessing you'll need CMake and C++ compilers? You'll also need cuda_version to constrain which nvcc and nvcomp versions you get.

Suggested change
includes:
- cuda
- test_java
includes:
- build-universal
- build-cpp
- cuda_version
- cuda
- test_java

channels:
- rapidsai
- rapidsai-nightly
@@ -335,3 +340,9 @@ dependencies:
packages:
- *dask
- distributed>=2022.05.2
test_java:
common:
- output_types: conda
packages:
- maven
- openjdk=8.*
Copy link
Contributor

Is there a reason to use openjdk=8.*? It appears that the latest is openjdk=22.*.

Copy link
Author

No particular reason, I just happened to have 8 installed and used that. I will update to 22.

Copy link

JDK8 is very much a least common denominator. It is no longer supported without paying a lot of money to Oracle though.

https://www.oracle.com/java/technologies/java-se-support-roadmap.html

and oracle is trying very hard to push people away from older versions of java.

java 11, 17, and 21 are long term support versions that are currently GA.

Java generally has really good backwards compatibility so almost anything compiled with JDK8 can run on anything newer. But newer code is not backwards compatible with older runtimes. But you can ask newer compilers to output binary code that is compatible with older JDK versions. That is what we do with the Spark plugin.

We also support whatever Spark supports for their default binary release. So for older versions of Spark it is JDK8. For some of the latest versions that is JDK17, but because of how we ask the compiler to output older binary versions it really becomes a requirement that they have at least that version of java installed.

I wouldn't update to jdk22 unless you also decide which version of java you want to be minimally compatible with and ask the compiler to output compatible .class files.

Also, just another point to think about: jcuda still uses openjdk8. This is probably to have as much compatibility as possible. https://github.com/jcuda/jcuda-main/blob/master/BUILDING.md
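
One way to check the point above about asking a newer compiler to emit older class files (for example via javac's `--release` option or the Maven Compiler Plugin's `release` parameter) is to read the version stamped into the output. Every `.class` file begins with the magic number `0xCAFEBABE`, followed by a minor and a major version; major 52 corresponds to Java 8, 61 to Java 17, and 65 to Java 21. The sketch below is illustrative only: the class `ClassFileVersion` and its methods are hypothetical names and are not part of this PR.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of this PR): reports which Java release a .class file targets. */
public final class ClassFileVersion {
    private static final int MAGIC = 0xCAFEBABE;

    private ClassFileVersion() {}

    /** Maps a class-file major version to a Java release, e.g. 52 -> 8, 61 -> 17, 65 -> 21. */
    public static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    /** Parses the first 8 bytes of a class file and returns the major version. */
    public static int readMajorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, as class files are
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        buf.getShort();                 // minor version, unused here
        return buf.getShort() & 0xFFFF; // major version as an unsigned 16-bit value
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileVersion <path/to/File.class>");
            return;
        }
        int major = readMajorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("major=" + major + " -> Java " + javaRelease(major));
    }
}
```

Running this against a class produced by the build would show whether the chosen compiler target actually took effect, independent of which JDK performed the compilation.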

Copy link
Author

Thanks for that context, that makes a lot of sense. I have not been following the broader Java versioning ecosystem so was unaware of much of this. I think it is likely we will want to pick one of the versions with long term support, I would guess 17 or 21, but will think on it some more before making an update.

74 changes: 74 additions & 0 deletions java/README.md
@@ -0,0 +1,74 @@
# Java KvikIO Bindings

## Summary
These Java KvikIO bindings for GDS currently support only synchronous read and write IO operations using the underlying CuFile API. Batch IO and asynchronous operations are not yet supported.

## Dependencies
The Java KvikIO bindings have been developed to work on Linux-based systems and require [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html) to be installed and [GDS](https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html) to be properly enabled. To compile the shared library, a JDK must also be installed. To run the included example, JCuda must be installed as well, since it handles memory allocation and the transfer of data between host and GPU memory. JCuda jar files supporting CUDA 12.x can be found here:
[jcuda-12.0.0.jar](https://repo1.maven.org/maven2/org/jcuda/jcuda/12.0.0/jcuda-12.0.0.jar),
[jcuda-natives-12.0.0.jar](https://repo1.maven.org/maven2/org/jcuda/jcuda-natives/12.0.0/jcuda-natives-12.0.0.jar)

For more information on JCuda and potentially more up to date installation instructions or jar files, see here:
[JCuda](http://javagl.de/jcuda.org/), [JCuda Usage](https://github.com/jcuda/jcuda-main/blob/master/USAGE.md), [JCuda Maven Repo](https://mvnrepository.com/artifact/org.jcuda)

## Compilation
To recompile the .so file for your local system, run the following command. Note: Update the command to reflect the directories where CUDA and your JDK are installed.

/usr/local/cuda/bin/nvcc -shared -o libCuFileJNI.so -I/usr/local/cuda/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/linux src/main/native/src/CuFileJni.cpp --compiler-options "-fPIC" -lcufile

The resulting .so file must be on your JVM library path when running Java programs that use it. If it is not already on that path, it can be added by passing an argument like the following:

-Djava.library.path={path/to/your/so/file/}
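
When a load fails, it can help to see exactly which files the JVM would consider. The following is a minimal diagnostic sketch, assuming the library is loaded by the name `CuFileJNI` as in `CuFile.java`; the class `LibraryPathCheck` is a hypothetical helper and not part of these bindings. It uses `System.mapLibraryName` to obtain the platform file name (`libCuFileJNI.so` on Linux) and checks each entry of `java.library.path`.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic (not part of the bindings): lists candidate locations for a native library. */
public final class LibraryPathCheck {
    private LibraryPathCheck() {}

    /** Returns one candidate file per non-empty directory in the given library path. */
    public static List<File> candidates(String libName, String libraryPath) {
        String fileName = System.mapLibraryName(libName); // platform-specific file name
        List<File> result = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (!dir.isEmpty()) {
                result.add(new File(dir, fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        for (File candidate : candidates("CuFileJNI", path)) {
            System.out.println((candidate.isFile() ? "found:   " : "missing: ") + candidate);
        }
    }
}
```

Run it with the same `-Djava.library.path` argument as the real program to confirm the .so file is visible before debugging anything else.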

## Examples
An example of how to use the Java KvikIO bindings can be found in src/main/java/bindings/kvikio/example. Note: This example depends on JCuda, so when running it ensure that the JCuda shared library files are on the JVM library path along with the libCuFileJNI.so file.

### Specific instructions to run the example using Maven

#### Compile the shared library and Java files with Maven

cd kvikio/java/
mvn clean install

#### Set up a test file target

NOTE: Your mount directory may differ from /mnt/nvme, so update this command and example/Main.java to point to the correct file path.

touch /mnt/nvme/java_test

#### Run example

cd kvikio/java/
java -cp target/cufile-24.10.0-SNAPSHOT.jar:$HOME/.m2/repository/org/jcuda/jcuda/12.0.0/jcuda-12.0.0.jar:$HOME/.m2/repository/org/jcuda/jcuda-natives/12.0.0/jcuda-natives-12.0.0.jar -Djava.library.path=./target bindings.kvikio.example.Main
Copy link
Contributor

cufile-24.10.0-SNAPSHOT.jar

cuFile is not versioned like RAPIDS. Is this correct?

Copy link
Author

Regarding the version, the jar that is generated is not aware of what version of libcufile was used to generate it... that just depends on what version of the cuda toolkit the host machine has installed. It appears I can't make the jar not have a version number at all, so I'm inclined to have the version represent the version of kvikio. If you think there's a way I can inject the version of libcufile into maven at runtime I am open to that, but I have not found any information on how that would be done so far.


### Specific instructions to run the example from a terminal

#### Compile class files

cd kvikio/java/src/main/java/bindings/kvikio/cufile
javac *.java

#### Retrieve JCuda jar files

cd kvikio/java/
mkdir lib
cd lib
wget https://repo1.maven.org/maven2/org/jcuda/jcuda/12.0.0/jcuda-12.0.0.jar
wget https://repo1.maven.org/maven2/org/jcuda/jcuda-natives/12.0.0/jcuda-natives-12.0.0.jar

#### Compile shared library

cd kvikio/java/lib
/usr/local/cuda/bin/nvcc -shared -o libCuFileJNI.so -I/usr/local/cuda/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/ -I/usr/lib/jvm/java-21-openjdk-amd64/include/linux ../src/main/native/src/CuFileJni.cpp --compiler-options "-fPIC" -lcufile

#### Set up a test file target

NOTE: Your mount directory may differ from /mnt/nvme, so update this command and example/Main.java to point to the correct file path.

touch /mnt/nvme/java_test

#### Compile example file

cd kvikio/java/src/main/java
javac -cp .:../../../lib/jcuda-12.0.0.jar:../../../lib/jcuda-natives-12.0.0.jar bindings/kvikio/example/Main.java

#### Run example

cd kvikio/java/src/main/java
java -cp .:../../../lib/jcuda-12.0.0.jar:../../../lib/jcuda-natives-12.0.0.jar -Djava.library.path=../../../lib/ bindings.kvikio.example.Main
147 changes: 147 additions & 0 deletions java/pom.xml
@@ -0,0 +1,147 @@
<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>bindings.kvikio</groupId>
<artifactId>cufile</artifactId>
<version>24.10.0-SNAPSHOT</version>

<name>cufile</name>
<description>
This project provides java bindings for the GPUDirect Storage cufile library, enabling the GPU to load and
save large amounts of data to and from persistent storage. This is still a work in progress so some APIs may change.
</description>
<url>http://ai.rapids</url>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>21</maven.compiler.source>
<maven.compiler.target>21</maven.compiler.target>
<junit.version>5.4.2</junit.version>
</properties>

<dependencies>
<dependency>
<groupId>org.jcuda</groupId>
<artifactId>jcuda</artifactId>
<version>12.0.0</version>
</dependency>
<dependency>
<groupId>org.jcuda</groupId>
<artifactId>jcuda-natives</artifactId>
<version>12.0.0</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-params</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<pluginManagement>
<plugins>
<plugin>
<artifactId>maven-exec-plugin</artifactId>
<version>1.6.0</version>
</plugin>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
<configuration>
<createDirs>true</createDirs>
</configuration>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
<configuration>
<source>21</source>
<target>21</target>
Copy link

This is where you can set what the output target is that you want.

</configuration>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
<configuration>
<argLine>-Djava.library.path=${project.build.directory}:${java.library.path}</argLine>
</configuration>
<dependencies>
<dependency>
<groupId>org.junit.platform</groupId>
<artifactId>junit-platform-surefire-provider</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>5.4.2</version>
</dependency>
</dependencies>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<artifactId>maven-antrun-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<id>compile-native-code</id>
<phase>generate-sources</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<target>
<!-- Compile native code using nvcc -->
<exec executable="/usr/local/cuda/bin/nvcc">
Copy link
Contributor

@aslobodaNV The first real failure: this path is not the correct path to nvcc in the CI environment, which installed cuda-nvcc with conda. Can you just call nvcc?

Suggested change
<exec executable="/usr/local/cuda/bin/nvcc">
<exec executable="nvcc">

Error log here:

Error:  Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.0.0:run (compile-native-code) on project cufile: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "/usr/local/cuda/bin/nvcc" (in directory "/__w/kvikio/kvikio/java"): error=2, No such file or directory
Error:  around Ant part ...<exec executable="/usr/local/cuda/bin/nvcc">... @ 4:49 in /__w/kvikio/kvikio/java/target/antrun/build-main.xml

Copy link
Contributor

Similarly, the include paths should be located in the conda environment. The CUDA packages use a special "targets" layout. For example, libcufile-dev will install cufile.h into ${CONDA_PREFIX}/targets/x86_64-linux/include/cufile.h.

The openjdk package installs some headers into ${CONDA_PREFIX}/include and some into ${CONDA_PREFIX}/include/linux.

CMake knows how to detect and parse this layout of the CUDA Toolkit but I do not know how Maven or other build systems are supposed to consume it. I would ask the Spark team, perhaps they know: @revans2, do you have insight here?

Copy link
Author

Totally fine with committing the update to just call nvcc instead of the path, I just didn't happen to have nvcc on my path when I wrote that. Let me look into the include paths and reach out to Bobby if I get stuck.
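
To make the conda layout described in this thread concrete, here is a rough sketch that derives the three include flags from an environment prefix: the CUDA "targets" directory holding `cufile.h`, plus the two openjdk header directories. It assumes an x86_64 Linux environment, as in the example path quoted above; the class `CondaIncludeFlags` is a hypothetical illustration and not something Maven or this PR provides.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not part of this PR): builds nvcc include flags for a conda environment. */
public final class CondaIncludeFlags {
    private CondaIncludeFlags() {}

    /** Returns -I flags for the CUDA "targets" headers and the openjdk JNI headers under the prefix. */
    public static List<String> includeFlags(String condaPrefix) {
        Path prefix = Paths.get(condaPrefix);
        return Arrays.asList(
            "-I" + prefix.resolve("targets/x86_64-linux/include"), // assumption: x86_64 target directory
            "-I" + prefix.resolve("include"),                      // openjdk headers
            "-I" + prefix.resolve("include/linux"));               // openjdk platform-specific headers
    }

    public static void main(String[] args) {
        String prefix = System.getenv("CONDA_PREFIX");
        if (prefix == null) {
            System.err.println("CONDA_PREFIX is not set; activate a conda environment first");
            return;
        }
        includeFlags(prefix).forEach(System.out::println);
    }
}
```

The same three directories could instead be passed to the build through environment-derived properties, which would replace the hard-coded `/usr/local/cuda` and `/usr/lib/jvm` paths in the arguments that follow.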

<arg value="-shared"/>
<arg value="-o"/>
<arg value="${project.build.directory}/libCuFileJNI.so"/>
<arg value="-I/usr/local/cuda/include/"/>
<arg value="-I/usr/lib/jvm/java-21-openjdk-amd64/include/"/>
<arg value="-I/usr/lib/jvm/java-21-openjdk-amd64/include/linux"/>
<arg value="${project.basedir}/src/main/native/src/CuFileJni.cpp"/>
<arg value="--compiler-options"/>
<arg value="-fPIC"/>
<arg value="-lcufile"/>
</exec>
</target>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
45 changes: 45 additions & 0 deletions java/src/main/java/bindings/kvikio/cufile/CuFile.java
@@ -0,0 +1,45 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package bindings.kvikio.cufile;

public class CuFile {
private static boolean initialized = false;
private static CuFileDriver driver;

static {
initialize();
}

static synchronized void initialize() {
if (!initialized) {
try {
System.loadLibrary("CuFileJNI");
driver = new CuFileDriver();
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
driver.close();
}));
initialized = true;
} catch (Throwable t) {
Copy link

nit: catching Throwable is not ideal. It can include all kinds of unrecoverable errors. At a minimum you probably want to print out why this could not be initialized in the error message. That way you can debug that a dependency was missing/etc.

Copy link
Author

Would you be able to point me to an example of best practices in this area? Happy to change it further from my latest update.

Copy link

I marked it as a nit because it is probably fine in this case. In fact when I went back to look at the CUDF code, I did the exact same thing 5 years ago.

https://github.com/rapidsai/cudf/blob/4dbb8a354a9d4f0b4d82a5bf9747409c6304358f/java/src/main/java/ai/rapids/cudf/NativeDepsLoader.java#L74-L83

It is just generally not good practice to catch a Throwable, because Throwable includes all Errors and Errors are generally considered to not be recoverable. But really the main thing here is giving the user enough information that they can debug why it is failing. You are printing the error message, but the full stack trace would be better. That is the main thing that I would suggest you do.
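
As a rough illustration of the suggestion in this thread, the sketch below catches the specific failures `System.loadLibrary` is documented to throw rather than `Throwable`, and prints the full stack trace together with the current `java.library.path`. It is not the PR's implementation: the class `NativeLoaderSketch` and the method `tryLoad` are hypothetical names chosen for the example.

```java
/** Hypothetical sketch (not the PR's code): loads a native library and reports failures in detail. */
public final class NativeLoaderSketch {
    private NativeLoaderSketch() {}

    /** Attempts to load the named library; returns false and logs the cause if it cannot be loaded. */
    public static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Report where the JVM looked, then the full stack trace, so a missing
            // dependency or a wrong path can be diagnosed from the output.
            System.err.println("Could not load native library '" + libName
                + "' (java.library.path=" + System.getProperty("java.library.path") + ")");
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = tryLoad("CuFileJNI");
        System.out.println("CuFileJNI loaded: " + ok);
    }
}
```

Catching only `UnsatisfiedLinkError` and `SecurityException` leaves other errors, such as `OutOfMemoryError`, to propagate, which matches the concern that `Error` subclasses are generally not recoverable.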

System.out.println("could not load cufile jni library");
}
}
}

public static boolean libraryLoaded() {
return initialized;
}
}