Commit

GH-39001: [Java] Modularize remaining modules (#39221)
### Rationale for this change
Modularize remaining modules outside of memory modules, vector, and format.

### What changes are included in this PR?

### Are these changes tested?
Yes, existing unit tests now run with modules when using JDK9+.

### Are there any user-facing changes?
Yes. New command-line options may be necessary. The output directory for
JNI native library builds is now specified differently. The flight-grpc module has been removed because it is now built into flight-core.
Documentation has been updated for these changes.
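
As a rough illustration of the new output layout: the CI checks in this change expect each JNI library under `<dist dir>/<library name>/<arch>/`. A minimal sketch, assuming a hypothetical helper `jni_lib_path` (not part of the Arrow build scripts) and Linux `.so` naming:

```shell
# Hypothetical helper, not part of the Arrow build scripts.
# Prints where a JNI library is expected after this change:
#   <dist_dir>/<library>/<arch>/lib<library>.so   (Linux naming assumed)
jni_lib_path() {
  dist_dir="$1"
  library="$2"
  arch="$3"
  printf '%s/%s/%s/lib%s.so\n' "${dist_dir}" "${library}" "${arch}" "${library}"
}

jni_lib_path arrow/java-dist arrow_cdata_jni x86_64
```

On macOS the suffix would be `.dylib`, and on Windows the file has no `lib` prefix and ends in `.dll`, so the helper would need adjusting for those platforms.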

**This PR includes breaking changes to public APIs.**
There are a number of package structure changes and some modules now need additional command-line arguments.
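
The additional arguments are JVM module flags. A minimal sketch of how they might be combined for an application that uses both flight-core and arrow-dataset, based on the flags in the install documentation updated by this commit. The variable name `ARROW_JVM_FLAGS` and the jar name `app.jar` are placeholders, not something the project defines:

```shell
# Flags taken from the install docs updated in this commit.
# ARROW_JVM_FLAGS is a hypothetical variable name used only for this sketch.
ARROW_JVM_FLAGS="--add-reads=org.apache.arrow.flight.core=ALL-UNNAMED"
ARROW_JVM_FLAGS="${ARROW_JVM_FLAGS} --add-opens=java.base/java.nio=org.apache.arrow.dataset,org.apache.arrow.memory.core,ALL-UNNAMED"

# Print the resulting command instead of running it; app.jar is a placeholder.
echo "java ${ARROW_JVM_FLAGS} -jar app.jar"
```

Applications that use neither Flight nor the dataset module would likely need only the `--add-opens` flag with `org.apache.arrow.memory.core,ALL-UNNAMED`.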

* Closes: #39001

Authored-by: James Duong <[email protected]>
Signed-off-by: David Li <[email protected]>
jduo authored Jan 19, 2024
1 parent 143a7da commit 92682f0
Showing 98 changed files with 763 additions and 355 deletions.
2 changes: 1 addition & 1 deletion ci/scripts/integration_arrow_build.sh
@@ -46,7 +46,7 @@ if [ "${ARROW_INTEGRATION_JAVA}" == "ON" ]; then
export ARROW_JAVA_CDATA="ON"
export JAVA_JNI_CMAKE_ARGS="-DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF -DARROW_JAVA_JNI_ENABLE_C=ON"

${arrow_dir}/ci/scripts/java_jni_build.sh ${arrow_dir} ${ARROW_HOME} ${build_dir} /tmp/dist/java/$(arch)
${arrow_dir}/ci/scripts/java_jni_build.sh ${arrow_dir} ${ARROW_HOME} ${build_dir} /tmp/dist/java
${arrow_dir}/ci/scripts/java_build.sh ${arrow_dir} ${build_dir} /tmp/dist/java
fi

2 changes: 0 additions & 2 deletions ci/scripts/java_jni_build.sh
@@ -24,7 +24,6 @@ arrow_install_dir=${2}
build_dir=${3}/java_jni
# The directory where the final binaries will be stored when scripts finish
dist_dir=${4}

prefix_dir="${build_dir}/java-jni"

echo "=== Clear output directories and leftovers ==="
@@ -56,7 +55,6 @@ cmake \
-DBUILD_TESTING=${ARROW_JAVA_BUILD_TESTS} \
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
-DCMAKE_PREFIX_PATH=${arrow_install_dir} \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=${prefix_dir} \
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD:-OFF} \
-DProtobuf_USE_STATIC_LIBS=ON \
11 changes: 5 additions & 6 deletions ci/scripts/java_jni_macos_build.sh
@@ -31,7 +31,7 @@ case ${normalized_arch} in
;;
esac
# The directory where the final binaries will be stored when scripts finish
dist_dir=${3}/${normalized_arch}
dist_dir=${3}

echo "=== Clear output directories and leftovers ==="
# Clear output directories and leftovers
@@ -82,7 +82,6 @@ cmake \
-DARROW_S3=${ARROW_S3} \
-DARROW_USE_CCACHE=${ARROW_USE_CCACHE} \
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=${install_dir} \
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD} \
-DGTest_SOURCE=BUNDLED \
@@ -138,8 +137,8 @@ archery linking check-dependencies \
--allow libncurses \
--allow libobjc \
--allow libz \
libarrow_cdata_jni.dylib \
libarrow_dataset_jni.dylib \
libarrow_orc_jni.dylib \
libgandiva_jni.dylib
arrow_cdata_jni/${normalized_arch}/libarrow_cdata_jni.dylib \
arrow_dataset_jni/${normalized_arch}/libarrow_dataset_jni.dylib \
arrow_orc_jni/${normalized_arch}/libarrow_orc_jni.dylib \
gandiva_jni/${normalized_arch}/libgandiva_jni.dylib
popd
11 changes: 5 additions & 6 deletions ci/scripts/java_jni_manylinux_build.sh
@@ -28,7 +28,7 @@ case ${normalized_arch} in
;;
esac
# The directory where the final binaries will be stored when scripts finish
dist_dir=${3}/${normalized_arch}
dist_dir=${3}

echo "=== Clear output directories and leftovers ==="
# Clear output directories and leftovers
@@ -91,7 +91,6 @@ cmake \
-DARROW_S3=${ARROW_S3} \
-DARROW_USE_CCACHE=${ARROW_USE_CCACHE} \
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=${ARROW_HOME} \
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD} \
-DGTest_SOURCE=BUNDLED \
@@ -164,8 +163,8 @@ archery linking check-dependencies \
--allow libstdc++ \
--allow libz \
--allow linux-vdso \
libarrow_cdata_jni.so \
libarrow_dataset_jni.so \
libarrow_orc_jni.so \
libgandiva_jni.so
arrow_cdata_jni/${normalized_arch}/libarrow_cdata_jni.so \
arrow_dataset_jni/${normalized_arch}/libarrow_dataset_jni.so \
arrow_orc_jni/${normalized_arch}/libarrow_orc_jni.so \
gandiva_jni/${normalized_arch}/libgandiva_jni.so
popd
3 changes: 1 addition & 2 deletions ci/scripts/java_jni_windows_build.sh
@@ -22,7 +22,7 @@ set -ex
arrow_dir=${1}
build_dir=${2}
# The directory where the final binaries will be stored when scripts finish
dist_dir=${3}/x86_64
dist_dir=${3}

echo "=== Clear output directories and leftovers ==="
# Clear output directories and leftovers
@@ -72,7 +72,6 @@ cmake \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_ZSTD=ON \
-DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=${install_dir} \
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD} \
-GNinja \
2 changes: 2 additions & 0 deletions dev/archery/archery/integration/tester_java.py
@@ -259,6 +259,8 @@ def __init__(self, *args, **kwargs):
self._java_opts.append(
'--add-opens=java.base/java.nio='
'org.apache.arrow.memory.core,ALL-UNNAMED')
self._java_opts.append(
'--add-reads=org.apache.arrow.flight.core=ALL-UNNAMED')

def _run(self, arrow_path=None, json_path=None, command='VALIDATE'):
cmd = (
38 changes: 19 additions & 19 deletions dev/tasks/java-jars/github.yml
@@ -208,29 +208,29 @@ jobs:
run: |
set -x
test -f arrow/java-dist/x86_64/libarrow_cdata_jni.so
test -f arrow/java-dist/x86_64/libarrow_dataset_jni.so
test -f arrow/java-dist/x86_64/libarrow_orc_jni.so
test -f arrow/java-dist/x86_64/libgandiva_jni.so
test -f arrow/java-dist/arrow_cdata_jni/x86_64/libarrow_cdata_jni.so
test -f arrow/java-dist/arrow_dataset_jni/x86_64/libarrow_dataset_jni.so
test -f arrow/java-dist/arrow_orc_jni/x86_64/libarrow_orc_jni.so
test -f arrow/java-dist/gandiva_jni/x86_64/libgandiva_jni.so
test -f arrow/java-dist/aarch_64/libarrow_cdata_jni.so
test -f arrow/java-dist/aarch_64/libarrow_dataset_jni.so
test -f arrow/java-dist/aarch_64/libarrow_orc_jni.so
test -f arrow/java-dist/aarch_64/libgandiva_jni.so
test -f arrow/java-dist/arrow_cdata_jni/aarch_64/libarrow_cdata_jni.so
test -f arrow/java-dist/arrow_dataset_jni/aarch_64/libarrow_dataset_jni.so
test -f arrow/java-dist/arrow_orc_jni/aarch_64/libarrow_orc_jni.so
test -f arrow/java-dist/gandiva_jni/aarch_64/libgandiva_jni.so
test -f arrow/java-dist/x86_64/libarrow_cdata_jni.dylib
test -f arrow/java-dist/x86_64/libarrow_dataset_jni.dylib
test -f arrow/java-dist/x86_64/libarrow_orc_jni.dylib
test -f arrow/java-dist/x86_64/libgandiva_jni.dylib
test -f arrow/java-dist/arrow_cdata_jni/x86_64/libarrow_cdata_jni.dylib
test -f arrow/java-dist/arrow_dataset_jni/x86_64/libarrow_dataset_jni.dylib
test -f arrow/java-dist/arrow_orc_jni/x86_64/libarrow_orc_jni.dylib
test -f arrow/java-dist/gandiva_jni/x86_64/libgandiva_jni.dylib
test -f arrow/java-dist/aarch_64/libarrow_cdata_jni.dylib
test -f arrow/java-dist/aarch_64/libarrow_dataset_jni.dylib
test -f arrow/java-dist/aarch_64/libarrow_orc_jni.dylib
test -f arrow/java-dist/aarch_64/libgandiva_jni.dylib
test -f arrow/java-dist/arrow_cdata_jni/aarch_64/libarrow_cdata_jni.dylib
test -f arrow/java-dist/arrow_dataset_jni/aarch_64/libarrow_dataset_jni.dylib
test -f arrow/java-dist/arrow_orc_jni/aarch_64/libarrow_orc_jni.dylib
test -f arrow/java-dist/gandiva_jni/aarch_64/libgandiva_jni.dylib
test -f arrow/java-dist/x86_64/arrow_cdata_jni.dll
test -f arrow/java-dist/x86_64/arrow_dataset_jni.dll
test -f arrow/java-dist/x86_64/arrow_orc_jni.dll
test -f arrow/java-dist/arrow_cdata_jni/x86_64/arrow_cdata_jni.dll
test -f arrow/java-dist/arrow_dataset_jni/x86_64/arrow_dataset_jni.dll
test -f arrow/java-dist/arrow_orc_jni/x86_64/arrow_orc_jni.dll
- name: Build bundled jar
run: |
set -e
7 changes: 0 additions & 7 deletions dev/tasks/tasks.yml
@@ -810,13 +810,6 @@ tasks:
- flight-core-{no_rc_snapshot_version}-tests.jar
- flight-core-{no_rc_snapshot_version}.jar
- flight-core-{no_rc_snapshot_version}.pom
- flight-grpc-{no_rc_snapshot_version}-cyclonedx.json
- flight-grpc-{no_rc_snapshot_version}-cyclonedx.xml
- flight-grpc-{no_rc_snapshot_version}-javadoc.jar
- flight-grpc-{no_rc_snapshot_version}-sources.jar
- flight-grpc-{no_rc_snapshot_version}-tests.jar
- flight-grpc-{no_rc_snapshot_version}.jar
- flight-grpc-{no_rc_snapshot_version}.pom
- flight-integration-tests-{no_rc_snapshot_version}-cyclonedx.json
- flight-integration-tests-{no_rc_snapshot_version}-cyclonedx.xml
- flight-integration-tests-{no_rc_snapshot_version}-jar-with-dependencies.jar
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -1343,7 +1343,7 @@ services:
command:
[ "/arrow/ci/scripts/cpp_build.sh /arrow /build &&
/arrow/ci/scripts/python_build.sh /arrow /build &&
/arrow/ci/scripts/java_jni_build.sh /arrow $${ARROW_HOME} /build /tmp/dist/java/$$(arch) &&
/arrow/ci/scripts/java_jni_build.sh /arrow $${ARROW_HOME} /build /tmp/dist/java/ &&
/arrow/ci/scripts/java_build.sh /arrow /build /tmp/dist/java &&
/arrow/ci/scripts/java_cdata_integration.sh /arrow /tmp/dist/java" ]

54 changes: 23 additions & 31 deletions docs/source/developers/java/building.rst
@@ -115,18 +115,17 @@ Maven
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ ls -latr ../java-dist/lib/<your system's architecture>
|__ libarrow_cdata_jni.dylib
|__ libarrow_cdata_jni.so
$ ls -latr ../java-dist/lib
|__ arrow_cdata_jni/
- To build only the JNI C Data Interface library (Windows):

.. code-block::
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ dir "../java-dist/bin/x86_64"
|__ arrow_cdata_jni.dll
$ dir "../java-dist/bin"
|__ arrow_cdata_jni/
- To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:

@@ -136,19 +135,19 @@ Maven
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
$ ls -latr java-dist/lib/<your system's architecture>/*_{jni,java}.*
|__ libarrow_dataset_jni.dylib
|__ libarrow_orc_jni.dylib
|__ libgandiva_jni.dylib
$ ls -latr java-dist/lib
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
- To build all JNI libraries (Windows) except the JNI C Data Interface library:

.. code-block::
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-jni-windows -N
$ dir "../java-dist/bin/x86_64"
|__ arrow_dataset_jni.dll
$ dir "../java-dist/bin"
|__ arrow_dataset_jni/
CMake
~~~~~
@@ -166,12 +165,10 @@ CMake
-DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
-DBUILD_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_LIBDIR=lib/<your system's architecture> \
-DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ ls -latr java-dist/lib
|__ libarrow_cdata_jni.dylib
|__ libarrow_cdata_jni.so
|__ arrow_cdata_jni/
- To build only the JNI C Data Interface library (Windows):

@@ -186,11 +183,10 @@ CMake
-DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
-DBUILD_TESTING=OFF ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_LIBDIR=lib/x86_64 ^
-DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ dir "java-dist/bin"
|__ arrow_cdata_jni.dll
|__ arrow_cdata_jni/
- To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:

@@ -222,7 +218,6 @@ CMake
-DARROW_SUBSTRAIT=ON \
-DARROW_USE_CCACHE=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_LIBDIR=lib/<your system's architecture> \
-DCMAKE_INSTALL_PREFIX=java-dist \
-DCMAKE_UNITY_BUILD=ON
$ cmake --build cpp-jni --target install --config Release
@@ -233,16 +228,15 @@ CMake
-DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
-DBUILD_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_LIBDIR=lib/<your system's architecture> \
-DCMAKE_INSTALL_PREFIX=java-dist \
-DCMAKE_PREFIX_PATH=$PWD/java-dist \
-DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
-DProtobuf_USE_STATIC_LIBS=ON
$ cmake --build java-jni --target install --config Release
$ ls -latr java-dist/lib/<your system's architecture>/*_{jni,java}.*
|__ libarrow_dataset_jni.dylib
|__ libarrow_orc_jni.dylib
|__ libgandiva_jni.dylib
$ ls -latr java-dist/lib/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
- To build all JNI libraries (Windows) except the JNI C Data Interface library:

@@ -271,7 +265,6 @@ CMake
-DARROW_WITH_ZLIB=ON ^
-DARROW_WITH_ZSTD=ON ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_LIBDIR=lib/x86_64 ^
-DCMAKE_INSTALL_PREFIX=java-dist ^
-DCMAKE_UNITY_BUILD=ON ^
-GNinja
@@ -288,13 +281,12 @@ CMake
-DARROW_JAVA_JNI_ENABLE_ORC=ON ^
-DBUILD_TESTING=OFF ^
-DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_LIBDIR=lib/x86_64 ^
-DCMAKE_INSTALL_PREFIX=java-dist ^
-DCMAKE_PREFIX_PATH=$PWD/java-dist
$ cmake --build java-jni --target install --config Release
$ dir "java-dist/bin"
|__ arrow_orc_jni.dll
|__ arrow_dataset_jni.dll
|__ arrow_orc_jni/
|__ arrow_dataset_jni/
Archery
~~~~~~~
@@ -303,11 +295,11 @@ Archery
$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist/<your system's architecture>/
|__ libarrow_cdata_jni.so
|__ libarrow_dataset_jni.so
|__ libarrow_orc_jni.so
|__ libgandiva_jni.so
$ ls -latr java-dist
|__ arrow_cdata_jni/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
Building Java JNI Modules
-------------------------
35 changes: 34 additions & 1 deletion docs/source/java/install.rst
@@ -43,7 +43,40 @@ adding ``--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
$ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
Otherwise, you may see errors like ``module java.base does not "opens
java.nio" to unnamed module``.
java.nio" to unnamed module`` or ``module java.base does not "opens
java.nio" to org.apache.arrow.memory.core``

Note that the command has changed from Arrow 15 and earlier. If you are still using the flags from that version
(``--add-opens=java.base/java.nio=ALL-UNNAMED``) you will see the
``module java.base does not "opens java.nio" to org.apache.arrow.memory.core`` error.

If you are using flight-core or dependent modules, you will need to mark that flight-core can read unnamed modules.
Modifying the command above for Flight:

.. code-block:: shell
# Directly on the command line
$ java --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
# Indirectly via environment variables
$ env _JAVA_OPTIONS="--add-reads=org.apache.arrow.flight.core=ALL-UNNAMED --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
Otherwise, you may see errors like ``java.lang.IllegalAccessError: superclass access check failed: class
org.apache.arrow.flight.ArrowMessage$ArrowBufRetainingCompositeByteBuf (in module org.apache.arrow.flight.core)
cannot access class io.netty.buffer.CompositeByteBuf (in unnamed module ...) because module
org.apache.arrow.flight.core does not read unnamed module ...``
Finally, if you are using arrow-dataset, you will also need to expose JDK internals to it.
Modifying the arrow-memory command above for arrow-dataset:
.. code-block:: shell
# Directly on the command line
$ java --add-opens=java.base/java.nio=org.apache.arrow.dataset,org.apache.arrow.memory.core,ALL-UNNAMED -jar ...
# Indirectly via environment variables
$ env _JAVA_OPTIONS="--add-opens=java.base/java.nio=org.apache.arrow.dataset,org.apache.arrow.memory.core,ALL-UNNAMED" java -jar ...
Otherwise you may see errors such as ``java.lang.RuntimeException: java.lang.reflect.InaccessibleObjectException:
Unable to make static void java.nio.Bits.reserveMemory(long,long) accessible: module
java.base does not "opens java.nio" to module org.apache.arrow.dataset``

If using Maven and Surefire for unit testing, :ref:`this argument must
be added to Surefire as well <java-install-maven-testing>`.
3 changes: 0 additions & 3 deletions docs/source/java/overview.rst
@@ -56,9 +56,6 @@ but some modules are JNI bindings to the C++ library.
* - flight-core
- (Experimental) An RPC mechanism for transferring ValueVectors.
- Native
* - flight-grpc
- (Experimental) Contains utility class to expose Flight gRPC service and client.
- Native
* - flight-sql
- (Experimental) Contains utility classes to expose Flight SQL semantics for clients and servers over Arrow Flight.
- Native