Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Refactor Java cookbooks to switch from JShell to Maven #350

Merged
merged 12 commits into from
May 23, 2024
34 changes: 20 additions & 14 deletions java/CONTRIBUTING.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,22 +30,13 @@ install needed packages via pip, to build the Java cookbook. The
dependency packages managed via pip by build scripts are found at
`requirements.txt <requirements.txt>`_.

Java Shell
^^^^^^^^^^^^^^^^^^^^^^^^^
For Java cookbook we are running these with Java Shell tool -
`JShell <https://docs.oracle.com/en/java/javase/11/jshell/introduction-jshell.html>`_

.. code-block:: bash

> java --version
java 11.0.14 2022-01-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.14+8-LTS-263)

.. code-block:: bash
Java
^^^^

> jshell --version
jshell 11.0.14
The Java cookbooks require:

- Java JDK (11+)
- Maven

Build Process
-------------------------
Expand Down Expand Up @@ -81,6 +72,21 @@ additional ``.rst`` files in the ``source`` directory. You just
need to remember to add them to the ``index.rst`` file in the
``toctree`` for them to become visible.

When run, Java code snippets are wrapped in a simple main class

.. code-block:: java

// Any imports get put here

public class Example {
public static void main (String[] args) {
// Your code gets inserted here
}
}

If your code is more complicated, you can explicitly define ``public class Example``,
the above wrapping won't happen and the code will be run as-is.

Java Sphinx Directive
=====================

Expand Down
143 changes: 99 additions & 44 deletions java/ext/javadoctest.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,16 @@
import pathlib
import subprocess
from typing import Any, Dict
import tempfile
import shutil

from sphinx.ext.doctest import (DocTestBuilder, TestcodeDirective,
TestoutputDirective, doctest, sphinx)
from sphinx.ext.doctest import (
DocTestBuilder,
TestcodeDirective,
TestoutputDirective,
doctest,
sphinx,
)
from sphinx.locale import __


Expand All @@ -34,6 +41,10 @@ def run(self):
class JavaDocTestBuilder(DocTestBuilder):
"""
Runs java test snippets in the documentation.

The code in each testcode block is insert into a template Maven project,
run through exec:java, its output captured and post-processed, and finally
compared to whatever's in the corresponding testoutput.
"""

name = "javadoctest"
Expand All @@ -45,46 +56,49 @@ class JavaDocTestBuilder(DocTestBuilder):
def compile(
self, code: str, name: str, type: str, flags: Any, dont_inherit: bool
) -> Any:
# go to project that contains all your arrow maven dependencies
path_arrow_project = pathlib.Path(__file__).parent.parent / "source" / "demo"
# create list of all arrow jar dependencies
subprocess.check_call(
[
"mvn",
"-q",
"dependency:build-classpath",
"-DincludeTypes=jar",
"-Dmdep.outputFile=.cp.tmp",
f"-Darrow.version={self.env.config.version}",
],
cwd=path_arrow_project,
text=True,
)
if not (path_arrow_project / ".cp.tmp").exists():
raise RuntimeError(
__("invalid process to create jshell dependencies library")
source_dir = pathlib.Path(__file__).parent.parent / "source" / "demo"

with tempfile.TemporaryDirectory() as project_dir:
shutil.copytree(source_dir, project_dir, dirs_exist_ok=True)

template_file_path = (
pathlib.Path(project_dir)
/ "src"
/ "main"
/ "java"
/ "org"
/ "example"
/ "Example.java"
)

# get list of all arrow jar dependencies
with open(path_arrow_project / ".cp.tmp") as f:
stdout_dependency = f.read()
if not stdout_dependency:
raise RuntimeError(
__("invalid process to list jshell dependencies library")
with open(template_file_path, "r") as infile:
template = infile.read()

filled_template = self.fill_template(template, code)

with open(template_file_path, "w") as outfile:
outfile.write(filled_template)

# Support JPMS (since Arrow 16)
modified_env = os.environ.copy()
modified_env["_JAVA_OPTIONS"] = "--add-opens=java.base/java.nio=ALL-UNNAMED"

test_proc = subprocess.Popen(
[
"mvn",
"-f",
project_dir,
"compile",
"exec:java",
"-Dexec.mainClass=org.example.Example",
],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
text=True,
env=modified_env,
)

# execute java testing code thru jshell and read output
# JDK11 support '-' This allows the pipe to work as expected without requiring a shell
# Migrating to /dev/stdin to also support JDK9+
proc_jshell_process = subprocess.Popen(
["jshell", "-R--add-opens=java.base/java.nio=ALL-UNNAMED", "--class-path", stdout_dependency, "-s", "/dev/stdin"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
text=True,
)
out_java_arrow, err_java_arrow = proc_jshell_process.communicate(code)
if err_java_arrow:
raise RuntimeError(__("invalid process to run jshell"))
out_java_arrow, err_java_arrow = test_proc.communicate()

# continue with python logic code to do java output validation
output = f"print('''{self.clean_output(out_java_arrow)}''')"
Expand All @@ -93,12 +107,53 @@ def compile(
return compile(output, name, self.type, flags, dont_inherit)

def clean_output(self, output: str):
if output[-3:] == '-> ':
output = output[:-3]
if output[-1:] == '\n':
output = output[:-1]
output = (4*' ').join(output.split('\t'))
return output
lines = output.split("\n")

# Remove log lines from output
lines = [l for l in lines if not l.startswith("[INFO]")]
lines = [l for l in lines if not l.startswith("[WARNING]")]
lines = [l for l in lines if not l.startswith("Download")]
lines = [l for l in lines if not l.startswith("Progress")]

result = "\n".join(lines)

# Sometimes the testoutput content is smushed onto the same line as
# following log line, probably just when the testcode code doesn't print
# its own final newline. This example is the only case I found so I
# didn't pull out the re module (i.e., this works)
result = result.replace(
"[INFO] ------------------------------------------------------------------------",
"",
)

# Convert all tabs to 4 spaces, Sphinx seems to eat tabs even if we
# explicitly put them in the testoutput block so we instead modify
# the output
result = (4 * " ").join(result.split("\t"))

return result.strip()

def fill_template(self, template, code):
# Detect the special case where cookbook code is already wrapped in a
# class and just use the code as-is without wrapping it up
if code.find("public class Example") >= 0:
return template + code

# Split input code into imports and not-imports
lines = code.split("\n")
code_imports = [l for l in lines if l.startswith("import")]
code_rest = [l for l in lines if not l.startswith("import")]

pieces = [
template,
"\n".join(code_imports),
"\n\npublic class Example {\n public static void main(String[] args) {\n",
"\n".join(code_rest),
" }\n}",
]

return "\n".join(pieces)


def setup(app) -> Dict[str, Any]:
app.add_directive("testcode", JavaTestcodeDirective)
Expand Down
1 change: 1 addition & 0 deletions java/source/dataset.rst
Original file line number Diff line number Diff line change
Expand Up @@ -344,6 +344,7 @@ In case we need to project only certain columns we could configure ScanOptions w
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;
import java.util.Optional;

String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
String[] projection = new String[] {"name"};
Expand Down
16 changes: 14 additions & 2 deletions java/source/demo/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,13 @@
<version>1.6.1</version>
</extension>
</extensions>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</build>
<repositories>
<repository>
Expand All @@ -32,8 +39,8 @@
</repository>
</repositories>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
<arrow.version>15.0.2</arrow.version>
</properties>
<dependencies>
Expand Down Expand Up @@ -107,5 +114,10 @@
<artifactId>core</artifactId>
<version>0.26.0</version>
</dependency>
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-core</artifactId>
<version>1.37.0</version>
</dependency>
</dependencies>
</project>
17 changes: 17 additions & 0 deletions java/source/demo/src/main/java/org/example/Example.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package org.example;
1 change: 1 addition & 0 deletions java/source/flight.rst
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ Flight Client and Server
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;

class Dataset implements AutoCloseable {
Expand Down
4 changes: 4 additions & 0 deletions java/source/schema.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,8 @@ Fields are used to denote the particular columns of tabular data.
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import java.util.ArrayList;
import java.util.List;

FieldType intType = new FieldType(true, new ArrowType.Int(32, true), null);
FieldType listType = new FieldType(true, new ArrowType.List(), null);
Expand Down Expand Up @@ -118,6 +120,8 @@ In case we need to add metadata to our Field we could use:
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import java.util.HashMap;
import java.util.Map;

Map<String, String> metadata = new HashMap<>();
metadata.put("A", "Id card");
Expand Down
Loading
Loading