[Java] Use recent java toolchain to build Arrow #3

Closed
wants to merge 127 commits into from

laurentgo
Owner

No description provided.

github-actions bot commented May 3, 2024

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
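
The three title formats above can be checked mechanically. The following is a minimal sketch in Python, assuming the formats are exactly as quoted; `TITLE_RE` and `parse_title` are hypothetical names, and the real bot may apply different or additional rules. Accepting several bracketed components in a row is also an assumption, since the quoted format shows only one.

```python
import re

# Hypothetical pattern derived from the three title formats quoted above.
# This is not the bot's actual check.
TITLE_RE = re.compile(
    r"^(?P<prefix>GH-\d+|MINOR|PARQUET-\d+): "
    r"(?P<components>(?:\[[^\]]+\])+) "
    r"(?P<summary>\S.*)$"
)


def parse_title(title):
    """Return the parts of a pull request title, or None if it does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        # Split "[A][B]" into ["A", "B"].
        "components": re.findall(r"\[([^\]]+)\]", match.group("components")),
        "summary": match.group("summary"),
    }
```

Under this sketch, a title such as `[Java] Use recent java toolchain to build Arrow` returns `None` because it carries no `GH-`, `MINOR`, or `PARQUET-` prefix.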

@laurentgo laurentgo changed the title Laurentgo/wip update java version [Java] Use recent java toolchain to build Arrow May 6, 2024
@laurentgo laurentgo force-pushed the laurentgo/wip-update-java-version branch 2 times, most recently from cf8afe1 to 357b498 Compare May 20, 2024 21:59
pjfanning and others added 2 commits May 21, 2024 07:56
…ssageSerializer (apache#41718)

### Rationale for this change

### What changes are included in this PR?

apache#41717 describes issue and change

### Are these changes tested?

CI build

### Are there any user-facing changes?

* GitHub Issue: apache#41717

Authored-by: PJ Fanning <[email protected]>
Signed-off-by: David Li <[email protected]>
…er sized (apache#41746)

### Rationale for this change

See apache#41738.

### What changes are included in this PR?

Allocate the underlying buffer of temp stack vector using padded size.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

None.

* GitHub Issue: apache#41738

Authored-by: Ruoxi Sun <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
@laurentgo laurentgo force-pushed the laurentgo/wip-update-java-version branch from 357b498 to f2edf49 Compare May 21, 2024 05:51
tlm365 and others added 6 commits May 21, 2024 10:23
…Column (apache#41704)

### Rationale for this change
Resolves apache#41699.

### What changes are included in this PR?
Add `to_dict` method and test case

### Are these changes tested?
Yes

### Are there any user-facing changes?
No

* GitHub Issue: apache#41699

Authored-by: Tai Le Manh <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
…1628)

### Rationale for this change

The commit in question caused a lot of CI issues

### Are these changes tested?

N/A

### Are there any user-facing changes?

N/A
* GitHub Issue: apache#41571

Authored-by: David Li <[email protected]>
Signed-off-by: David Li <[email protected]>
…ache#41346)

### Rationale for this change

In apache#41321, a user reports corruption when reading from a corrupt Parquet file. This is because some checks are missing. The current code works when reading a well-formed Parquet file, but it needs to be stricter when reading a corrupt one.

**Currently this patch only strengthens the checks on Parquet levels; the corresponding value checks will be added in later patches.**

### What changes are included in this PR?

Stricter Parquet checks on levels
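
As a rough illustration of what a stricter level check can look like, the sketch below rejects any decoded definition or repetition level that falls outside the range allowed by the column's maximum level. This is an assumption for illustration only: the commit message does not spell out which checks were added, the real change lives in the C++ Parquet reader, and `validate_levels` and its parameters are hypothetical names.

```python
def validate_levels(levels, max_level):
    """Raise ValueError if any decoded level is outside [0, max_level].

    Hypothetical helper, not part of Arrow or Parquet.
    """
    for index, level in enumerate(levels):
        if not 0 <= level <= max_level:
            raise ValueError(
                f"level {level} at position {index} is outside [0, {max_level}]"
            )
    return levels
```

A well-formed file never trips this check, so the extra cost only buys safety against corrupt input.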

### Are these changes tested?

Existing tests cover this; we could perhaps add a Parquet file as a test file.

### Are there any user-facing changes?

Stricter checks

* GitHub Issue: apache#41321

Lead-authored-by: mwish <[email protected]>
Co-authored-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
…Type in terms of the storage type (apache#41413)

### Rationale for this change

This update aligns the Python API with Arrow C++ by exposing the actual byte and bit widths of extension types from their storage type.

### What changes are included in this PR?

- Expose byte_width and bit_width properties for ExtensionType in Python, reflecting the underlying storage type.
- Add unit tests to verify these properties
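
The idea of reporting widths "in terms of the storage type" can be sketched with two toy classes. These are not pyarrow classes. `FixedWidthType` and `ToyExtensionType` are hypothetical stand-ins that only show the delegation pattern, under the assumption that the storage type has a fixed width.

```python
class FixedWidthType:
    """Toy stand-in for a fixed-width storage type (not a pyarrow class)."""

    def __init__(self, bit_width):
        self.bit_width = bit_width

    @property
    def byte_width(self):
        # Whole bytes only; sub-byte types are out of scope for this sketch.
        if self.bit_width % 8 != 0:
            raise ValueError("bit width is not a whole number of bytes")
        return self.bit_width // 8


class ToyExtensionType:
    """Toy extension type that forwards width queries to its storage type."""

    def __init__(self, storage_type):
        self.storage_type = storage_type

    @property
    def bit_width(self):
        return self.storage_type.bit_width

    @property
    def byte_width(self):
        return self.storage_type.byte_width
```

With this shape, an extension type wrapping a 64-bit storage type reports a bit width of 64 and a byte width of 8 without storing either value itself.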

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes

* GitHub Issue: apache#41389

Lead-authored-by: Hyunseok Seo <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
apache#41743)

Bumps [github.com/hamba/avro/v2](https://github.com/hamba/avro) from 2.21.1 to 2.22.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/hamba/avro/releases">github.com/hamba/avro/v2's releases</a>.</em></p>
<blockquote>
<h2>v2.22.0</h2>
<h2>What's Changed</h2>
<ul>
<li>allow custom template by <a href="https://github.com/adrianiacobghiula"><code>@​adrianiacobghiula</code></a> in <a href="https://redirect.github.com/hamba/avro/pull/392">hamba/avro#392</a></li>
<li>chore: bump linter version by <a href="https://github.com/nrwiersma"><code>@​nrwiersma</code></a> in <a href="https://redirect.github.com/hamba/avro/pull/393">hamba/avro#393</a></li>
<li>feat: make schema tests parallel by <a href="https://github.com/nrwiersma"><code>@​nrwiersma</code></a> in <a href="https://redirect.github.com/hamba/avro/pull/394">hamba/avro#394</a></li>
<li>feat: allow strict type generation with avrogen by <a href="https://github.com/nrwiersma"><code>@​nrwiersma</code></a> in <a href="https://redirect.github.com/hamba/avro/pull/396">hamba/avro#396</a></li>
<li>chore: bump golang.org/x/tools from 0.20.0 to 0.21.0 in the all group by <a href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a href="https://redirect.github.com/hamba/avro/pull/397">hamba/avro#397</a></li>
<li>chore: bump the all group with 2 updates by <a href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a href="https://redirect.github.com/hamba/avro/pull/398">hamba/avro#398</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/adrianiacobghiula"><code>@​adrianiacobghiula</code></a> made their first contribution in <a href="https://redirect.github.com/hamba/avro/pull/392">hamba/avro#392</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a href="https://github.com/hamba/avro/compare/v2.21.1...v2.22.0">https://github.com/hamba/avro/compare/v2.21.1...v2.22.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/hamba/avro/commit/dc715a44a6e3dc4e2c8369528de01b93f23c0466"><code>dc715a4</code></a> chore: bump the all group with 2 updates (<a href="https://redirect.github.com/hamba/avro/issues/398">#398</a>)</li>
<li><a href="https://github.com/hamba/avro/commit/78423c1d8bbabb95aa162659cb7915d4a246e518"><code>78423c1</code></a> chore: bump golang.org/x/tools from 0.20.0 to 0.21.0 in the all group (<a href="https://redirect.github.com/hamba/avro/issues/397">#397</a>)</li>
<li><a href="https://github.com/hamba/avro/commit/ee51df91b939351dbfde00e63735efa796d4ac0a"><code>ee51df9</code></a> feat: allow strict type generation with avrogen (<a href="https://redirect.github.com/hamba/avro/issues/396">#396</a>)</li>
<li><a href="https://github.com/hamba/avro/commit/0e4c8f96537226fa1eebba258fb2691a87b41fc5"><code>0e4c8f9</code></a> feat: make schema tests parallel (<a href="https://redirect.github.com/hamba/avro/issues/394">#394</a>)</li>
<li><a href="https://github.com/hamba/avro/commit/9b663e13601288e0cc79497592fb99d8ef4eb096"><code>9b663e1</code></a> chore: bump linter version (<a href="https://redirect.github.com/hamba/avro/issues/393">#393</a>)</li>
<li><a href="https://github.com/hamba/avro/commit/f17a0013fdf783786958c276b690953405565583"><code>f17a001</code></a> feat: allow custom template in avrogen (<a href="https://redirect.github.com/hamba/avro/issues/392">#392</a>)</li>
<li>See full diff in <a href="https://github.com/hamba/avro/compare/v2.21.1...v2.22.0">compare view</a></li>
</ul>
</details>
<br />

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Matt Topol <[email protected]>
…nce regression (apache#41036)

### Rationale for this change
Add a grouper benchmark to prevent performance regressions.

apache#40998 (comment).

### What changes are included in this PR?
Added a benchmark.

### Are these changes tested?
Not needed.

### Are there any user-facing changes?
No

* GitHub Issue: apache#41035

Authored-by: ZhangHuiGui <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@laurentgo laurentgo force-pushed the laurentgo/wip-update-java-version branch 3 times, most recently from e59a1fe to 3464538 Compare May 21, 2024 18:12
assignUser and others added 2 commits May 21, 2024 20:26
…ache#41629)

### Rationale for this change

See issue.

### What changes are included in this PR?

Enforce use of binaries and installation of the CRAN openssl version on Intel and Arm macOS.

### Are these changes tested?
Crossbow
* GitHub Issue: apache#41426

Authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
)

### Rationale for this change
The original PRs for adding support for importing and exporting the new C Device interface (apache#36488 / apache#36489) only added support for the Arrays themselves, not for the stream structure. We should support both.

### What changes are included in this PR?
Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`.

### Are these changes tested?
Test writing in progress, wanted to get this up for review while I write tests.

### Are there any user-facing changes?
No, only new functions have been added.

* GitHub Issue: apache#40078

Lead-authored-by: Matt Topol <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
@laurentgo laurentgo force-pushed the laurentgo/wip-update-java-version branch 7 times, most recently from fece25e to 649ded2 Compare May 22, 2024 00:14
adamreeve and others added 3 commits May 22, 2024 09:47
… library (apache#41721)

### Rationale for this change

This is to support later using the `*_AVAILABLE_IN_*` macros to add `dllexport/dllimport` attributes required for building these libraries with MSVC (apache#41134)

### What changes are included in this PR?

* Add a Python script that generates `DEPRECATED_IN` and `AVAILABLE_IN` macros for each GLib library
* Add missing `AVAILABLE_IN` annotations to some methods in the GLib libraries (except the main arrow-glib library as this is being done in apache#41599)
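
To give a feel for what such a generator script does, here is a minimal sketch that emits one `AVAILABLE_IN` and one `DEPRECATED_IN` macro per version. Everything concrete in it is an assumption: the `EXAMPLE` prefix, the `_EXTERN` and `_DEPRECATED` expansion targets, and the function name are hypothetical, and the real script's output is likely more elaborate.

```python
def generate_version_macros(prefix, versions):
    """Return C preprocessor lines for each (major, minor) version.

    Hypothetical sketch; the macro bodies are placeholders, not the
    definitions used by the real GLib libraries.
    """
    lines = []
    for major, minor in versions:
        suffix = f"{major}_{minor}"
        lines.append(f"#define {prefix}_AVAILABLE_IN_{suffix} {prefix}_EXTERN")
        lines.append(
            f"#define {prefix}_DEPRECATED_IN_{suffix} {prefix}_DEPRECATED"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_version_macros("EXAMPLE", [(1, 0), (2, 0)]))
```

Generating the macros from a version list keeps every library's header consistent and makes it cheap to later route `AVAILABLE_IN` through an export attribute.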

### Are these changes tested?

This doesn't include any behaviour change that can be unit tested.

### Are there any user-facing changes?

No
* GitHub Issue: apache#41681

Lead-authored-by: Adam Reeve <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…n in write_table() docstring (apache#41759)

### Rationale for this change

In PR apache#40094 (issue GH-39978), we forgot to update the `write_table` docstring with an accurate description of the supported data types for BYTE_STREAM_SPLIT.

### Are these changes tested?

No (only a doc change).

### Are there any user-facing changes?

No.
* GitHub Issue: apache#41748

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
…apache#41761)

Following the discussions on the Parquet ML (see [this thread](https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo) and [this thread](https://lists.apache.org/thread/vs3w2z5bk6s3c975rrkqdttr1dpsdn7h)), and the various complaints about poor Parquet metadata performance on wide schemas, this adds a benchmark to measure the overhead of Parquet file metadata parsing or serialization for different numbers of row groups and columns.

Sample output:
```
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
WriteFileMetadataAndData/num_columns:1/num_row_groups:1           11743 ns        11741 ns        59930 data_size=54 file_size=290 items_per_second=85.1726k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:100        843137 ns       842920 ns          832 data_size=5.4k file_size=20.486k items_per_second=1.18635k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:1000      8232304 ns      8230294 ns           85 data_size=54k file_size=207.687k items_per_second=121.502/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1         101214 ns       101190 ns         6910 data_size=540 file_size=2.11k items_per_second=9.8824k/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:100      8026185 ns      8024361 ns           87 data_size=54k file_size=193.673k items_per_second=124.621/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1000    81370293 ns     81343455 ns            8 data_size=540k file_size=1.94392M items_per_second=12.2936/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1        955862 ns       955528 ns          733 data_size=5.4k file_size=20.694k items_per_second=1.04654k/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:100    80115516 ns     80086117 ns            9 data_size=540k file_size=1.94729M items_per_second=12.4866/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1000  856428565 ns    856065370 ns            1 data_size=5.4M file_size=19.7673M items_per_second=1.16814/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:1      9330003 ns      9327439 ns           75 data_size=54k file_size=211.499k items_per_second=107.211/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:100  834609159 ns    834354590 ns            1 data_size=5.4M file_size=19.9623M items_per_second=1.19853/s

ReadFileMetadata/num_columns:1/num_row_groups:1                    3824 ns         3824 ns       182381 data_size=54 file_size=290 items_per_second=261.518k/s
ReadFileMetadata/num_columns:1/num_row_groups:100                 88519 ns        88504 ns         7879 data_size=5.4k file_size=20.486k items_per_second=11.299k/s
ReadFileMetadata/num_columns:1/num_row_groups:1000               849558 ns       849391 ns          825 data_size=54k file_size=207.687k items_per_second=1.17731k/s
ReadFileMetadata/num_columns:10/num_row_groups:1                  19918 ns        19915 ns        35449 data_size=540 file_size=2.11k items_per_second=50.2138k/s
ReadFileMetadata/num_columns:10/num_row_groups:100               715822 ns       715667 ns          975 data_size=54k file_size=193.673k items_per_second=1.3973k/s
ReadFileMetadata/num_columns:10/num_row_groups:1000             7017008 ns      7015432 ns          100 data_size=540k file_size=1.94392M items_per_second=142.543/s
ReadFileMetadata/num_columns:100/num_row_groups:1                175988 ns       175944 ns         3958 data_size=5.4k file_size=20.694k items_per_second=5.68363k/s
ReadFileMetadata/num_columns:100/num_row_groups:100             6814382 ns      6812781 ns          103 data_size=540k file_size=1.94729M items_per_second=146.783/s
ReadFileMetadata/num_columns:100/num_row_groups:1000           77858645 ns     77822157 ns            9 data_size=5.4M file_size=19.7673M items_per_second=12.8498/s
ReadFileMetadata/num_columns:1000/num_row_groups:1              1670001 ns      1669563 ns          419 data_size=54k file_size=211.499k items_per_second=598.959/s
ReadFileMetadata/num_columns:1000/num_row_groups:100           77339599 ns     77292924 ns            9 data_size=5.4M file_size=19.9623M items_per_second=12.9378/s
```
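
One way to read the sample output is to normalize each time by the number of column chunks (columns times row groups). The sketch below does that for a few rows copied from the table above; `ns_per_column_chunk` and `SAMPLE_ROWS` are hypothetical names, and the normalization is an illustrative assumption rather than something the benchmark itself reports.

```python
def ns_per_column_chunk(time_ns, num_columns, num_row_groups):
    """Average wall time per column chunk (one column in one row group)."""
    return time_ns / (num_columns * num_row_groups)


# (benchmark, num_columns, num_row_groups, time in ns), copied from the
# sample output above.
SAMPLE_ROWS = [
    ("ReadFileMetadata", 10, 100, 715822),
    ("ReadFileMetadata", 100, 100, 6814382),
    ("ReadFileMetadata", 1000, 100, 77339599),
    ("WriteFileMetadataAndData", 1000, 100, 834609159),
]

for name, cols, groups, time_ns in SAMPLE_ROWS:
    per_chunk = ns_per_column_chunk(time_ns, cols, groups)
    print(f"{name} {cols}x{groups}: {per_chunk:.1f} ns/chunk")
```

For the three read rows this gives about 716, 681 and 773 ns per column chunk, while the write row (which also writes data) comes out at about 8346 ns per column chunk.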

* GitHub Issue: apache#41760

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
StephanTLavavej and others added 13 commits June 7, 2024 11:25
)

### Rationale for this change

I work on MSVC's STL, and we regularly build popular open-source projects, including yours, with development builds of the MSVC toolset. This allows us to find and fix toolset regressions before they affect users, and also allows us to provide advance notice of breaking changes, which is the case here.

We recently merged microsoft/STL#4633 which will ship in VS 2022 17.11 Preview 3. This improved build throughput by refactoring `<string_view>` so that it no longer drags in `std::string`. It's also a source-breaking change for code that wasn't properly including `<string>`. Your `cpp/src/arrow/json/object_writer.h` declares `std::string Serialize();` without including `<string>`. When built with our updated STL, this will emit a compiler error:

```
C:\gitP\apache\arrow\cpp\src\arrow/json/object_writer.h(39): error C2039: 'string': is not a member of 'std'
```

### What changes are included in this PR?

The fix is simple and portable: include the necessary header.

### Are these changes tested?

Nope, I'm totally YOLOing it. If it builds, it's good.

(This will be tested in MSVC's internal "Real World Code" test infrastructure. Also, after VS 2022 17.11 ships, your existing build/test coverage will ensure that this keeps compiling.)

### Are there any user-facing changes?

No.

Authored-by: Stephan T. Lavavej <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…raries (apache#41658)

### Rationale for this change

* We don't need an object library for a shared library with `ARROW_BUILD_SHARED=OFF`.
* We don't need an object library for a static library with `ARROW_BUILD_STATIC=OFF`.

### What changes are included in this PR?

Don't build needless object libraries based on `ARROW_BUILD_SHARED`/`ARROW_BUILD_STATIC`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#41652

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…pache#41772)

Use/update Maven modules to `org.apache:apache:31` and clean up Maven modules to remove unnecessary configuration or outdated workarounds

* Add `org.apache:apache:31` to `org.apache.arrow:arrow-bom` and `org.apache.arrow.maven.plugins:arrow-maven-plugins` to make them conformant with ASF standards
* Update `org.apache.arrow:arrow-java-root` parent to `org.apache:apache:31`
* Use `version.*` and other properties to override plugin versions defined by `org.apache:parent`
* Move standalone plugin versions under pluginManagement at the top level
* Cleanup redundant plugin version or configuration declaration
* Update `maven-dependency-plugin` to 3.6.1 and add the required overrides when necessary
* Update `maven-shade-plugin` to 3.5.1 (via `org.apache:parent`)
  - disable reduced dependency pom creation for non-terminal modules
* Remove enforcer check for java and maven version (handled by `org.apache:parent`)
* Remove unnecessary `mvnrepository` link comments
* Remove `m2e.version` property check in profiles (only needed for errorprone plugin configuration which is incompatible with M2E)
* Cleanup `argLine` overrides for surefire/failsafe plugins
* Remove unnecessary `../pom.xml` `<relativePath>` directives
* Remove source/target/encoding configuration properties for `maven-compiler-plugin`, `maven-javadoc-plugin` and `maven-resources-plugin`, as these are handled by `org.apache:parent` and the plugins themselves
* Remove unnecessary copy of codegen templates in `arrow-vector` module
* Remove unnecessary junit jupiter engine dependencies for surefire/failsafe plugins.
* GitHub Issue: apache#41307

Lead-authored-by: Laurent Goujon <[email protected]>
Co-authored-by: Laurent Goujon <[email protected]>
Signed-off-by: David Li <[email protected]>
Switch to org.apache:apache:32
Update scripts and actions to use Java 21/22 to build but a different
JDK for testing
Now that TestOpens may be run with a different JVM than the one running
Maven, check JVM version at runtime to see if current version is 16 or
higher
Change minimum Java build version to 21. This doesn't change the minimum
version required to use Arrow, which is still Java 8

Update docker images for Java JNI/conda integration tests to use a recent
Python image in order to install Java 21
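
The runtime check mentioned above ("see if current version is 16 or higher") depends on turning a JVM version string into a number. The actual test is Java code and the commit messages do not say how it reads the version; this Python sketch only illustrates the parsing logic, assuming the input is shaped like the `java.specification.version` system property. `java_feature_version` and `is_at_least` are hypothetical names.

```python
def java_feature_version(spec_version):
    """Map a java.specification.version string to its feature number.

    Java 8 and earlier report "1.x" (for example "1.8"); Java 9 and later
    report the feature number directly (for example "17").
    """
    parts = spec_version.split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])


def is_at_least(spec_version, minimum=16):
    """Return True if the JVM described by spec_version is >= minimum."""
    return java_feature_version(spec_version) >= minimum
```

Checking at runtime rather than at build time matters here because the JVM that runs the tests may differ from the JVM that runs Maven.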
@laurentgo laurentgo force-pushed the laurentgo/wip-update-java-version branch from 3cbf328 to e7fc520 Compare June 7, 2024 13:10
@laurentgo laurentgo closed this Jun 7, 2024