[Java] Use recent java toolchain to build Arrow #3
Conversation
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
Force-pushed from cf8afe1 to 357b498 (Compare)
…ssageSerializer (apache#41718)

### Rationale for this change

### What changes are included in this PR?

apache#41717 describes issue and change

### Are these changes tested?

CI build

### Are there any user-facing changes?

* GitHub Issue: apache#41717

Authored-by: PJ Fanning <[email protected]>
Signed-off-by: David Li <[email protected]>
…er sized (apache#41746)

### Rationale for this change

See apache#41738.

### What changes are included in this PR?

Allocate the underlying buffer of temp stack vector using padded size.

### Are these changes tested?

UT included.

### Are there any user-facing changes?

None.

* GitHub Issue: apache#41738

Authored-by: Ruoxi Sun <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
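The fix above is in Arrow C++; the size rounding it relies on can be sketched in Java (the component this PR targets). This is a minimal illustration, not Arrow's API: the `ALIGNMENT` constant and helper name are assumptions, but Arrow buffers are conventionally padded to 64-byte multiples.

```java
public final class PaddedSize {
    // Arrow buffers are padded to 64-byte multiples. Rounding up with a
    // bit mask works because the alignment is a power of two.
    static final long ALIGNMENT = 64;

    static long roundUpToAlignment(long size) {
        return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToAlignment(1));   // 64
        System.out.println(roundUpToAlignment(64));  // 64
        System.out.println(roundUpToAlignment(65));  // 128
    }
}
```

Allocating the padded size up front (rather than the exact size) is what keeps later aligned reads of the temp stack vector's buffer in bounds.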
Force-pushed from 357b498 to f2edf49 (Compare)
…Column (apache#41704)

### Rationale for this change

Resolves apache#41699.

### What changes are included in this PR?

Add `to_dict` method and test case

### Are these changes tested?

Yes

### Are there any user-facing changes?

No

* GitHub Issue: apache#41699

Authored-by: Tai Le Manh <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
…1628)

### Rationale for this change

The commit in question caused a lot of CI issues

### Are these changes tested?

N/A

### Are there any user-facing changes?

N/A

* GitHub Issue: apache#41571

Authored-by: David Li <[email protected]>
Signed-off-by: David Li <[email protected]>
…ache#41346)

### Rationale for this change

In apache#41321, a user reports corruption when reading from a corrupt Parquet file. This is because we lost some checking. The current code works when reading a normal Parquet file, but when reading a corrupt file it needs to be stricter. **Currently this patch just enhances the checking at the Parquet level; the corresponding value checks will be added in later patches.**

### What changes are included in this PR?

Stricter Parquet checks on levels

### Are these changes tested?

Tests already exist; maybe we can introduce a Parquet file as a test file

### Are there any user-facing changes?

Stricter checks

* GitHub Issue: apache#41321

Lead-authored-by: mwish <[email protected]>
Co-authored-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
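The kind of check the patch tightens can be illustrated with a minimal sketch (hypothetical names, in Java rather than the actual Parquet C++ code): every definition level decoded from a page must fall within `[0, maxDefLevel]`, and a corrupt file carrying out-of-range levels should fail fast instead of being decoded silently.

```java
import java.util.List;

// Illustrative sketch of level validation: names are hypothetical,
// not the Parquet C++ implementation.
public final class LevelChecker {
    static void checkDefLevels(List<Integer> levels, int maxDefLevel) {
        for (int level : levels) {
            if (level < 0 || level > maxDefLevel) {
                throw new IllegalStateException(
                    "corrupt page: def level " + level
                        + " outside [0, " + maxDefLevel + "]");
            }
        }
    }

    public static void main(String[] args) {
        // A valid page for a field with max definition level 2.
        checkDefLevels(List.of(0, 1, 2, 1), 2);
        System.out.println("levels OK");
    }
}
```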
…Type in terms of the storage type (apache#41413)

### Rationale for this change

This update aligns the Python API with Arrow C++ by exposing the actual byte and bit widths of extension types from their storage type.

### What changes are included in this PR?

- Expose byte_width and bit_width properties for ExtensionType in Python, reflecting the underlying storage type.
- Add unit tests to verify these properties

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes

* GitHub Issue: apache#41389

Lead-authored-by: Hyunseok Seo <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
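The delegation this change exposes in Python can be sketched as follows. The interface and class names here are illustrative, not the Arrow API: the point is only that an extension type has no physical layout of its own, so it answers width queries by forwarding to its storage type.

```java
// Illustrative sketch: an extension type delegates physical-width
// queries to its underlying storage type.
interface DataType {
    int byteWidth();
    int bitWidth();
}

final class FixedSizeBinaryType implements DataType {
    private final int byteWidth;
    FixedSizeBinaryType(int byteWidth) { this.byteWidth = byteWidth; }
    public int byteWidth() { return byteWidth; }
    public int bitWidth() { return byteWidth * 8; }
}

final class ExtensionType implements DataType {
    private final DataType storageType;
    ExtensionType(DataType storageType) { this.storageType = storageType; }
    // The extension type's layout is exactly its storage type's layout.
    public int byteWidth() { return storageType.byteWidth(); }
    public int bitWidth() { return storageType.bitWidth(); }
}
```

For example, an extension type wrapping a 16-byte fixed-size binary storage reports a byte width of 16 and a bit width of 128.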
apache#41743)

Bumps [github.com/hamba/avro/v2](https://github.com/hamba/avro) from 2.21.1 to 2.22.0.

Release notes (v2.22.0):

- allow custom template by @adrianiacobghiula in hamba/avro#392
- chore: bump linter version by @nrwiersma in hamba/avro#393
- feat: make schema tests parallel by @nrwiersma in hamba/avro#394
- feat: allow strict type generation with avrogen by @nrwiersma in hamba/avro#396
- chore: bump golang.org/x/tools from 0.20.0 to 0.21.0 in the all group by @dependabot in hamba/avro#397
- chore: bump the all group with 2 updates by @dependabot in hamba/avro#398

New contributors: @adrianiacobghiula made their first contribution in hamba/avro#392.

Full changelog: https://github.com/hamba/avro/compare/v2.21.1...v2.22.0

Commits:

- dc715a4 chore: bump the all group with 2 updates (#398)
- 78423c1 chore: bump golang.org/x/tools from 0.20.0 to 0.21.0 in the all group (#397)
- ee51df9 feat: allow strict type generation with avrogen (#396)
- 0e4c8f9 feat: make schema tests parallel (#394)
- 9b663e1 chore: bump linter version (#393)
- f17a001 feat: allow custom template in avrogen (#392)

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/hamba/avro/v2&package-manager=go_modules&previous-version=2.21.1&new-version=2.22.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Matt Topol <[email protected]>
…nce regression (apache#41036)

### Rationale for this change

Add a grouper benchmark to prevent performance regressions. apache#40998 (comment).

### What changes are included in this PR?

Added a benchmark.

### Are these changes tested?

Not needed.

### Are there any user-facing changes?

No

* GitHub Issue: apache#41035

Authored-by: ZhangHuiGui <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Force-pushed from e59a1fe to 3464538 (Compare)
…ache#41629)

### Rationale for this change

See issue.

### What changes are included in this PR?

Enforce usage of binary and install of cran openssl version on intel and arm macos.

### Are these changes tested?

Crossbow

* GitHub Issue: apache#41426

Authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Jacob Wujciak-Jens <[email protected]>
)

### Rationale for this change

The original PRs for adding support for importing and exporting the new C Device interface (apache#36488 / apache#36489) only added support for the Arrays themselves, not for the stream structure. We should support both.

### What changes are included in this PR?

Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`.

### Are these changes tested?

Test writing in progress, wanted to get this up for review while I write tests.

### Are there any user-facing changes?

No, only new functions have been added.

* GitHub Issue: apache#40078

Lead-authored-by: Matt Topol <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
Force-pushed from fece25e to 649ded2 (Compare)
… library (apache#41721)

### Rationale for this change

This is to support later using the `*_AVAILABLE_IN_*` macros to add `dllexport/dllimport` attributes required for building these libraries with MSVC (apache#41134)

### What changes are included in this PR?

* Add a Python script that generates `DEPRECATED_IN` and `AVAILABLE_IN` macros for each GLib library
* Add missing `AVAILABLE_IN` annotations to some methods in the GLib libraries (except the main arrow-glib library as this is being done in apache#41599)

### Are these changes tested?

This doesn't include any behaviour change that can be unit tested.

### Are there any user-facing changes?

No

* GitHub Issue: apache#41681

Lead-authored-by: Adam Reeve <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…n in write_table() docstring (apache#41759)

### Rationale for this change

In PR apache#40094 (issue GH-39978), we forgot to update the `write_table` docstring with an accurate description of the supported data types for BYTE_STREAM_SPLIT.

### Are these changes tested?

No (only a doc change).

### Are there any user-facing changes?

No.

* GitHub Issue: apache#41748

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
…apache#41761)

Following the discussions on the Parquet ML (see [this thread](https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo) and [this thread](https://lists.apache.org/thread/vs3w2z5bk6s3c975rrkqdttr1dpsdn7h)), and the various complaints about poor Parquet metadata performance on wide schemas, this adds a benchmark to measure the overhead of Parquet file metadata parsing or serialization for different numbers of row groups and columns.

Sample output:
```
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                         Time           CPU     Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
WriteFileMetadataAndData/num_columns:1/num_row_groups:1          11743 ns      11741 ns      59930 data_size=54 file_size=290 items_per_second=85.1726k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:100       843137 ns     842920 ns        832 data_size=5.4k file_size=20.486k items_per_second=1.18635k/s
WriteFileMetadataAndData/num_columns:1/num_row_groups:1000     8232304 ns    8230294 ns         85 data_size=54k file_size=207.687k items_per_second=121.502/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1        101214 ns     101190 ns       6910 data_size=540 file_size=2.11k items_per_second=9.8824k/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:100     8026185 ns    8024361 ns         87 data_size=54k file_size=193.673k items_per_second=124.621/s
WriteFileMetadataAndData/num_columns:10/num_row_groups:1000   81370293 ns   81343455 ns          8 data_size=540k file_size=1.94392M items_per_second=12.2936/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1       955862 ns     955528 ns        733 data_size=5.4k file_size=20.694k items_per_second=1.04654k/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:100   80115516 ns   80086117 ns          9 data_size=540k file_size=1.94729M items_per_second=12.4866/s
WriteFileMetadataAndData/num_columns:100/num_row_groups:1000 856428565 ns  856065370 ns          1 data_size=5.4M file_size=19.7673M items_per_second=1.16814/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:1     9330003 ns    9327439 ns         75 data_size=54k file_size=211.499k items_per_second=107.211/s
WriteFileMetadataAndData/num_columns:1000/num_row_groups:100 834609159 ns  834354590 ns          1 data_size=5.4M file_size=19.9623M items_per_second=1.19853/s
ReadFileMetadata/num_columns:1/num_row_groups:1                   3824 ns       3824 ns     182381 data_size=54 file_size=290 items_per_second=261.518k/s
ReadFileMetadata/num_columns:1/num_row_groups:100                88519 ns      88504 ns       7879 data_size=5.4k file_size=20.486k items_per_second=11.299k/s
ReadFileMetadata/num_columns:1/num_row_groups:1000              849558 ns     849391 ns        825 data_size=54k file_size=207.687k items_per_second=1.17731k/s
ReadFileMetadata/num_columns:10/num_row_groups:1                 19918 ns      19915 ns      35449 data_size=540 file_size=2.11k items_per_second=50.2138k/s
ReadFileMetadata/num_columns:10/num_row_groups:100              715822 ns     715667 ns        975 data_size=54k file_size=193.673k items_per_second=1.3973k/s
ReadFileMetadata/num_columns:10/num_row_groups:1000            7017008 ns    7015432 ns        100 data_size=540k file_size=1.94392M items_per_second=142.543/s
ReadFileMetadata/num_columns:100/num_row_groups:1               175988 ns     175944 ns       3958 data_size=5.4k file_size=20.694k items_per_second=5.68363k/s
ReadFileMetadata/num_columns:100/num_row_groups:100            6814382 ns    6812781 ns        103 data_size=540k file_size=1.94729M items_per_second=146.783/s
ReadFileMetadata/num_columns:100/num_row_groups:1000          77858645 ns   77822157 ns          9 data_size=5.4M file_size=19.7673M items_per_second=12.8498/s
ReadFileMetadata/num_columns:1000/num_row_groups:1             1670001 ns    1669563 ns        419 data_size=54k file_size=211.499k items_per_second=598.959/s
ReadFileMetadata/num_columns:1000/num_row_groups:100          77339599 ns   77292924 ns          9 data_size=5.4M file_size=19.9623M items_per_second=12.9378/s
```

* GitHub Issue: apache#41760

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
)

### Rationale for this change

I work on MSVC's STL, and we regularly build popular open-source projects, including yours, with development builds of the MSVC toolset. This allows us to find and fix toolset regressions before they affect users, and also allows us to provide advance notice of breaking changes, which is the case here.

We recently merged microsoft/STL#4633 which will ship in VS 2022 17.11 Preview 3. This improved build throughput by refactoring `<string_view>` so that it no longer drags in `std::string`. It's also a source-breaking change for code that wasn't properly including `<string>`. Your `cpp/src/arrow/json/object_writer.h` declares `std::string Serialize();` without including `<string>`. When built with our updated STL, this will emit a compiler error:
```
C:\gitP\apache\arrow\cpp\src\arrow/json/object_writer.h(39): error C2039: 'string': is not a member of 'std'
```

### What changes are included in this PR?

The fix is simple and portable: include the necessary header.

### Are these changes tested?

Nope, I'm totally YOLOing it. If it builds, it's good. (This will be tested in MSVC's internal "Real World Code" test infrastructure. Also, after VS 2022 17.11 ships, your existing build/test coverage will ensure that this keeps compiling.)

### Are there any user-facing changes?

No.

Authored-by: Stephan T. Lavavej <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…raries (apache#41658)

### Rationale for this change

* We don't need an object library for a shared library with `ARROW_BUILD_SHARED=OFF`.
* We don't need an object library for a static library with `ARROW_BUILD_STATIC=OFF`.

### What changes are included in this PR?

Don't build needless object libraries based on `ARROW_BUILD_SHARED`/`ARROW_BUILD_STATIC`.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: apache#41652

Authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
…pache#41772)

Use/update Maven modules to `org.apache:apache:31` and clean up Maven modules to remove unnecessary configuration or outdated workarounds:

* Add `org.apache:apache:31` to `org.apache.arrow:arrow-bom` and `org.apache.arrow.maven.plugins:arrow-maven-plugins` to make them conformant with ASF standards
* Update `org.apache.arrow:arrow-java-root` parent to `org.apache:parent:31`
* Use `version.*` and other properties to override plugin versions defined by `org.apache:parent`
* Move standalone plugin versions under pluginManagement at the top level
* Clean up redundant plugin version or configuration declarations
* Update `maven-dependency-plugin` to 3.6.1 and add the required overrides when necessary
* Update `maven-shade-plugin` to 3.5.1 (via `org.apache:parent`); disable reduced dependency pom creation for non-terminal modules
* Remove enforcer check for java and maven version (handled by `org.apache:parent`)
* Remove unnecessary `mvnrepository` link comments
* Remove `m2e.version` property check in profiles (only needed for errorprone plugin configuration which is incompatible with M2E)
* Clean up `argLine` overrides for surefire/failsafe plugins
* Remove unnecessary `../pom.xml` `<relativePath>` directives
* Remove source/target/encoding configuration properties for `maven-compiler-plugin`, `maven-javadoc-plugin` and `maven-resources-plugin` as they are handled by `org.apache:parent` and the plugins themselves
* Remove unnecessary copy of codegen templates in `arrow-vector` module
* Remove unnecessary junit jupiter engine dependencies for surefire/failsafe plugins

* GitHub Issue: apache#41307

Lead-authored-by: Laurent Goujon <[email protected]>
Co-authored-by: Laurent Goujon <[email protected]>
Signed-off-by: David Li <[email protected]>
Switch to org.apache:apache:32
Update scripts and actions to use Java 21/22 to build but a different JDK for testing
Now that TestOpens may be run with a different JVM than the one running Maven, check the JVM version at runtime to see if the current version is 16 or higher
Change minimum java build version to 21. This doesn't change the minimum version to use Arrow, which is still Java 8. Update docker images for java jni/conda integration tests to use a recent python image in order to install java 21
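The runtime check described for TestOpens can be sketched as follows. The class and method names here are illustrative, but `Runtime.version().feature()` is the standard Java API for the feature release; JDK 16 is the release that made strong module encapsulation the default, which is why that threshold matters for `--add-opens` tests.

```java
public final class JvmVersion {
    // Runtime.version().feature() returns the JDK feature release (e.g. 21).
    // From JDK 16 onward, strong encapsulation is the default, so the
    // --add-opens behavior under test only applies at 16 or higher.
    static boolean strongEncapsulationByDefault() {
        return Runtime.version().feature() >= 16;
    }

    public static void main(String[] args) {
        System.out.println("JDK feature version: " + Runtime.version().feature()
            + ", strong encapsulation by default: "
            + strongEncapsulationByDefault());
    }
}
```

Checking at runtime rather than at build time is what allows the build JDK (21/22) and the test JDK to differ.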
Force-pushed from 649ded2 to 012b40a (Compare)
Force-pushed from 3cbf328 to e7fc520 (Compare)
No description provided.