GH-37381: [Python][CI][Packaging] Enable ORC in Windows wheels and Appveyor CI #40609

Merged on Mar 20, 2024 (2 commits)
2 changes: 1 addition & 1 deletion .env
@@ -98,7 +98,7 @@ VCPKG="a42af01b72c28a8e1d7b48107b33e4f286a55ef6" # 2023.11.20 Release
# ci/docker/python-wheel-windows-vs2019.dockerfile.
# This is a workaround for our CI problem that "archery docker build" doesn't
# use pulled built images in dev/tasks/python-wheels/github.windows.yml.
-PYTHON_WHEEL_WINDOWS_IMAGE_REVISION=2024-03-12
+PYTHON_WHEEL_WINDOWS_IMAGE_REVISION=2024-03-19

# Use conanio/${CONAN} for "docker-compose run --rm conan". See
# https://github.com/conan-io/conan-docker-tools#readme for available
2 changes: 1 addition & 1 deletion .github/workflows/java_jni.yml
@@ -52,7 +52,7 @@ jobs:
name: AMD64 manylinux2014 Java JNI
runs-on: ubuntu-latest
if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
-timeout-minutes: 500
+timeout-minutes: 90
steps:
- name: Checkout Arrow
uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.0.0
1 change: 1 addition & 0 deletions appveyor.yml
@@ -44,6 +44,7 @@ environment:
ARROW_BUILD_FLIGHT_SQL: "ON"
ARROW_BUILD_GANDIVA: "ON"
ARROW_GCS: "ON"
+ARROW_ORC: "ON"
ARROW_S3: "ON"
GENERATOR: Ninja
PYTHON: "3.10"
3 changes: 2 additions & 1 deletion ci/appveyor-cpp-build.bat
@@ -80,7 +80,7 @@ cmake -G "%GENERATOR%" %ARROW_CMAKE_ARGS% ^
-DARROW_HDFS=ON ^
-DARROW_JSON=ON ^
-DARROW_MIMALLOC=ON ^
--DARROW_ORC=ON ^
+-DARROW_ORC=%ARROW_ORC% ^
-DARROW_PARQUET=ON ^
-DARROW_S3=%ARROW_S3% ^
-DARROW_SUBSTRAIT=ON ^
@@ -125,6 +125,7 @@ set PYARROW_WITH_DATASET=ON
set PYARROW_WITH_FLIGHT=%ARROW_BUILD_FLIGHT%
set PYARROW_WITH_GANDIVA=%ARROW_BUILD_GANDIVA%
set PYARROW_WITH_GCS=%ARROW_GCS%
+set PYARROW_WITH_ORC=%ARROW_ORC%
set PYARROW_WITH_PARQUET=ON
set PYARROW_WITH_PARQUET_ENCRYPTION=ON
set PYARROW_WITH_S3=%ARROW_S3%
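
In this hunk the `ARROW_ORC` value from appveyor.yml is forwarded to the Python build as `PYARROW_WITH_ORC`. As a rough illustration of how a build script might interpret such ON/OFF variables, here is a minimal Python sketch. `env_flag` and the accepted spellings are assumptions made for illustration, not pyarrow's actual parsing code.

```python
import os

# Spellings treated as enabled/disabled. This list is an assumption for
# the sketch, not a documented pyarrow contract.
_TRUE = {"1", "on", "true", "yes", "y"}
_FALSE = {"0", "off", "false", "no", "n", ""}


def env_flag(name, default=False, environ=None):
    """Interpret an ON/OFF style environment variable as a boolean.

    `environ` defaults to os.environ; passing a dict makes testing easy.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        # Variable not set at all: fall back to the caller's default.
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} has unrecognised value {raw!r}")
```

With `PYARROW_WITH_ORC=ON` in the environment, `env_flag("PYARROW_WITH_ORC")` would return `True`. Rejecting unknown spellings rather than silently treating them as off makes a mistyped CI variable fail loudly.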
3 changes: 3 additions & 0 deletions ci/docker/python-wheel-windows-test-vs2019.dockerfile
@@ -42,3 +42,6 @@ RUN (if "%python%"=="3.8" setx PYTHON_VERSION "3.8.10" && setx PATH "%PATH%;C:\P
(if "%python%"=="3.12" setx PYTHON_VERSION "3.12.0" && setx PATH "%PATH%;C:\Python312;C:\Python312\Scripts")
RUN choco install -r -y --no-progress python --version=%PYTHON_VERSION%
RUN python -m pip install -U pip setuptools
+
+# Install archiver to extract xz archives
+RUN choco install --no-progress -r -y archiver
1 change: 1 addition & 0 deletions ci/docker/python-wheel-windows-vs2019.dockerfile
@@ -66,6 +66,7 @@ RUN vcpkg install \
--x-feature=flight \
--x-feature=gcs \
--x-feature=json \
+--x-feature=orc \
--x-feature=parquet \
--x-feature=s3

2 changes: 1 addition & 1 deletion ci/scripts/python_wheel_windows_build.bat
@@ -36,7 +36,7 @@ set ARROW_FLIGHT=ON
set ARROW_GANDIVA=OFF
set ARROW_GCS=ON
set ARROW_HDFS=ON
-set ARROW_ORC=OFF
+set ARROW_ORC=ON
set ARROW_PARQUET=ON
set PARQUET_REQUIRE_ENCRYPTION=ON
set ARROW_MIMALLOC=ON
9 changes: 8 additions & 1 deletion ci/scripts/python_wheel_windows_test.bat
@@ -24,7 +24,7 @@ set PYARROW_TEST_FLIGHT=ON
set PYARROW_TEST_GANDIVA=OFF
set PYARROW_TEST_GCS=ON
set PYARROW_TEST_HDFS=ON
-set PYARROW_TEST_ORC=OFF
+set PYARROW_TEST_ORC=ON
set PYARROW_TEST_PARQUET=ON
set PYARROW_TEST_PARQUET_ENCRYPTION=ON
set PYARROW_TEST_SUBSTRAIT=ON
@@ -56,8 +56,15 @@ python -c "import pyarrow.dataset" || exit /B 1
python -c "import pyarrow.flight" || exit /B 1
python -c "import pyarrow.fs" || exit /B 1
python -c "import pyarrow.json" || exit /B 1
+python -c "import pyarrow.orc" || exit /B 1
python -c "import pyarrow.parquet" || exit /B 1
python -c "import pyarrow.substrait" || exit /B 1

+@rem Download IANA Timezone Database for ORC C++
Member:

So ORC needs this unconditionally? Why is that?

Member Author:

The root cause is that creating an ORC reader tries to read the local timezone (https://github.com/apache/orc/blob/main/c%2B%2B/src/Reader.cc#L251), which by default searches /usr/share/zoneinfo if the TZDIR environment variable is unset (https://github.com/apache/orc/blob/main/c%2B%2B/src/Timezone.cc#L659).

So the workaround here is to download tzdb and set TZDIR so that the tests pass. I think we can improve this on the ORC side.
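
To make the lookup in the reply above concrete, here is a minimal Python sketch of the described behaviour (use TZDIR when it is set, otherwise /usr/share/zoneinfo). It is not ORC's code and not the ORC-side improvement the reply mentions. `resolve_tzdir` and `has_tzdb` are hypothetical helpers, and probing for a `UTC` file is an assumption about which zone file gets read.

```python
import os

# Default search location named in the review comment.
DEFAULT_TZDIR = "/usr/share/zoneinfo"


def resolve_tzdir(environ=None):
    """Return TZDIR if it is set, otherwise the default zoneinfo directory."""
    environ = os.environ if environ is None else environ
    return environ.get("TZDIR", DEFAULT_TZDIR)


def has_tzdb(environ=None, probe="UTC"):
    """Return True if the resolved directory contains the probe zone file.

    The probe name is an assumption for this sketch.
    """
    return os.path.isfile(os.path.join(resolve_tzdir(environ), probe))
```

A check like `has_tzdb()` before running the ORC tests would explain a failure on Windows, where /usr/share/zoneinfo does not exist unless TZDIR points at an unpacked tzdata tree.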

+curl https://cygwin.osuosl.org/noarch/release/tzdata/tzdata-2024a-1.tar.xz --output tzdata.tar.xz || exit /B
+mkdir %USERPROFILE%\Downloads\test\tzdata
+arc unarchive tzdata.tar.xz %USERPROFILE%\Downloads\test\tzdata
+set TZDIR=%USERPROFILE%\Downloads\test\tzdata\usr\share\zoneinfo

@REM Execute unittest
pytest -r s --pyargs pyarrow || exit /B 1
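
The batch lines above download the tzdata archive, unpack it, and point TZDIR at the result. The same steps can be sketched with the Python standard library, for example to reproduce the setup locally. This sketch assumes the archive has already been downloaded and is trusted. `install_tzdata` is a hypothetical helper, and the `usr/share/zoneinfo` layout is taken from the `set TZDIR=` line above.

```python
import os
import tarfile


def install_tzdata(archive_path, dest_dir, environ=None):
    """Extract a tzdata .tar.xz archive and point TZDIR at its zoneinfo tree.

    Returns the directory that TZDIR was set to.
    """
    environ = os.environ if environ is None else environ
    os.makedirs(dest_dir, exist_ok=True)
    # "r:xz" selects xz decompression; only use this on archives you trust.
    with tarfile.open(archive_path, mode="r:xz") as archive:
        archive.extractall(dest_dir)
    tzdir = os.path.join(dest_dir, "usr", "share", "zoneinfo")
    if not os.path.isdir(tzdir):
        raise FileNotFoundError(f"no zoneinfo directory under {dest_dir}")
    # Must happen before the first ORC reader is created in this process.
    environ["TZDIR"] = tzdir
    return tzdir
```

Calling `install_tzdata("tzdata.tar.xz", some_directory)` before importing and using `pyarrow.orc` would mirror the order used in the test script, where TZDIR is set before pytest starts.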
2 changes: 1 addition & 1 deletion cpp/src/arrow/adapters/orc/adapter.cc
@@ -491,7 +491,7 @@ class ORCFileReader::Impl {
if (!include_indices.empty()) {
RETURN_NOT_OK(SelectIndices(&opts, include_indices));
}
-StripeInformation stripe_info({0, 0, 0, 0});
+StripeInformation stripe_info{0, 0, 0, 0};
RETURN_NOT_OK(SelectStripeWithRowNumber(&opts, current_row_, &stripe_info));
ARROW_ASSIGN_OR_RAISE(auto schema, ReadSchema(opts));
std::unique_ptr<liborc::RowReader> row_reader;