Integration tests: Install specific fastparquet version.
Fixes NVIDIA#9603.

This commit changes the integration test setup to specifically install
fastparquet-0.8.3.

Prior to this change, where the fastparquet version was not specified, pip
installed 0.5.0 on some nodes, e.g. on Dataproc 2.0 (with Spark 3.1.1).
Older fastparquet versions do not support reading the contents of input
directories recursively, which caused the tests to fail.

Note that this change doesn't bump the version all the way to 2023.8.0, so as
to preserve compatibility with Dataproc 2.0.  v0.8.3 seems to have the broadest
support.
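
A minimal sketch of how a test setup could confirm that the installed
fastparquet is at least the version named above. This is not part of the
commit: the names MIN_FASTPARQUET, version_tuple, meets_minimum and
installed_fastparquet_version are hypothetical, and the comparison only
looks at the leading numeric components of the version string.

```python
from importlib import metadata

# Minimum version taken from this commit message; the constant name is hypothetical.
MIN_FASTPARQUET = "0.8.3"


def version_tuple(version):
    """Convert a dotted release string such as '0.8.3' into (0, 8, 3).

    Parsing stops at the first component that does not start with a digit,
    so a suffix such as '.post1' is ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_FASTPARQUET):
    """Return True if `installed` is at least `minimum`."""
    have = version_tuple(installed)
    need = version_tuple(minimum)
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def installed_fastparquet_version():
    """Return the installed fastparquet version string, or None if absent."""
    try:
        return metadata.version("fastparquet")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_fastparquet_version()
    if found is None:
        print("fastparquet is not installed")
    elif meets_minimum(found):
        print(f"fastparquet {found} satisfies >= {MIN_FASTPARQUET}")
    else:
        print(f"fastparquet {found} is older than {MIN_FASTPARQUET}")
```

With this check, a node that ended up with 0.5.0 would be reported as too
old, while both 0.8.3 and 2023.8.0 would pass.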

Signed-off-by: MithunR <[email protected]>
mythrocks committed Nov 2, 2023
1 parent 525c73e commit 5944702
Showing 5 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion integration_tests/requirements.txt
@@ -17,4 +17,4 @@ pandas
 pyarrow
 pytest-xdist >= 2.0.0
 findspark
-fastparquet >= 2023.8.0
+fastparquet >= 0.8.3
2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.integration.rocky
@@ -57,7 +57,7 @@ RUN export CUDA_VER=`echo ${CUDA_VER} | cut -d '.' -f 1,2` && \
 conda install -y -c conda-forge sre_yield && \
 conda clean -ay
 # install pytest plugins for xdist parallel run
-RUN python -m pip install findspark pytest-xdist pytest-order fastparquet
+RUN python -m pip install findspark pytest-xdist pytest-order fastparquet==0.8.3

 # Set default java as 1.8.0
 ENV JAVA_HOME "/usr/lib/jvm/java-1.8.0-openjdk"
2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.integration.ubuntu
@@ -69,7 +69,7 @@ RUN export CUDA_VER=`echo ${CUDA_VER} | cut -d '.' -f 1,2` && \
 conda install -y -c conda-forge sre_yield && \
 conda clean -ay
 # install pytest plugins for xdist parallel run
-RUN python -m pip install findspark pytest-xdist pytest-order fastparquet
+RUN python -m pip install findspark pytest-xdist pytest-order fastparquet==0.8.3

 RUN apt install -y inetutils-ping expect

2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.ubuntu
@@ -58,7 +58,7 @@ RUN update-java-alternatives --set /usr/lib/jvm/java-1.8.0-openjdk-amd64

 RUN ln -sfn /usr/bin/python3.8 /usr/bin/python
 RUN ln -sfn /usr/bin/python3.8 /usr/bin/python3
-RUN python -m pip install pytest sre_yield requests pandas pyarrow findspark pytest-xdist pre-commit pytest-order fastparquet
+RUN python -m pip install pytest sre_yield requests pandas pyarrow findspark pytest-xdist pre-commit pytest-order fastparquet==0.8.3

 # libnuma1 and libgomp1 are required by ucx packaging
 RUN apt install -y inetutils-ping expect wget libnuma1 libgomp1
2 changes: 1 addition & 1 deletion jenkins/databricks/setup.sh
@@ -53,4 +53,4 @@ $PYSPARK_PYTHON -m pip install --target $PYTHON_SITE_PACKAGES pytest sre_yield r

 # Install fastparquet (and numpy as its dependency).
 $PYSPARK_PYTHON -m pip install --target $PYTHON_SITE_PACKAGES numpy
-$PYSPARK_PYTHON -m pip install --target $PYTHON_SITE_PACKAGES fastparquet
+$PYSPARK_PYTHON -m pip install --target $PYTHON_SITE_PACKAGES fastparquet==0.8.3
