Build fails on ppc64le architecture #4309
Hey @xdever, Going forward, this will need some way to build pyarrow wheels for ppc64. The official way to build pyarrow wheels is through crossbow (see https://github.com/apache/arrow/tree/master/dev/tasks); we reuse most of these scripts to build the pyarrow wheels, see https://github.com/pcmoritz/arrow-build/blob/master/.travis.yml#L54. This infrastructure uses Travis, so it won't work out of the box, but it is easy to run the scripts on a dedicated machine. If you follow the instructions at https://github.com/apache/arrow/tree/master/python/manylinux1 on your ppc machine, it shouldn't be too hard to build the wheels (everything is dockerized). Once you have the wheels, you can replace the pip install in Line 121 in dec7c3f
Let me know if you have questions about this or run into trouble! Best wishes, |
Hi @pcmoritz, Wouldn't it make sense for you to provide pyarrow cross-compiled for PPC64? I'm probably not the only one who wants to use Ray on an IBM Minsky, which is a PPC64LE machine. The build process is nontrivial, and I'm afraid it would prevent many people from using it. If you don't want to bother with the PPC binaries and don't want to keep the cmake build script in build.sh, it would be good if it could be moved to a different script or a different repository, so that people are still able to build it without too much effort. Thank you, |
Hi all, Any progress on this? Building PyArrow is the most difficult thing I have ever seen on Linux... At least could you somehow provide the old script that was used to auto-build it? Thank you, |
+1 I am working with 2x IBM AC922 systems, and cannot build Ray from source on them. |
I managed to get Ray 0.7.5 working fine (even with the rest of the cluster, which is x86). For a newer version, you should change the versions and commit numbers in the script. It was super difficult and took me a few days to make it work, so I made a script out of it to be able to reproduce it next time.
#!/bin/bash
mkdir ~/ray_build
cd ~/ray_build
mkdir bazel_build
cd bazel_build
wget https://github.com/bazelbuild/bazel/releases/download/0.26.1/bazel-0.26.1-dist.zip
unzip bazel*
env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
cd output
export PATH=`pwd`:$PATH
cd ../../
git clone --recursive https://github.com/apache/arrow
cd arrow
git checkout 141a213a54f4979ab0b94b94928739359a2ee9ad
#git checkout tags/apache-arrow-0.14.0
git submodule update --recursive
mkdir build
cd build
cmake ../cpp -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_INSTALL_PREFIX=~/ray_build/arrow -DCMAKE_C_FLAGS=-O3 -DCMAKE_CXX_FLAGS=-O3 -DARROW_BUILD_TESTS=off -DARROW_HDFS=on -DARROW_BOOST_USE_SHARED=off -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3 -DARROW_PYTHON=on -DARROW_PLASMA=on -DARROW_TENSORFLOW=off -DARROW_JEMALLOC=off -DARROW_WITH_BROTLI=off -DARROW_WITH_LZ4=on -DARROW_WITH_ZSTD=off -DARROW_WITH_THRIFT=ON -DARROW_PARQUET=ON -DARROW_WITH_ZLIB=ON
make -j`nproc`
make install
cd ../python
export PKG_CONFIG_PATH=~/ray_build/arrow/lib/pkgconfig:$PKG_CONFIG_PATH
export PYARROW_BUILD_TYPE='release'
export PYARROW_WITH_ORC=0
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_PLASMA=1
export PYARROW_BUNDLE_ARROW_CPP=1
#export PYARROW_BUNDLE_BOOST=1
#export PYARROW_BOOST_NAMESPACE=arrow_boost
pip3 install -r requirements-wheel.txt --user
SETUPTOOLS_SCM_PRETEND_VERSION="0.14.0.RAY" python3 setup.py build_ext --inplace
SETUPTOOLS_SCM_PRETEND_VERSION="0.14.0.RAY" python3 setup.py bdist_wheel
cp dist/pyarrow*.whl ~/ray_build
cd ../../
git clone --recursive https://github.com/ray-project/ray
cd ray
git checkout tags/ray-0.7.5
git submodule update --recursive
export SKIP_PYARROW_INSTALL=1
cd python
python3 -m pip install -q --target ray/pyarrow_files ~/ray_build/pyarrow*.whl --system
python3 setup.py bdist_wheel |
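Since the script above needs its versions and commit hashes changed for newer releases, one option is to collect them in variables at the top. The following is only a sketch: the variable names and the `bazel_dist_url` helper are illustrative and not part of the original script, and it assumes the Bazel release URL keeps the pattern used above.

```shell
#!/bin/bash
# Hypothetical refactoring: pin every version in one place.
# The defaults are the values used in the script above.
BAZEL_VERSION="${BAZEL_VERSION:-0.26.1}"
ARROW_COMMIT="${ARROW_COMMIT:-141a213a54f4979ab0b94b94928739359a2ee9ad}"
RAY_TAG="${RAY_TAG:-ray-0.7.5}"

# Build the download URL of the Bazel dist archive for a given version,
# assuming the release URL pattern from the script above still applies.
bazel_dist_url() {
    local version="$1"
    echo "https://github.com/bazelbuild/bazel/releases/download/${version}/bazel-${version}-dist.zip"
}

# Print the pinned values instead of downloading anything.
echo "Bazel archive: $(bazel_dist_url "$BAZEL_VERSION")"
echo "Arrow commit:  $ARROW_COMMIT"
echo "Ray tag:       $RAY_TAG"
```

The `wget` and `git checkout` lines could then use `$(bazel_dist_url "$BAZEL_VERSION")`, `$ARROW_COMMIT`, and `tags/$RAY_TAG` instead of hard-coded values.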
@xdever Thanks for the detailed steps! I tried to build ray-0.7.5 with the above steps. But the ray build fails with the following error:
Do these steps still work for you? Or did you have to make some changes to the script recently? Also, I did some searches for that error and found this thread: #6373. So, I tried building ray-0.8.0 instead. That did not complain about the "prometheus-cpp" download failure. But it gave the following error:
Do you think maybe this issue could be due to an old version of bazel? Should I try to build a newer version of bazel? |
Same problem here, seems that https://github.com/jovany-wang/prometheus-cpp no longer exists. |
The error you are seeing when trying to build I edited @xdever's script again to bump the version, but I get a new error:
|
@felker I tried installing bazel from yum which installed bazel 1.2.1 in ubi 7.6 along with java-11-openjdk-11.0.6.10-1.el7_7 and setting JAVA_HOME with the following commands:
That made the build proceed a bit further and then fail with the following error:
I'll see if I can resolve this error. |
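The exact commands used to set JAVA_HOME in the comment above are not preserved in this thread. One common way to derive it from the `java` binary on the PATH is sketched below. The helper name `java_home_from_binary` is hypothetical, and this is not necessarily what was run.

```shell
#!/bin/bash
# Hypothetical helper: strip the trailing /bin/java from a resolved path.
java_home_from_binary() {
    local java_bin="$1"
    dirname "$(dirname "$java_bin")"
}

# Resolve symlinks (for example /usr/bin/java pointing into the JDK directory)
# and export JAVA_HOME only when a java binary is actually present.
if command -v java >/dev/null 2>&1; then
    JAVA_HOME="$(java_home_from_binary "$(readlink -f "$(command -v java)")")"
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
fi
```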
I also was able to bootstrap Bazel v1.1.0 on ppc64le by following the instructions here: and noticing that user clnperez fixed the Bazel build process for Power only by that version: After a few more hiccups (using an old CMake < v3.x, not having Boost installed for Arrow), I have also gotten as far as you but am stuck again. |
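An old CMake is one of the hiccups mentioned above, and a build script could check for it up front. The sketch below assumes a minimum of 3.0, inferred only from the "CMake < v3.x" remark. The actual minimum required by a given Arrow commit may differ. The `version_ge` helper is illustrative.

```shell
#!/bin/bash
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumed minimum, taken from the "old CMake < v3.x" remark above.
REQUIRED_CMAKE="3.0"
if command -v cmake >/dev/null 2>&1; then
    # `cmake --version` prints "cmake version X.Y.Z" on its first line.
    found="$(cmake --version | head -n1 | awk '{print $3}')"
    if version_ge "$found" "$REQUIRED_CMAKE"; then
        echo "cmake $found is new enough"
    else
        echo "cmake $found is older than $REQUIRED_CMAKE" >&2
    fi
else
    echo "cmake not found on PATH" >&2
fi
```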
@pcmoritz Travis CI now supports Would it be easy for you to add it to your current build matrix? I am trying to set it up on a forked version of https://github.com/ray-project/arrow-build but it is challenging. |
@felker Have you had any luck with the ray build? I tried with different bazel versions. With 1.2.1 and 1.0.0, I have the same observations. Not sure whether this is an issue with bazel or the build environment. |
I had misinterpreted the errors seen earlier. They were in boost's .bazel file. The issue was not with boringssl. The following changes in /root/.cache/bazel/_bazel_root/7f16b0bd7b2d7e213ac52cfc0f0101d7/external/boost/BUILD.bazel made the build proceed further:
Now, there is a compilation error in building plasma as shown below:
I'm debugging this now. |
I'm able to build ray-0.7.7 with bazel-1.1.0 now. Had to make some minor changes in the build script and bzl files in ray. I'm verifying my changes by triggering a fresh build in UBI 7.6 ppc64le. Once I'm done with that, I'll try building ray-0.8.0. |
I'm able to build ray-0.7.7 as well as ray-0.8.1 in a UBI 7.6 ppc64le container now. I had to make changes to the build.sh and bazel/ray_deps_setup.bzl files, and add a ppc-specific patch in thirdparty/patches/. I've also validated my changes on x86 to make sure they do not break that build. Thanks @xdever. Your build steps for arrow helped me get past a major hurdle! Thanks @felker for your input on the bazel version. |
@amitsadaphule that is great news! I independently got as far as you did a few comments ago by modifying While I also ended up making the same edit:
I also added Still, it is good to validate that we were both on the same track, and I was able to get to the same plasma build error. How did you fix that @amitsadaphule ? I am still stuck there. I have installed Pyarrow (tried v0.15.1 and v0.14.1) and the Arrow C++ library from the IBM WMLCE Conda channel
instead of building the |
I know very little about Bazel , but it appears that our edits to Boost's |
@felker I was able to get past the plasma build error by adding the "--cxxopt=-std=gnu++0x" option to the bazel build command. Also, about nelhage/rules_boost@ebd1dd7, I've used a patch with a subset of those changes during the ray 0.7.7 and 0.8.1 builds. The reason is that ray uses an old commit from the https://github.com/nelhage/rules_boost repo.
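The flag described above can be added to the Bazel invocation along these lines. This is only a sketch: gating the flag on ppc64le is an assumption made here, not something stated in the thread, the target `//:ray_pkg` is a placeholder for whatever build.sh actually builds, and `bazel_extra_flags` is a hypothetical helper.

```shell
#!/bin/bash
# Assemble extra Bazel flags for a given machine architecture.
# Only ppc64le gets the C++ dialect override mentioned above (an assumption).
bazel_extra_flags() {
    local arch="$1"
    if [ "$arch" = "ppc64le" ]; then
        echo "--cxxopt=-std=gnu++0x"
    fi
}

# The target name below is a placeholder; use whatever build.sh invokes.
RAY_BAZEL_TARGET="${RAY_BAZEL_TARGET:-//:ray_pkg}"

# Print the command instead of running it, so the sketch has no side effects.
# The substitution is left unquoted on purpose so an empty result adds no argument.
echo bazel build $(bazel_extra_flags "$(uname -m)") "$RAY_BAZEL_TARGET"
```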
Great! I will try that out. Can you share the patch? I added a Travis CI |
I got it working with Ray 0.8.0 with the following changes:
where I specified the latest After these changes (and the above mentioned steps to get Bazel and Pyarrow), Thanks for your help @amitsadaphule ! |
That's great @felker ! Good that it's building properly with master from nelhage/rules_boost@67ddc50 and no patch is needed to be applied to it from ray. I'll try that as well when I try to build ray master. Also, could you check if you're able to execute the unit tests? For me, it is complaining about a couple of missing packages and most of those (grpc, tensorflow) are not readily available for ppc64le. Here's the log:
|
Both TensorFlow and gRPC are available on IBM's Conda channel, e.g.:
I also had to install the Cython examples from the Ray documentation folder, but then I was able to run the (failing) tests:
The output I got was:
Not sure how problematic this is; I have been using Ray successfully in some limited cases over the last day. |
@felker Sorry, I just saw your message from above! On Python >= 3.6, Ray should be able to run without our custom version of pyarrow, and we are working towards removing that as a built-in dependency, so you shouldn't need to get that working on PowerPC :) |
@pcmoritz fortunately, I was able to install pyarrow from IBM's Conda channel and use Still, Travis CI |
@felker Great to hear that! You should feel free to create a PR that adds the ppc64le build to the Ray matrix to build the wheels! |
Thanks, everyone! I'm glad to see this great progress. However, @felker, be aware that in order to use ray on a mixed-architecture cluster, you have to have exactly the same version of pyarrow on all machines, including the ".RAY" at the end of the version number. So the IBM Conda version would probably work only if every machine is PPC64. Alternatively, as a hack, one could remove the version checks on the head node in services.py, function check_version_info, and hope that it will work. |
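The requirement above, that the version strings match exactly including the ".RAY" suffix, can be checked before starting a cluster. The sketch below compares strings that are hard-coded for illustration. How the strings are collected from each node is left open, and `versions_match` is a hypothetical helper, not part of Ray.

```shell
#!/bin/bash
# Succeed only if every version string passed in is identical.
# The comparison is textual, so "0.14.0.RAY" and "0.14.0" do not match.
versions_match() {
    local first="$1"
    local v
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# Example with hard-coded values. On a real cluster each string could be
# collected by running something like this on every node:
#   python3 -c "import pyarrow; print(pyarrow.__version__)"
if versions_match "0.14.0.RAY" "0.14.0.RAY"; then
    echo "pyarrow versions agree"
fi
if ! versions_match "0.14.0.RAY" "0.14.0"; then
    echo "pyarrow versions differ"
fi
```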
|
Thanks @felker! I'm trying to avoid switching to conda. I'll try to resolve the build issues for opencv-python first. If that doesn't work out, the packages from the conda channels that you suggested will help me get this done anyway. |
I was finally able to get opencv-python-headless and tensorflow 2.0.1 built and installed. I executed the tests on UBI 7.6 ppc64le and got the following result:
There are quite a few test failures and the test execution gets terminated at 54%.
I need to investigate this further. |
@amitsadaphule Could you pls share your changes with me..? |
Following up on the work of others in this thread, I was able to install ray on ARM64 (aarch64) and even attempt mixed-architecture (ARM64<->x86_64) distributed computing. With the latest commits, the PyArrow dependency has been removed, although pyarrow on ARM was not a problem for me, as I've been using it regularly with other projects. The issue with building Ray on ARM64 was with the bazel rules for the boost libraries, which is unfortunate: boost itself has no issues on ARM, but the build rules for ARM64 contain errors, as detailed in #7184. I've made a patch to address that and written up the procedure to build and install ray on ARM64.
I've opened an issue about this at nelhage/rules_boost, from which the file is obtained during the ray build process. Update: |
@JasonWayne please find the build script here. Please note that the script builds ray with python 3.7.3, since that was a specific requirement in my case. You can use python 3.6 instead by installing rh-python36 and replacing all occurrences of python3.7 with python3.6 and those of pip3.7 with pip3.6. |
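The python3.7 to python3.6 substitution described above can be done with `sed`. A minimal sketch follows. The `retarget_python` function name and the file names in the comment are placeholders, not names from the linked build script.

```shell
#!/bin/bash
# Rewrite interpreter and pip names from 3.7 to 3.6 on stdin.
# The dots are escaped so that only a literal "3.7" suffix matches.
retarget_python() {
    sed -e 's/python3\.7/python3.6/g' -e 's/pip3\.7/pip3.6/g'
}

# Demonstration on a sample line. For a real script one could run
#   retarget_python < build.sh > build_py36.sh
# where both file names are placeholders.
echo "pip3.7 install wheel && python3.7 setup.py bdist_wheel" | retarget_python
```

Writing to a new file rather than editing in place keeps the original script available for comparison.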
Following up on the test failures, post building ray 0.7.7 on RHEL 7.6 ppc64le and installing all test execution dependencies, when I tried to execute the test cases as @felker did you get around those test case execution issues? Has anyone else experienced this before? Is there some known solution to these problems? |
No, I have not tried to get the tests to pass since #4309 (comment) |
I managed to build ray successfully for tag v8.3.0, with the boost patches @heavyinfo provided. For some reason the patches are still needed. I am able to build it for both armv7l and aarch64. |
I had the same issue. Did you find a way to install it? |
Yes, hope this one helps: https://github.com/ppc64le/build-scripts/tree/master/pip-ray |
I suppose I cannot run it without administrator privileges, which I don't have... |
I changed the title to be about ppc64le. Is the goal to provide a ppc64le wheel, conda package, or to provide clear instructions how to build for ppc64le? |
Yes, a conda package would be perfect |
In order to provide ppc64le and aarch64 builds on conda, all the dependencies must be available. The rllib component depends on gym, which in turn depends on pygame (for Box2d) and ale-py (the Arcade Learning Environment). See the conda-forge PR for more information. Either the gym dependency should be made optional for rllib, or someone needs to put in the time to package those two libraries for gym so that gym 0.22+ can be built for conda-forge. Once that happens, migrating the packages to ppc64le and aarch64 should follow. |
System information
Describe the problem
Build fails on non-x86 architectures, because a binary installation of pyarrow was recently added to build.sh, but those binaries are available only for x86_64.
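One way to avoid this kind of failure is to choose the install path by architecture. The sketch below is not the logic in build.sh. The helper name `pyarrow_install_mode` and the assumption that only x86_64 has prebuilt wheels are taken from the description above for illustration.

```shell
#!/bin/bash
# Decide how pyarrow should be obtained for a given architecture.
# Prints "binary" where prebuilt wheels are assumed to exist, else "source".
pyarrow_install_mode() {
    case "$1" in
        x86_64) echo "binary" ;;
        *)      echo "source" ;;
    esac
}

# Report the choice for the current machine without installing anything.
mode="$(pyarrow_install_mode "$(uname -m)")"
echo "pyarrow install mode on $(uname -m): $mode"
```

A build script could branch on the printed mode and fall back to a from-source build on ppc64le or aarch64.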
Source code / logs