Re-enable RayDP tests since arrow version limits removed #31124
It seems like Java is not installed in the instances where arrow datasets are run. What should I do? @clarkzinzow @jjyao
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
nightly has been merged into raydp Signed-off-by: Zhi Lin <[email protected]>
Thanks! I think there is a slight misunderstanding about the RAY_INSTALL_JAVA env variable: it builds the Ray Java language bindings, which we don't want in the base images (it would bloat the images and increase build time).
Instead, we should install the JDK in the 5 buildkite jobs that require Java.
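A job that needs Java could first check whether it is already available. The following is a minimal sketch of such a check, not code from this PR; the helper name `have_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
  echo "java found: $(command -v java)"
else
  echo "java not found; this job would need to install a JDK first"
fi
```

A guard like this keeps the install step confined to the jobs that actually hit the missing-Java case.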
ci/docker/base.ml.Dockerfile
Outdated
@@ -1,6 +1,11 @@
 ARG DOCKER_IMAGE_BASE_TEST
 FROM $DOCKER_IMAGE_BASE_TEST

+ENV RAY_INSTALL_JAVA=1
ENV RAY_INSTALL_JAVA=1
ENV RAY_INSTALL_JAVA=1
Do we actually need the Ray Java language bindings here?
This will always build the Ray jars for every build job, which is probably not what we want.
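For context on why the placement matters: a Dockerfile `ENV` persists in the image and in everything built on top of it, whereas a `VAR=value` prefix applies only to the single command it precedes. A small sketch of the prefix behaviour in plain shell (illustrative only, not part of this PR):

```shell
#!/usr/bin/env bash
set -eu

# Make sure the variable is not already set in this shell.
unset RAY_INSTALL_JAVA

# The child process sees the value passed as a command prefix.
child_value="$(RAY_INSTALL_JAVA=1 sh -c 'echo "${RAY_INSTALL_JAVA:-unset}"')"

# The parent shell is unaffected once that command has finished.
parent_value="${RAY_INSTALL_JAVA:-unset}"

echo "child:  ${child_value}"
echo "parent: ${parent_value}"
```

The same scoping applies to the `SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=... bash ...` form used on a `RUN` line: the setting does not leak into later build steps.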
ci/docker/ml.Dockerfile
Outdated
@@ -13,6 +13,6 @@ WORKDIR /ray
 COPY . .

 # Install Ray
-RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=0 bash --login -i -c -- "python3 -m pip install -e /ray/python/"
+RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=1 bash --login -i -c -- "python3 -m pip install -e /ray/python/"
-RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=1 bash --login -i -c -- "python3 -m pip install -e /ray/python/"
+RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=0 bash --login -i -c -- "python3 -m pip install -e /ray/python/"
Let's revert this. As explained above, we don't want to build the Ray Java jars; doing so would drastically increase build time.
RayDP needs the Java bindings to work, though: it creates Java actors. Where should we specify this option then?
ci/docker/base.ml.Dockerfile
Outdated
RUN apt-get install -y -qq \
    maven openjdk-8-jre openjdk-8-jdk
Instead of doing this in the base Dockerfile, can we just install the packages in the buildkite job that runs the RayDP test? I.e. move this to .buildkite/pipeline.ml.yml and then into the Dataset tests ... jobs.
To make this a bit nicer, maybe create a file ./ci/env/install-java.sh where we just run the apt install command, and refer to it in the buildkite jobs.
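A rough sketch of what such a helper script might look like follows. It is hypothetical and not the file added in this PR: the package list is taken from the apt-get line under review, while the `install_java` function and the `DRY_RUN` switch are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a ci/env/install-java.sh style helper.
set -eu

# Packages taken from the apt-get line under review.
JAVA_PACKAGES="maven openjdk-8-jre openjdk-8-jdk"

# DRY_RUN is an assumption for illustration: when set to 1, the
# command is printed instead of executed.
install_java() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo apt-get install -y -qq ${JAVA_PACKAGES}"
  else
    # Intentionally unquoted so each package becomes its own argument.
    sudo apt-get install -y -qq ${JAVA_PACKAGES}
  fi
}

# Print the command rather than run it.
DRY_RUN=1 install_java
```

Each buildkite job that needs Java could then call the script as one of its commands, leaving the base image unchanged.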
ci/env/install-dependencies.sh
Outdated
@@ -54,7 +54,7 @@ install_base() {
 curl -f -s -L -R https://bazel.build/bazel-release.pub.gpg | sudo apt-key add - || true
 sudo apt-get update -qq
 pkg_install_helper build-essential curl unzip libunwind-dev python3-pip python3-setuptools \
-  tmux gdb
+  tmux gdb maven openjdk-8-jre openjdk-8-jdk
Let's remove this once we install java in the buildkite jobs that require it
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
@krfricke the tests have passed, but when I try to add a
Did you enable the execution flag with
Thank you very much! Almost there, I have just one last question about building the Ray java jars which I think are not needed. If they are, can you point me to where they are used for reference?
WORKSPACE_DIR="${SCRIPT_DIR}/../.."

sudo apt-get install -y maven openjdk-8-jre openjdk-8-jdk
"${WORKSPACE_DIR}"/java/build-jar-multiplatform.sh linux
Do we need to build the Ray java jars? We don't run any java code, right?
RayDP creates Java actors, such as here.
Also, you are building on a pretty old master branch. Could you please merge the latest upstream master into this PR?
Thanks for the context! Will merge once CI passes
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
…ay-project#31124) Now that MLDataset has been made optional dependency, and PyArrow version limit has been removed, RayDP tests in ray can be re-enabled now. Signed-off-by: Zhi Lin <[email protected]> Signed-off-by: Edward Oakes <[email protected]>
…ay-project#31124) Now that MLDataset has been made optional dependency, and PyArrow version limit has been removed, RayDP tests in ray can be re-enabled now. Signed-off-by: Zhi Lin <[email protected]> Signed-off-by: elliottower <[email protected]>
Why are these changes needed?
Now that MLDataset has been made an optional dependency and the PyArrow version limit has been removed, the RayDP tests in Ray can be re-enabled.
Related issue number
Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.