Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable RayDP tests since arrow version limits removed #31124

Merged
merged 11 commits into from
Feb 17, 2023

Conversation

kira-lin
Copy link
Contributor

Why are these changes needed?

Now that MLDataset has been made optional dependency, and PyArrow version limit has been removed, RayDP tests in ray can be re-enabled now.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@kira-lin
Copy link
Contributor Author

It seems like java is not installed in the instances where arrow datasets are run. What should I do? @clarkzinzow @jjyao

Signed-off-by: Zhi Lin <[email protected]>
@kira-lin
Copy link
Contributor Author

kira-lin commented Jan 3, 2023

I found that the ci images is updated here #28641, it seems that java is not installed in ml images.
I tried to modify the Dockerfile to install java, but it did not work. What should I do? @krfricke

@kira-lin kira-lin changed the title Uncomment RayDP tests since arrow version limits removed Re-enable RayDP tests since arrow version limits removed Feb 9, 2023
nightly has been merged into raydp

Signed-off-by: Zhi Lin <[email protected]>
Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think there is a slight misunderstanding about the RAY_INSTALL_JAVA env variable which will build the Ray java language bindings, which we don't want in the base images (it will bloat the images and increase build time).

Instead we should install the java JDK in the 5 buildkite jobs that require java.

@@ -1,6 +1,11 @@
ARG DOCKER_IMAGE_BASE_TEST
FROM $DOCKER_IMAGE_BASE_TEST

ENV RAY_INSTALL_JAVA=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ENV RAY_INSTALL_JAVA=1
ENV RAY_INSTALL_JAVA=1

Do we actually need the Ray Java language bindings here?

This will always build the Ray jars for every build job, this is probably not what we want

@@ -13,6 +13,6 @@ WORKDIR /ray
COPY . .

# Install Ray
RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=0 bash --login -i -c -- "python3 -m pip install -e /ray/python/"
RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=1 bash --login -i -c -- "python3 -m pip install -e /ray/python/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=1 bash --login -i -c -- "python3 -m pip install -e /ray/python/"
RUN SKIP_BAZEL_BUILD=1 RAY_INSTALL_JAVA=0 bash --login -i -c -- "python3 -m pip install -e /ray/python/"

Let's revert this, as explained above we don't want to build the Ray java jars. This will drastically increase build time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RayDP needs java binding to work though. It creates java actors. Where should we specify this option then?

Comment on lines 6 to 7
RUN apt-get install -y -qq \
maven openjdk-8-jre openjdk-8-jdk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this in the base Dockerfile, can we just install the packages in the buildkite job that runs the RayDP test? I.e. move this to .buildkite/pipeline.ml.yml and then into the Dataset tests ... jobs.

To make this a bit nicer, maybe create a file ./ci/env/install-java.sh where we just run the apt install command and refer to this in the buildkite jobs

@@ -54,7 +54,7 @@ install_base() {
curl -f -s -L -R https://bazel.build/bazel-release.pub.gpg | sudo apt-key add - || true
sudo apt-get update -qq
pkg_install_helper build-essential curl unzip libunwind-dev python3-pip python3-setuptools \
tmux gdb
tmux gdb maven openjdk-8-jre openjdk-8-jdk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this once we install java in the buildkite jobs that require it

@krfricke krfricke self-assigned this Feb 15, 2023
@krfricke krfricke added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Feb 16, 2023
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
@kira-lin
Copy link
Contributor Author

@krfricke tests has passed, but when I try to add a ci/env/install_java.sh, tests failed due to permission denied. How to solve it?

@krfricke
Copy link
Contributor

@krfricke tests has passed, but when I try to add a ci/env/install_java.sh, tests failed due to permission denied. How to solve it?

Did you enable the execution flag with chmod +x install_java.sh on your local machine?

@kira-lin
Copy link
Contributor Author

Did you enable the execution flag with chmod +x install_java.sh on your local machine?
@krfricke thanks, the issue has been solved. Now the tests are ok, it's ready to merge

Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! Almost there, I have just one last question about building the Ray java jars which I think are not needed. If they are, can you point me to where they are used for reference?

WORKSPACE_DIR="${SCRIPT_DIR}/../.."

sudo apt-get install -y maven openjdk-8-jre openjdk-8-jdk
"${WORKSPACE_DIR}"/java/build-jar-multiplatform.sh linux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to build the Ray java jars? We don't run any java code, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raydp create java actors. Such as here

@krfricke
Copy link
Contributor

Also, you are building on a pretty old base master branch. Could you please merge the latest upstream master into this PR?

Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context! Will merge once CI passes

@krfricke krfricke removed the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Feb 17, 2023
@clarkzinzow clarkzinzow merged commit c7a50b4 into ray-project:master Feb 17, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ay-project#31124)

Now that MLDataset has been made optional dependency, and PyArrow version limit has been removed, RayDP tests in ray can be re-enabled now.

Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
peytondmurray pushed a commit to peytondmurray/ray that referenced this pull request Mar 22, 2023
…ay-project#31124)

Now that MLDataset has been made optional dependency, and PyArrow version limit has been removed, RayDP tests in ray can be re-enabled now.

Signed-off-by: Zhi Lin <[email protected]>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ay-project#31124)

Now that MLDataset has been made optional dependency, and PyArrow version limit has been removed, RayDP tests in ray can be re-enabled now.

Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: elliottower <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants