-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Revert "[Core] Support Arrow zerocopy serialization in object… #36153
base: master
Are you sure you want to change the base?
Conversation
… store (ray-project#35110)" (ray-project#36000)" This reverts commit 822904b.
It's weird |
Not really sure, but perhaps try disabling parts of your PR to see what is causing the issue? It may be worth adding more debug logs for where OMP_NUM_THREADS is being set as well. |
python/ray/tests/test_advanced_9.py
Outdated
@@ -420,7 +420,7 @@ def test_omp_threads_set_third_party(ray_start_cluster, monkeypatch): | |||
cluster.add_node(num_cpus=4) | |||
ray.init(address=cluster.address) | |||
|
|||
@ray.remote(num_cpus=2) | |||
@ray.remote(num_cpus=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason that it broke the test I guess?
We use num_cpus
for OMP_NUM_THREADS
for a task - why did it get changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possible reason that it might break is if any new import introduced in the PR eagerly imports numpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review, this change is for test only, and it won't affect the result since I tested before.
This PR imports pyarrow
, yet I cannot get the relationship with the break.. Could you find where the 4
comes from in the break assertation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likely the 4 comes from the num_cpus
passed into init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's fixed on the current HEAD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I found it a little bit tricky since it's fixed after I moved import pyarrow
into functions..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other cases failed, and I have no idea whether theses are related..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could ignore the metric heads test - these should be fixed in master btw.
The arrow test failure looks relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it , just found WorkerCrashedError
in Arrow 6, it seems no details.
Ok, here is the original PR #35110 and gentle ping @ericl for review since the previous failed case was passed. Thanks again @rickyyx , could you help to find the fail reason of Arrow 6? I'm not sure if it's because of |
Thanks for comment, the major change is using arrow serialization for object store read and write, which is zero-copy and brings better performance. |
I got this stacktrace from one of the workers in the failed test for arrow 6
|
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
Signed-off-by: Zhi Lin <[email protected]>
@rickyyx I'm helping to fix the CI. The arrow related tests are passed. Other failures seems unrelated? |
Core tests look good now. Can you rebase with master to rerun the Java tests? It looks like
is a potential dependency issue. |
hi @jovany-wang , |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
… store (#35110)" (#36000)"
This reverts commit 822904b.
Why are these changes needed?
Last PR was reverted since
test_advanced_9
failed. However, I testedtest_advanced_9
on local device and it passed.This assert failure looks unrelated to this PR.
ray/python/ray/tests/test_advanced_9.py
Lines 426 to 430 in 0b190ee
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.