[core] out of band serialization exception #47544
python/ray/_private/serialization.py
Outdated
"If you set the env var, the object is pinned forever in the "
"lifetime of the worker process and can cause Ray object leaks."
"See the trace to find where the serialization occurs: "
f"{''.join(traceback.format_stack())}"
just to confirm, we should be able to tell the containing object from this stack?
We can't, but we can see which line triggers the serialization. This is the error message produced by your repro script:
ray::f() (pid=37282, ip=127.0.0.1)
File "/Users/sangcho/work/ray/a.py", line 20, in f
return ArrowBlockAccessor.numpy_to_block(batch)
File "/Users/sangcho/work/ray/python/ray/data/_internal/arrow_block.py", line 249, in numpy_to_block
col = ArrowPythonObjectArray.from_objects(col)
File "/Users/sangcho/work/ray/python/ray/air/util/object_extensions/arrow.py", line 100, in from_objects
dumped_bytes = pickle_dumps(
File "/Users/sangcho/work/ray/python/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
cp.dump(obj)
File "/Users/sangcho/work/ray/python/ray/cloudpickle/cloudpickle.py", line 1245, in dump
return super().dump(obj)
ray._private.serialization.OufOfBandRefSerializationException: It is not allowed to serialize ray.ObjectRef 00d950ec0ccf9d2affffffffffffffffffffffff0100000002e1f505.If you want to allow serialization, set `RAY_allow_out_of_band_object_ref_serialization=1.` If you set the env var, the object is pinned forever in the lifetime of the worker process and can cause Ray object leaks.See the trace to find where the serialization occurs: File "/Users/sangcho/work/ray/python/ray/_private/workers/default_worker.py", line 289, in <module>
worker.main_loop()
File "/Users/sangcho/work/ray/a.py", line 20, in f
return ArrowBlockAccessor.numpy_to_block(batch)
File "/Users/sangcho/work/ray/python/ray/data/_internal/arrow_block.py", line 249, in numpy_to_block
col = ArrowPythonObjectArray.from_objects(col)
File "/Users/sangcho/work/ray/python/ray/air/util/object_extensions/arrow.py", line 100, in from_objects
dumped_bytes = pickle_dumps(
File "/Users/sangcho/work/ray/python/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
cp.dump(obj)
File "/Users/sangcho/work/ray/python/ray/cloudpickle/cloudpickle.py", line 1245, in dump
return super().dump(obj)
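For readers following the thread, here is a minimal sketch of how embedding `traceback.format_stack()` in an exception message surfaces the triggering line, as in the output above. It uses only the standard library; `OutOfBandSerializationError`, `raise_with_stack`, and `user_code` are hypothetical names for illustration and are not part of Ray.

```python
import traceback


class OutOfBandSerializationError(Exception):
    """Hypothetical stand-in for the exception raised in this PR."""


def raise_with_stack(ref_id: str) -> None:
    # format_stack() lists the frames that led to this call, outermost first,
    # so the message shows which user line triggered the serialization.
    stack = "".join(traceback.format_stack())
    raise OutOfBandSerializationError(
        f"It is not allowed to serialize ray.ObjectRef {ref_id}. "
        "See the trace to find where the serialization occurs:\n"
        f"{stack}"
    )


def user_code() -> None:
    # Plays the role of the line in the repro script that pickles the ref.
    raise_with_stack("abc123")


try:
    user_code()
except OutOfBandSerializationError as exc:
    message = str(exc)
```

The stack names the calling frames (here `user_code`), not the object that contains the ref, which matches the answer above.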
Also, this is the new error when a function implicitly captures an object ref:
2024-09-06 17:47:28,806 INFO worker.py:1777 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Traceback (most recent call last):
File "/Users/sangcho/work/ray/python/ray/_private/serialization.py", line 73, in pickle_dumps
return pickle.dumps(obj)
File "/Users/sangcho/work/ray/python/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
cp.dump(obj)
File "/Users/sangcho/work/ray/python/ray/cloudpickle/cloudpickle.py", line 1245, in dump
return super().dump(obj)
File "/Users/sangcho/work/ray/python/ray/_private/serialization.py", line 152, in object_ref_reducer
self.add_contained_object_ref(obj, obj.call_site())
File "/Users/sangcho/work/ray/python/ray/_private/serialization.py", line 221, in add_contained_object_ref
raise ray.exceptions.OufOfBandRefSerializationException(
ray.exceptions.OufOfBandRefSerializationException: It is not allowed to serialize ray.ObjectRef 00ffffffffffffffffffffffffffffffffffffff0100000001e1f505. If you want to allow serialization, set `RAY_allow_out_of_band_object_ref_serialization=1.` If you set the env var, the object is pinned forever in the lifetime of the worker process and can cause Ray object leaks. See the callsite and trace to find where the serialization occurs.
Callsite: Disabled. Set RAY_record_ref_creation_sites=1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/sangcho/work/ray/a.py", line 11, in <module>
ray.get(f.remote())
File "/Users/sangcho/work/ray/python/ray/remote_function.py", line 139, in _remote_proxy
return self._remote(args=args, kwargs=kwargs, **self._default_options)
File "/Users/sangcho/work/ray/python/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/Users/sangcho/work/ray/python/ray/util/tracing/tracing_helper.py", line 310, in _invocation_remote_span
return method(self, args, kwargs, *_args, **_kwargs)
File "/Users/sangcho/work/ray/python/ray/remote_function.py", line 304, in _remote
self._pickled_function = pickle_dumps(
File "/Users/sangcho/work/ray/python/ray/_private/serialization.py", line 81, in pickle_dumps
raise ray.exceptions.OufOfBandRefSerializationException(msg)
ray.exceptions.OufOfBandRefSerializationException: Could not serialize the function a.f:
=======================================================
Checking Serializability of <function f at 0x10bceb1f0>
=======================================================
!!! FAIL serialization: It is not allowed to serialize ray.ObjectRef 00ffffffffffffffffffffffffffffffffffffff0100000001e1f505. If you want to allow serialization, set `RAY_allow_out_of_band_object_ref_serialization=1.` If you set the env var, the object is pinned forever in the lifetime of the worker process and can cause Ray object leaks. See the callsite and trace to find where the serialization occurs.
Callsite: Disabled. Set RAY_record_ref_creation_sites=1
Detected 1 global variables. Checking serializability...
Serializing 'ref' ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000001e1f505)...
!!! FAIL serialization: It is not allowed to serialize ray.ObjectRef 00ffffffffffffffffffffffffffffffffffffff0100000001e1f505. If you want to allow serialization, set `RAY_allow_out_of_band_object_ref_serialization=1.` If you set the env var, the object is pinned forever in the lifetime of the worker process and can cause Ray object leaks. See the callsite and trace to find where the serialization occurs.
Callsite: Disabled. Set RAY_record_ref_creation_sites=1
WARNING: Did not find non-serializable object in ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000001e1f505). This may be an oversight.
=======================================================
Variable:
FailTuple(ref [obj=ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000001e1f505), parent=<function f at 0x10bceb1f0>])
was found to be non-serializable. There may be multiple other undetected variables that were non-serializable.
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class.
=======================================================
Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.
If you have any suggestions on how to improve this error message, please reach out to the Ray developers on github.com/ray-project/ray/issues/
=======================================================
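The report above finds the captured global `ref` by inspecting the function. As a rough illustration only (this is not how Ray's serializability check is implemented), the standard library's `inspect.getclosurevars` can list the globals and nonlocals a function refers to. `FakeRef` and `captured_refs` are hypothetical names used for this sketch.

```python
import inspect


class FakeRef:
    """Hypothetical stand-in for ray.ObjectRef."""

    def __init__(self, ref_id: str) -> None:
        self.ref_id = ref_id


# A module-level ref that the function below captures implicitly.
ref = FakeRef("00ff")


def f():
    return ref


def captured_refs(fn):
    """Return the FakeRef objects that fn references via globals or closures."""
    closure_vars = inspect.getclosurevars(fn)
    captured = {**closure_vars.globals, **closure_vars.nonlocals}
    return {name: value for name, value in captured.items()
            if isinstance(value, FakeRef)}


found = captured_refs(f)
```

Moving the creation of `ref` inside `f`, as the error message suggests, would leave `found` empty.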
Signed-off-by: Hao Chen <[email protected]>
This reverts commit 192cc31.
Signed-off-by: Hao Chen <[email protected]>
python/ray/_private/serialization.py
Outdated
@@ -127,7 +136,7 @@ def actor_handle_reducer(obj):
     serialized, actor_handle_id, weak_ref = obj._serialization_helper()
     # Update ref counting for the actor handle
     if not weak_ref:
-        self.add_contained_object_ref(actor_handle_id)
+        self.add_contained_object_ref(actor_handle_id, True)
Why is it always True in this case?
Raising here currently breaks many tests, and I think the leak is probably minimal for an actor handle. I will add comments.
(it doesn't leak actual actors)
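To make the discussion concrete, here is a hedged sketch of the gating logic being described: out-of-band serialization raises unless the env var allows it or the caller passes an explicit allow flag, as the actor handle reducer does. The class, attribute, and parameter names are hypothetical. Treating an unset variable as "allowed" is an assumption based on the PR being described as backward compatible, not something the diff above confirms.

```python
import os

# Env var name taken from the error message in this thread.
ENV_VAR = "RAY_allow_out_of_band_object_ref_serialization"


class OutOfBandSerializationError(Exception):
    """Hypothetical stand-in for the exception raised in this PR."""


class SerializationContextSketch:
    """Hypothetical, simplified model of the check; not Ray's implementation."""

    def __init__(self) -> None:
        self.in_band = False   # True while Ray itself is serializing a value
        self.contained = []    # refs tracked by normal in-band serialization
        self.pinned = []       # refs pinned for the life of the worker process

    def add_contained_object_ref(self, ref_id, allow_out_of_band=False):
        if self.in_band:
            self.contained.append(ref_id)
            return
        # Assumption: an unset env var means "allowed" (backward compatible).
        env_allows = os.environ.get(ENV_VAR, "1") == "1"
        if not (allow_out_of_band or env_allows):
            raise OutOfBandSerializationError(
                f"It is not allowed to serialize ray.ObjectRef {ref_id}. "
                f"If you want to allow serialization, set {ENV_VAR}=1."
            )
        # Allowed out of band: the object stays pinned, which is the leak
        # the error message warns about.
        self.pinned.append(ref_id)
```

With the variable set to `0`, a plain call raises, while a call that passes `allow_out_of_band=True` (the actor handle case) still succeeds.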
…com:rkooo567/ray into raise-warning-for-out-of-band-serialization
Need approval from the data team. cc @raulchen
Introduce an env var to raise an exception when there's out-of-band serialization of an object ref.
Improve the error message on out-of-band serialization issues. There are 2 types of issues: 1. cloudpickle.dumps(ref). 2. Implicit capture. See below for more details.
Update an anti-pattern doc.
Signed-off-by: ujjawal-khare <[email protected]>
Why are these changes needed?
This PR
This PR is backward compatible
cloudpickle.dumps error message
implicit capture error