rickyyx opened this issue on Sep 26, 2022 · 0 comments
Labels
bug: Something that is supposed to be working; but isn't
P0: Issues that should be fixed in short order
triage: Needs triage (eg: priority, bug/not-bug, and owning component)
Traceback (most recent call last):
  File "dataset/sort.py", line 165, in <module>
    raise exc
  File "dataset/sort.py", line 119, in <module>
    ds = ds.sort(key="c_0")
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/data/dataset.py", line 1769, in sort
    return Dataset(plan, self._epoch, self._lazy)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/data/dataset.py", line 217, in __init__
    self._plan.execute(allow_clear_input_blocks=False)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/data/_internal/plan.py", line 309, in execute
    blocks, clear_input_blocks, self._run_by_consumer
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/data/_internal/plan.py", line 672, in __call__
    fn_constructor_kwargs=self.fn_constructor_kwargs,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/data/_internal/compute.py", line 115, in _apply
    results = map_bar.fetch_until_complete(refs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/data/_internal/progress_bar.py", line 75, in fetch_until_complete
    for ref, result in zip(done, ray.get(done)):
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/worker.py", line 2281, in get
    raise value
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.
Memory on the node (IP: 172.31.91.83, ID: 6733d6e3160eb040fa222d4df205885630920102a18d2db670e1581e) where the task was running was 54.89GB / 57.60GB (0.952868), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 773ef63289dd649da0d1f8b24bba631c4abbb6699262a6eb2ea342b2) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 172.31.91.83`. To see the logs of the worker, use `ray logs worker-773ef63289dd649da0d1f8b24bba631c4abbb6699262a6eb2ea342b2*out -ip 172.31.91.83`.
Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the eviction threshold, set the environment variable `RAY_memory_usage_threshold_fraction` when starting Ray. To disable worker eviction, set the environment variable `RAY_memory_monitor_interval_ms` to zero.
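For anyone retrying the run with a different memory monitor setting, here is a minimal sketch of the two knobs the error message names. The environment variable names are copied verbatim from the message above and may differ in other Ray versions; the helper names (`exceeds_threshold`, `memory_monitor_env`) are hypothetical and not part of Ray.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names as printed in the OutOfMemoryError message above.
THRESHOLD_VAR = "RAY_memory_usage_threshold_fraction"
INTERVAL_VAR = "RAY_memory_monitor_interval_ms"


def exceeds_threshold(used_gb: float, total_gb: float, threshold: float = 0.95) -> bool:
    """Return True when the used/total ratio is above the threshold.

    Mirrors the comparison described in the message (54.89GB / 57.60GB vs 0.95).
    """
    return used_gb / total_gb > threshold


def memory_monitor_env(
    threshold_fraction: Optional[float] = None,
    disable_eviction: bool = False,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build an environment dict for the process that starts Ray.

    Hypothetical helper: it only prepares variables, it does not start Ray.
    """
    env = dict(os.environ if base_env is None else base_env)
    if threshold_fraction is not None:
        if not 0.0 < threshold_fraction <= 1.0:
            raise ValueError("threshold_fraction must be in (0, 1]")
        env[THRESHOLD_VAR] = str(threshold_fraction)
    if disable_eviction:
        # Per the message, an interval of zero disables worker eviction.
        env[INTERVAL_VAR] = "0"
    return env
```

The returned dict could be passed as `env=` to whatever launches Ray on the node, for example `subprocess.run(["ray", "start", "--head"], env=env)`. Raising the threshold or disabling eviction only changes when the worker is killed; it does not explain why memory usage went up between the last success and the first failure.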
rickyyx added the bug, P0, and triage labels on Sep 26, 2022
rickyyx added this to the Core Nightly/CI Regressions milestone on Sep 26, 2022
What happened + What you expected to happen
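chaos_dataset_shuffle_sort_1tb fails: the ds.sort(key="c_0") call in dataset/sort.py raises ray.exceptions.OutOfMemoryError (see the traceback above).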
Versions / Dependencies
Last success: 45d7cd2
First failure: fb7472f
fb7472f Remove RAY_RAYLET_NODE_ID (#28715)
93f911e Add API latency and call counts metrics to dashboard APIs (#28279)
66aae4c [Release Test] Make sure to delete all EBS volumes (#28707)
697df80 [Serve] [Docs] Remove incorrect output (#28708)
d8c9aa7 [docs] configurable ecosystem gallery (#28662)
42874e1 [RLlib] Atari gym environments now require ale-py. (#28703)
b7f0346 [AIR] Maintain dtype info in LightGBMPredictor (#28673)
f6ae7ee [tune] Test background syncer serialization (#28699)
87f22e1 [ci] Requirements contains duplicate of 'starlette' (#28698)
2e7040e [Tune] [PBT] Maintain consistent Trial/TrialRunner state when pausing and resuming trial (#28511)
6530635 [ci] Fix mac pipeline (use python 2 in CI scripts) (#28695)
ee2a8da [ci] Move to new hierarchical docker structure + pipeline (#28641)
a3c97b4 [Doc] Revamp ray core design patterns doc [8/n]: pass large arg by value (#28660)
db2ce69 [Datasets] Add initial aggregate benchmark (#28486)
9c2abf9 [KubeRay][Operator] Improve migration notes (#28672)
45d7cd2 [core] Support generators to allow tasks to return a dynamic number of objects (#28291)
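The range above can be narrowed by bisection. Below is a rough sketch that assumes the failure is deterministic and that every commit after the culprit also fails; a chaos test may be flaky, so treat that as an assumption. `first_bad_commit` and the `is_bad` callback are hypothetical names. In practice `is_bad` would run chaos_dataset_shuffle_sort_1tb against a build of the given commit.

```python
from typing import Callable, Sequence

# Commits from the list above, reordered oldest first.
COMMITS = [
    "45d7cd2",  # last success
    "9c2abf9",
    "db2ce69",
    "a3c97b4",
    "ee2a8da",
    "6530635",
    "2e7040e",
    "87f22e1",
    "f6ae7ee",
    "b7f0346",
    "42874e1",
    "d8c9aa7",
    "697df80",
    "66aae4c",
    "93f911e",
    "fb7472f",  # first failure
]


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the earliest failing commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit fails, all later commits fail too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With the 15 candidate commits after 45d7cd2, this needs at most 4 test runs. `git bisect` does the same search directly on a checkout.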
Reproduction script
NA
Issue Severity
No response