[Tune] Deadlock on local cluster since ray 2.* #30524

Closed
ahallermed opened this issue Nov 21, 2022 · 1 comment
Labels
bug: Something that is supposed to be working; but isn't
triage: Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

ahallermed commented Nov 21, 2022

What happened + What you expected to happen

What happened

When I run the very simple Python snippet below, which does model training via ray.tune on timeseries data, some trials remain in RUNNING status and never finish. Sometimes only a single trial hangs, sometimes more. The issue was introduced with Ray version 2.*.
CPU load is high at the beginning but drops off very quickly.
Running this experiment several times, and exiting it each time it fails to finish, increases unused memory.
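
One way to see where a trial that stays in RUNNING is actually blocked is to have the worker dump its thread stacks once a call runs longer than expected. The sketch below uses only the standard-library `faulthandler` module. The decorator name `dump_stacks_if_slow` and the timeout value are hypothetical, and it has not been tested against this reproduction.

```python
import faulthandler
import functools
import sys


def dump_stacks_if_slow(timeout_s, file=None):
    """Hypothetical helper: dump all thread stacks if the wrapped call runs too long."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Resolve stderr at call time so redirected streams are respected.
            target = file if file is not None else sys.stderr
            # Arm a watchdog that writes every thread's traceback to `target`
            # each time `timeout_s` seconds pass without the call returning.
            faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)
            try:
                return fn(*args, **kwargs)
            finally:
                # Disarm the watchdog once the call finishes (or raises).
                faulthandler.cancel_dump_traceback_later()

        return wrapper

    return decorator
```

Applied as `@dump_stacks_if_slow(30)` on `easy_objective`, this would write the stacks to the worker's stderr, which Ray normally forwards to the driver console. The 30-second threshold is an assumption based on the expected 10-second total runtime.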

What you expected to happen

The complete experiment should finish within 10 seconds.

Possibly related issues (only changing the Ray version fixes the issue described here, so I don't think they really help):

Versions / Dependencies

Ubuntu 20.04.5 LTS
Python 3.8.10

modin==0.17.0
pandas==1.5.1
numpy==1.23.4

ray[default,tune]==...
2.1.0 -> Failure
2.0.0 -> Failure

1.12.1 -> Success
1.13.0 -> Success
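
When comparing working and failing environments, it can help to print the exact installed versions rather than rely on the requirements file. A minimal sketch using the standard-library `importlib.metadata`; the function name `installed_versions` is hypothetical and the package list simply mirrors the one above.

```python
from importlib import metadata


def installed_versions(packages=("ray", "modin", "pandas", "numpy")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```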

Reproduction script

# %%
import modin.pandas as pd
import numpy as np
import ray
from ray import tune

if not ray.is_initialized():
    try:
        from ray import air
        ray.init(num_cpus=4, runtime_env={"env_vars": {"__MODIN_AUTOIMPORT_PANDAS__": "1"}})
    except (TypeError, ImportError):
        ray.init(num_cpus=4)

# Create a df with a numpy array in each cell, e.g. usage for timeseries
################################################################################
num_timeseries = 100
rand_int = np.random.randint(0, 10, size=(num_timeseries))
rand_float = np.random.random_sample((num_timeseries))
num_rows = 1000

df = pd.DataFrame({"abc": [rand_int] * num_rows, "def": [rand_float] * num_rows})
df


# %%
# Run df operations in a hyperparameter tuning experiment.
# It only fails/gets stuck for me because of the series_diff and series_sum operations.
# Apply operations, as many as you like, are not a problem.
################################################################################
def easy_objective(config, data):
    df = data[0]
    column = "abc"
    # for column in df.columns: # if that does not fail for you, try looping over all columns
    series_min = df[column].apply(np.nanmin)
    series_max = df[column].apply(np.nanmax)
    series_diff = series_max - series_min
    series_sum = series_max + series_min


# Using the old api, as this api can be used in ray version 1.* and 2.*
# with dedicated CPU for modin
tune.run(
    tune.with_parameters(easy_objective, data=[df]),
    num_samples=10,
    resources_per_trial=tune.PlacementGroupFactory([{
        "CPU": 1,
        "GPU": 0
    }, {
        "CPU": 1
    }], strategy="PACK"),
)

""" via new tune.Tuner api, the result stays the same
tuner = tune.Tuner(
    tune.with_resources(tune.with_parameters(easy_objective, data=[df]),
                        tune.PlacementGroupFactory([
                            {
                                "CPU": 1,
                                "GPU": 0
                            },
                            {
                                "CPU": 1
                            },
                        ])),
    tune_config=tune.TuneConfig(num_samples=10),
)
tuner.fit()
"""
<details>
<summary>Console Logs</summary>

```bash
2022-11-21 13:11:34,739 INFO worker.py:1519 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8283
UserWarning: Distributing object. This may take some time.
2022-11-21 13:11:34,784 WARNING function_trainable.py:586 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
== Status ==
Current time: 2022-11-21 13:11:38 (running for 00:00:03.29)
Memory usage on this node: 24.1/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (9 PENDING, 1 RUNNING)
+----------------------------+----------+-----------------------+
| Trial name | status | loc |
|----------------------------+----------+-----------------------|
| easy_objective_a46ba_00000 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | PENDING | |
| easy_objective_a46ba_00002 | PENDING | |
| easy_objective_a46ba_00003 | PENDING | |
| easy_objective_a46ba_00004 | PENDING | |
| easy_objective_a46ba_00005 | PENDING | |
| easy_objective_a46ba_00006 | PENDING | |
| easy_objective_a46ba_00007 | PENDING | |
| easy_objective_a46ba_00008 | PENDING | |
| easy_objective_a46ba_00009 | PENDING | |
+----------------------------+----------+-----------------------+

Trial easy_objective_a46ba_00000 completed. Last result:
Trial easy_objective_a46ba_00001 completed. Last result:
Trial easy_objective_a46ba_00003 completed. Last result:
Trial easy_objective_a46ba_00004 completed. Last result:
Trial easy_objective_a46ba_00005 completed. Last result:
Trial easy_objective_a46ba_00006 completed. Last result:
Trial easy_objective_a46ba_00007 completed. Last result:
Trial easy_objective_a46ba_00008 completed. Last result:
Trial easy_objective_a46ba_00009 completed. Last result:
== Status ==
Current time: 2022-11-21 13:11:45 (running for 00:00:11.01)
Memory usage on this node: 23.0/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:11:50 (running for 00:00:16.01)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:11:55 (running for 00:00:21.01)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:00 (running for 00:00:26.02)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:05 (running for 00:00:31.02)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:10 (running for 00:00:36.02)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:15 (running for 00:00:41.02)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:20 (running for 00:00:46.03)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:25 (running for 00:00:51.03)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:30 (running for 00:00:56.03)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

^C2022-11-21 13:12:34,879 WARNING tune.py:705 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip.
== Status ==
Current time: 2022-11-21 13:12:35 (running for 00:01:01.03)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

== Status ==
Current time: 2022-11-21 13:12:35 (running for 00:01:01.04)
Memory usage on this node: 22.9/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+

PRESSING CTRL+C

(easy_objective pid=235885) 2022-11-21 13:12:35,958 ERROR worker.py:763 -- Worker exits with an exit code 1.
(easy_objective pid=235885) Traceback (most recent call last):
(easy_objective pid=235885) File "python/ray/_raylet.pyx", line 1032, in ray._raylet.task_execution_handler
(easy_objective pid=235885) File "python/ray/_raylet.pyx", line 812, in ray._raylet.execute_task
(easy_objective pid=235885) File "python/ray/_raylet.pyx", line 852, in ray._raylet.execute_task
(easy_objective pid=235885) File "python/ray/_raylet.pyx", line 859, in ray._raylet.execute_task
(easy_objective pid=235885) File "python/ray/_raylet.pyx", line 863, in ray._raylet.execute_task
(easy_objective pid=235885) File "python/ray/_raylet.pyx", line 810, in ray._raylet.execute_task.function_executor
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/_private/function_manager.py", line 674, in actor_method_executor
(easy_objective pid=235885) return method(__ray_actor, *args, **kwargs)
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 466, in _resume_span
(easy_objective pid=235885) return method(self, *_args, **_kwargs)
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 352, in train
(easy_objective pid=235885) result = self.step()
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 466, in _resume_span
(easy_objective pid=235885) return method(self, *_args, **_kwargs)
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 365, in step
(easy_objective pid=235885) result = self._results_queue.get(
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/queue.py", line 179, in get
(easy_objective pid=235885) self.not_empty.wait(remaining)
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/threading.py", line 306, in wait
(easy_objective pid=235885) gotit = waiter.acquire(True, timeout)
(easy_objective pid=235885) File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/_private/worker.py", line 760, in sigterm_handler
(easy_objective pid=235885) sys.exit(1)
(easy_objective pid=235885) SystemExit: 1
2022-11-21 13:12:36,085 ERROR tune.py:773 -- Trials did not complete: [easy_objective_a46ba_00002]
2022-11-21 13:12:36,085 INFO tune.py:777 -- Total run time: 61.30 seconds (61.03 seconds for the tuning loop).
2022-11-21 13:12:36,086 WARNING tune.py:783 -- Experiment has been interrupted, but the most recent state was saved. You can continue running this experiment by passing resume=True to tune.run()
```

</details>


### Issue Severity

High: It blocks me from completing my task.
ahallermed added the bug and triage labels Nov 21, 2022
@ahallermed
Author

The issue lies in the modin update from 0.15.4 to 0.16.0; it just needed a larger df to show this effect on ray==1.12.1.
See modin-project/modin#5245
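
Since the regression points at modin rather than Ray, one quick cross-check is to run the same Series operations from `easy_objective` with plain pandas, outside of Tune. A minimal sketch, assuming numpy and pandas are installed; `build_frame` and `objective_ops` are hypothetical names, and the data is seeded random data in the same shape as the reproduction script.

```python
import numpy as np
import pandas as pd  # plain pandas, deliberately not modin.pandas


def build_frame(num_timeseries=100, num_rows=1000, seed=0):
    """Build a frame whose cells each hold a numpy array, as in the reproduction script."""
    rng = np.random.default_rng(seed)
    rand_int = rng.integers(0, 10, size=num_timeseries)
    rand_float = rng.random(num_timeseries)
    return pd.DataFrame({"abc": [rand_int] * num_rows, "def": [rand_float] * num_rows})


def objective_ops(df, column="abc"):
    """Run the apply, subtract, and add steps that the trial function performs."""
    series_min = df[column].apply(np.nanmin)
    series_max = df[column].apply(np.nanmax)
    series_diff = series_max - series_min
    series_sum = series_max + series_min
    return series_diff, series_sum


if __name__ == "__main__":
    diff, total = objective_ops(build_frame())
    print(diff.head(), total.head(), sep="\n")
```

If this returns promptly while the modin version hangs, that is consistent with the problem sitting in modin's handling of the binary Series operations rather than in Tune's scheduling.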
