When running a very simple Python snippet for model training via ray.tune on time series, some trials remain in RUNNING status and never finish. Sometimes it is only a single trial, sometimes more. This issue was introduced with version 2.*.
CPU workload is high at the beginning but drops very quickly.
Running this experiment several times and aborting it because it does not finish increases the unused memory.
What you expected to happen
The complete experiment should finish within 10 seconds.
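The 10-second expectation can be turned into an automatic check. The following is a minimal sketch using only the standard library; `finishes_within` is an illustrative helper (not part of Ray), and `repro.py` is a hypothetical file name for the reproduction script. It runs a command in a child process and reports whether it exits before the deadline, so a hanging run is killed instead of blocking the terminal.

```python
import subprocess
import sys


def finishes_within(cmd, timeout_s):
    """Return True if `cmd` exits before `timeout_s` seconds, else False.

    subprocess.run() kills the child process when the timeout expires,
    so a hanging experiment does not keep the caller waiting forever.
    The exit code is ignored: only "did it terminate in time" is checked.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return False
    return True


# Example call (hypothetical file name for the reproduction script):
# finishes_within([sys.executable, "repro.py"], timeout_s=10)
```

Only the direct child process is killed on timeout; processes started by Ray may need to be cleaned up separately, for example with `ray stop`.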
There are possibly related issues, but only changing the Ray version fixes the issue described here, so I don't think they really help.
# %%
import modin.pandas as pd
import numpy as np
import ray
from ray import tune

if not ray.is_initialized():
    try:
        from ray import air
        ray.init(num_cpus=4, runtime_env={"env_vars": {"__MODIN_AUTOIMPORT_PANDAS__": "1"}})
    except (TypeError, ImportError):
        ray.init(num_cpus=4)

# Create a df with a numpy array in each cell, e.g. usage for timeseries
################################################################################
num_timeseries = 100
rand_int = np.random.randint(0, 10, size=(num_timeseries))
rand_float = np.random.random_sample((num_timeseries))
num_rows = 1000
df = pd.DataFrame({"abc": [rand_int] * num_rows, "def": [rand_float] * num_rows})
df

# %%
# Run df operations in a hyperparameter tuning experiment.
# It only fails/gets stuck for me because of the series_diff and series_sum operations.
# Apply operations, as many as you like, are not a problem.
################################################################################
def easy_objective(config, data):
    df = data[0]
    column = "abc"
    # for column in df.columns:  # if that does not fail for you, try looping over all columns
    series_min = df[column].apply(np.nanmin)
    series_max = df[column].apply(np.nanmax)
    series_diff = series_max - series_min
    series_sum = series_max + series_min


# Using the old api, as this api can be used in ray version 1.* and 2.*
# with dedicated CPU for modin
tune.run(
    tune.with_parameters(easy_objective, data=[df]),
    num_samples=10,
    resources_per_trial=tune.PlacementGroupFactory([{
        "CPU": 1,
        "GPU": 0
    }, {
        "CPU": 1
    }], strategy="PACK"),
)

""" via new tune.Tuner api, the result stays the same
tuner = tune.Tuner(
    tune.with_resources(
        tune.with_parameters(easy_objective, data=[df]),
        tune.PlacementGroupFactory([
            {"CPU": 1, "GPU": 0},
            {"CPU": 1},
        ])),
    tune_config=tune.TuneConfig(num_samples=10),
)
tuner.fit()
"""
Console Logs
```bash
2022-11-21 13:11:34,739 INFO worker.py:1519 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8283
UserWarning: Distributing object. This may take some time.
2022-11-21 13:11:34,784 WARNING function_trainable.py:586 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
== Status ==
Current time: 2022-11-21 13:11:38 (running for 00:00:03.29)
Memory usage on this node: 24.1/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (9 PENDING, 1 RUNNING)
+----------------------------+----------+-----------------------+
| Trial name | status | loc |
|----------------------------+----------+-----------------------|
| easy_objective_a46ba_00000 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | PENDING | |
| easy_objective_a46ba_00002 | PENDING | |
| easy_objective_a46ba_00003 | PENDING | |
| easy_objective_a46ba_00004 | PENDING | |
| easy_objective_a46ba_00005 | PENDING | |
| easy_objective_a46ba_00006 | PENDING | |
| easy_objective_a46ba_00007 | PENDING | |
| easy_objective_a46ba_00008 | PENDING | |
| easy_objective_a46ba_00009 | PENDING | |
+----------------------------+----------+-----------------------+
Trial easy_objective_a46ba_00000 completed. Last result:
Trial easy_objective_a46ba_00001 completed. Last result:
Trial easy_objective_a46ba_00003 completed. Last result:
Trial easy_objective_a46ba_00004 completed. Last result:
Trial easy_objective_a46ba_00005 completed. Last result:
Trial easy_objective_a46ba_00006 completed. Last result:
Trial easy_objective_a46ba_00007 completed. Last result:
Trial easy_objective_a46ba_00008 completed. Last result:
Trial easy_objective_a46ba_00009 completed. Last result:
== Status ==
Current time: 2022-11-21 13:11:45 (running for 00:00:11.01)
Memory usage on this node: 23.0/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/4 CPUs, 0/1 GPUs, 0.0/22.81 GiB heap, 0.0/11.4 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/andreas/ray_results/easy_objective_2022-11-21_13-11-34
Number of trials: 10/10 (1 RUNNING, 9 TERMINATED)
+----------------------------+------------+-----------------------+
| Trial name | status | loc |
|----------------------------+------------+-----------------------|
| easy_objective_a46ba_00002 | RUNNING | 192.168.178.41:235885 |
| easy_objective_a46ba_00000 | TERMINATED | 192.168.178.41:235885 |
| easy_objective_a46ba_00001 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00003 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00004 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00005 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00006 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00007 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00008 | TERMINATED | 192.168.178.41:236085 |
| easy_objective_a46ba_00009 | TERMINATED | 192.168.178.41:236085 |
+----------------------------+------------+-----------------------+
^C2022-11-21 13:12:34,879 WARNING tune.py:705 -- Stop signal received (e.g. via SIGINT/Ctrl+C), ending Ray Tune run. This will try to checkpoint the experiment state one last time. Press CTRL+C (or send SIGINT/SIGKILL/SIGTERM) to skip.
(easy_objective pid=235885) 2022-11-21 13:12:35,958 ERROR worker.py:763 -- Worker exits with an exit code 1.
(easy_objective pid=235885) Traceback (most recent call last):
(easy_objective pid=235885)   File "python/ray/_raylet.pyx", line 1032, in ray._raylet.task_execution_handler
(easy_objective pid=235885)   File "python/ray/_raylet.pyx", line 812, in ray._raylet.execute_task
(easy_objective pid=235885)   File "python/ray/_raylet.pyx", line 852, in ray._raylet.execute_task
(easy_objective pid=235885)   File "python/ray/_raylet.pyx", line 859, in ray._raylet.execute_task
(easy_objective pid=235885)   File "python/ray/_raylet.pyx", line 863, in ray._raylet.execute_task
(easy_objective pid=235885)   File "python/ray/_raylet.pyx", line 810, in ray._raylet.execute_task.function_executor
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/_private/function_manager.py", line 674, in actor_method_executor
(easy_objective pid=235885)     return method(__ray_actor, *args, **kwargs)
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 466, in _resume_span
(easy_objective pid=235885)     return method(self, *_args, **_kwargs)
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 352, in train
(easy_objective pid=235885)     result = self.step()
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 466, in _resume_span
(easy_objective pid=235885)     return method(self, *_args, **_kwargs)
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 365, in step
(easy_objective pid=235885)     result = self._results_queue.get(
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/queue.py", line 179, in get
(easy_objective pid=235885)     self.not_empty.wait(remaining)
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/threading.py", line 306, in wait
(easy_objective pid=235885)     gotit = waiter.acquire(True, timeout)
(easy_objective pid=235885)   File "/home/andreas/miniconda3/envs/elise/lib/python3.8/site-packages/ray/_private/worker.py", line 760, in sigterm_handler
(easy_objective pid=235885)     sys.exit(1)
(easy_objective pid=235885) SystemExit: 1
2022-11-21 13:12:36,085 ERROR tune.py:773 -- Trials did not complete: [easy_objective_a46ba_00002]
2022-11-21 13:12:36,085 INFO tune.py:777 -- Total run time: 61.30 seconds (61.03 seconds for the tuning loop).
2022-11-21 13:12:36,086 WARNING tune.py:783 -- Experiment has been interrupted, but the most recent state was saved. You can continue running this experiment by passing resume=True to tune.run()
```
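The traceback shows where the worker was when it was terminated: inside `queue.get(...)` with a timeout, called from `step` in `function_trainable.py`, i.e. waiting for a result that had not arrived. The sketch below reproduces only that waiting pattern with the standard library. It is not Ray's implementation, and `stuck_producer` and `wait_for_result` are hypothetical names. A consumer that keeps polling an empty queue stays alive for as long as the producer is blocked, which matches a trial that stays in RUNNING.

```python
import queue
import threading


def stuck_producer(results, release):
    """Stand-in for a training function that blocks before reporting."""
    release.wait()          # blocks until the event is set
    results.put("done")


def wait_for_result(results, poll_s, max_polls):
    """Poll the queue with a timeout, similar to the queue.get() in the traceback.

    Returns the result, or None if nothing arrived after max_polls tries.
    """
    for _ in range(max_polls):
        try:
            return results.get(block=True, timeout=poll_s)
        except queue.Empty:
            continue        # nothing yet: keep waiting
    return None


results = queue.Queue()
release = threading.Event()
worker = threading.Thread(target=stuck_producer, args=(results, release), daemon=True)
worker.start()

first = wait_for_result(results, poll_s=0.1, max_polls=3)    # producer is still blocked
release.set()                                               # unblock the producer
second = wait_for_result(results, poll_s=0.1, max_polls=50)  # result arrives now
print(first, second)
```

While the producer is blocked the consumer gets nothing back; once the producer is released, the result arrives on the next poll. What blocks the training function in the reported case is not visible from the traceback.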
### Issue Severity
High: It blocks me from completing my task.
ahallermed added the bug and triage labels on Nov 21, 2022
Possibly related issues:
- tune.run() hangs on local cluster when using a functional trainable with reuse_actors=True #18808
- [ray] Modin on ray causes ray.tune to hang modin-project/modin#3479 (but their solution didn't solve my issue)

Versions / Dependencies
Ubuntu 20.04.5 LTS
Python 3.8.10
modin==1.7.0
pandas==1.5.1
numpy==1.23.4
ray[default,tune]==...
2.1.0 -> Failure
2.0.0 -> Failure
1.12.1 -> Success
1.13.0 -> Success
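When comparing a local environment against the versions listed above, the installed versions can be read with the standard library. This is a minimal sketch; `installed_versions` is an illustrative helper name, and the package list simply mirrors the one above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["ray", "modin", "pandas", "numpy"]).items():
        print(f"{name}=={version}")
```

A package that is not installed is reported as `None` instead of raising, so the output can be pasted into a report as is.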