dora commands occasionally become unresponsive and hang #253
Comments
Could you give us more details on how to reproduce this issue? |
I think I can reproduce the issue with:
dora up
# started dora coordinator
# started dora daemon
dora destroy
# Send destroy command to dora-coordinator
dora up # <--- This hangs
I think it is due to the coordinator waiting for something, which makes it unable to respond to other requests. |
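If the hang really comes from the coordinator blocking inside one request handler while it waits for a reply, the usual remedy is to bound that wait so the event loop can keep serving other requests. Below is a minimal sketch using only the Rust standard library. `Request`, `wait_for_daemon_ack` and `run_event_loop` are hypothetical names, not dora's coordinator API, and the 100 ms timeout is an arbitrary assumption.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical control request; not dora's actual request type.
#[derive(Debug)]
enum Request {
    Destroy,
    Up,
}

/// Waits for the daemon's confirmation, but gives up after `timeout`
/// so the event loop can go on serving other requests.
fn wait_for_daemon_ack(ack_rx: &Receiver<()>, timeout: Duration) -> Result<(), String> {
    match ack_rx.recv_timeout(timeout) {
        Ok(()) => Ok(()),
        Err(RecvTimeoutError::Timeout) => Err("daemon did not confirm in time".to_string()),
        Err(RecvTimeoutError::Disconnected) => Err("daemon connection closed".to_string()),
    }
}

/// Handles requests one at a time and records what happened to each.
/// The loop ends once every request sender has been dropped.
fn run_event_loop(requests: Receiver<Request>, ack_rx: Receiver<()>) -> Vec<String> {
    let mut log = Vec::new();
    while let Ok(req) = requests.recv() {
        match req {
            Request::Destroy => match wait_for_daemon_ack(&ack_rx, Duration::from_millis(100)) {
                Ok(()) => log.push("destroyed".to_string()),
                Err(e) => log.push(format!("destroy failed: {e}")),
            },
            Request::Up => log.push("up handled".to_string()),
        }
    }
    log
}

fn main() {
    let (req_tx, req_rx) = mpsc::channel();
    // The ack sender stays alive but never sends, simulating a daemon
    // that never confirms the destroy.
    let (_ack_tx, ack_rx) = mpsc::channel::<()>();
    req_tx.send(Request::Destroy).unwrap();
    req_tx.send(Request::Up).unwrap();
    drop(req_tx);
    println!("{:?}", run_event_loop(req_rx, ack_rx));
}
```

With the silent daemon simulated in `main`, the destroy is reported as failed after the timeout and the following `Up` request is still handled, instead of the loop waiting forever.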
Hmm, I tried it multiple times but I couldn't reproduce the issue on the |
Yep, I will investigate on my end, since you cannot reproduce it. I used the main branch. |
Thanks! |
I think it's probably linked to the yolov5 operator being unable to access GitHub from behind the GFW, which blocks the initialization function. But it's going to be very hard for Philipp to reproduce. |
Having retested this issue, this is the stack trace:
(base) ~/D/C/dora ❯❯❯ RUST_LOG=trace dora destroy    (base) fix-coordinator-loop ✭
2023-04-25T08:21:11.484181Z TRACE dora_coordinator::control: Control connection closed
at binaries/coordinator/src/control.rs:90
2023-04-25T08:21:11.484197Z TRACE dora_coordinator: Handling event Control(IncomingRequest { request: Destroy, reply_sender: Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) } })
at binaries/coordinator/src/lib.rs:142
2023-04-25T08:21:11.484227Z INFO dora_coordinator: Received destroy command
at binaries/coordinator/src/lib.rs:403
2023-04-25T08:21:11.484359Z INFO dora_daemon: received destroy command -> exiting
at binaries/daemon/src/lib.rs:331
in dora_daemon::run_inner with self.machine_id:
Send destroy command to dora-coordinator
2023-04-25T08:21:11.484604Z TRACE dora_coordinator::control: Control connection closed
at binaries/coordinator/src/control.rs:90
It seems to be due to this TRACE: But, looking at the running processes, the
This is probably linked to an error when the dora daemon sends the coordinator its confirmation that it has been successfully destroyed. |
This is expected, as the CLI closes its control connection to the coordinator when it exits. |
This seems to be the real issue here. The Python operator seems to require GLIBCXX_3.4.29 (required by matplotlib), but it is not found. This error brings down the whole runtime node. I'm not sure why the daemon does not detect this error, but my guess is that it is stuck waiting for the node to finish initialization (for the synchronized start introduced in #236). So I think there are two things that we need to look into:
|
I opened #271 to track this issue: Why doesn't the dora daemon detect the operator/node initialization error? |
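One way a daemon could notice such a failure during a synchronized start is to treat a node exit as a terminal event while it waits at the start barrier, with a timeout as a last resort. The sketch below uses only the Rust standard library; `NodeEvent` and `wait_for_all_ready` are hypothetical and only illustrate the idea, they are not dora's actual daemon code.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical events a daemon might receive from the nodes it spawned.
#[derive(Debug)]
enum NodeEvent {
    /// The node finished initialization and waits for the start signal.
    Ready(String),
    /// The node process exited (e.g. an operator failed to load a library).
    Exited { node: String, success: bool },
}

/// Waits until every node in `expected` is ready. Fails as soon as any node
/// exits before the start, or if no event arrives within `timeout`.
fn wait_for_all_ready(
    expected: &HashSet<String>,
    events: &Receiver<NodeEvent>,
    timeout: Duration,
) -> Result<(), String> {
    let mut ready = HashSet::new();
    while ready.len() < expected.len() {
        match events.recv_timeout(timeout) {
            Ok(NodeEvent::Ready(node)) => {
                // Only count nodes that belong to this dataflow.
                if expected.contains(&node) {
                    ready.insert(node);
                }
            }
            Ok(NodeEvent::Exited { node, success }) => {
                return Err(format!("node `{node}` exited before init (success: {success})"));
            }
            Err(RecvTimeoutError::Timeout) => {
                return Err("timed out waiting for nodes to become ready".to_string());
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err("all node event senders dropped".to_string());
            }
        }
    }
    Ok(())
}

fn main() {
    let expected: HashSet<String> =
        ["webcam", "yolov8", "plot"].iter().map(|s| s.to_string()).collect();
    let (tx, rx) = mpsc::channel();
    tx.send(NodeEvent::Ready("webcam".to_string())).unwrap();
    // Simulates the yolov8 runtime node dying during initialization.
    tx.send(NodeEvent::Exited { node: "yolov8".to_string(), success: false }).unwrap();
    println!("{:?}", wait_for_all_ready(&expected, &rx, Duration::from_secs(5)));
}
```

Here the barrier returns an error naming `yolov8` instead of waiting indefinitely for a node that will never report ready.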
Does this issue still happen on the latest version (i.e. with #271 merged)? |
The situation described above has not happened again, but there is still a case where dora stop cannot stop the dataflow. It happens when an exception is raised inside an operator that the dataflow depends on, as shown below:
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora list
Running dataflows:
- [YOLOv8] 4aba7bb7-7966-4839-921d-72c575f7ea33
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora stop
> Choose dataflow to stop: [YOLOv8] 4aba7bb7-7966-4839-921d-72c575f7ea33
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora list
Running dataflows:
- [YOLOv8] 4aba7bb7-7966-4839-921d-72c575f7ea33
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora -V
dora-cli 0.2.3
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$
vi webcam_yolov8.yaml
nodes:
  - id: webcam
    operator:
      python: ../../operators/webcam_op.py
      inputs:
        tick: dora/timer/millis/100
      outputs:
        - image
    env:
      DEVICE_INDEX: 2
  - id: yolov8
    operator:
      outputs:
        - bbox
      inputs:
        image: webcam/image
      python: ../../operators/yolov8_op.py
    env:
      PYTORCH_DEVICE: "cuda"
      # YOLOV8_PATH: $DORA_DEP_HOME/dependencies/YOLOv8/
      # YOLOV8_WEIGHT_PATH: $DORA_DEP_HOME/dependencies/YOLOv8/weights/yolov8n.pt
  - id: plot
    operator:
      python: ../../operators/plot.py
      inputs:
        image: webcam/image
        obstacles_bbox: yolov8/bbox
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ RUST_LOG=true dora start graphs/tutorials/webcam_yolov8.yaml --attach --hot-reload --name YOLOv8
4aba7bb7-7966-4839-921d-72c575f7ea33
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora logs 4aba7bb7-7966-4839-921d-72c575f7ea33 yolov8 ...
─────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Logs from yolov8.
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Ultralytics YOLOv8.0.122 🚀 Python-3.7.16 torch-1.11.0 CUDA:0 (NVIDIA GeForce RTX 3080 Ti, 12037MiB)
2 │ YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs
3 │ ^Mval: Scanning /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/labels/val2017.cache... 0 images, 0 backgrounds, 5000 corrupt: 100%|██████████| 5000/5000 [00:00<?, ?it/s]^Mval: Scanning /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/labels/val2017.cache... 0 images, 0 backgrounds, 5000 corrupt: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
4 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000139.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000139.jpg'
5 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000285.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000285.jpg'
6 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000632.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000632.jpg'
7 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000724.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000724.jpg'
8 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000776.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000776.jpg'
9 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000785.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000785.jpg'
10 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000802.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000802.jpg'
11 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000872.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000872.jpg'
12 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000885.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000000885.jpg'
13 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001000.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001000.jpg'
14 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001268.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001268.jpg'
15 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001296.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001296.jpg'
16 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001353.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001353.jpg'
17 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001425.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001425.jpg'
18 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001490.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001490.jpg'
19 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001503.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001503.jpg'
20 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001532.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001532.jpg'
21 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001584.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001584.jpg'
22 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001675.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001675.jpg'
23 │ val: WARNING ⚠️ /home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001761.jpg: ignoring corrupt image/label: [Errno 2] No such file or directory: '/home/jarvis/coding/pyhome/mikel-brostrom/yolo_tracking/datasets/coco/images/val2017/000000001761.jpg'
... |
The problem still exists in v0.2.3; dora-cli is unresponsive.
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora destroy
2023-06-27T09:32:54.865121Z WARN dora_daemon::node_communication: failed to send event to daemon
Location:
/home/runner/work/dora/dora/binaries/daemon/src/node_communication/mod.rs:490:26
at binaries/daemon/src/node_communication/mod.rs:253
2023-06-27T09:32:54.865152Z WARN dora_daemon::node_communication: failed to receive reply from daemon
Location:
/home/runner/work/dora/dora/binaries/daemon/src/node_communication/mod.rs:494:30
at binaries/daemon/src/node_communication/mod.rs:253
Send destroy command to dora-coordinator
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora list
You need to kill the coordinator and restart it to get back to normal. |
Describe the bug
Occasionally, dora commands become unresponsive and hang.
To Reproduce
Steps to reproduce the behavior:
1. dora up
2. dora start dataflow.yaml
3. dora stop
4. dora destroy
Environments (please complete the following information):
#76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux