Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IoT Edge Agent randomly stops and fails to restart on Ubuntu Core #7333

Open
kaancfidan opened this issue Aug 1, 2024 · 3 comments
Open
Assignees

Comments

@kaancfidan
Copy link

Expected Behavior

Azure IoT Edge Agent should keep running unattended for long periods of time.

Current Behavior

Edge agent and all the other Docker containers deployed on the machine including Edge Hub stop running after a few days. When we SSH into the device to look at the status, we see that all docker containers have stopped and we cannot restart them. The issue is only solved when we uninstall, reinstall and config azure-iot-edge from scratch.

When we try to restart edgeAgent container we see the following error:

~$ sudo docker restart edgeAgent
Error response from daemon: Cannot restart container edgeAgent: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/run/iotedge/mgmt.sock" to rootfs at "/var/run/iotedge/mgmt.sock": mount /var/run/iotedge/mgmt.sock:/var/run/iotedge/mgmt.sock (via /proc/self/fd/6), flags: 0x5000: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type

~$ sudo stat /var/run/iotedge/mgmt.sock
  File: /var/run/iotedge/mgmt.sock
  Size: 40              Blocks: 0          IO Block: 4096   directory
Device: 1ch/28d Inode: 5343        Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-08-01 13:27:44.325209014 +0000
Modify: 2024-08-01 13:27:44.325209014 +0000
Change: 2024-08-01 13:27:44.325209014 +0000
 Birth: -

Steps to Reproduce

Provide a detailed set of steps to reproduce the bug.

  1. Install Azure IoT Edge using sudo snap install azure-iot-edge
  2. Configure the device and connect it to the IoT Hub using sudo snap set azure-iot-edge raw-config="$(cat config.toml)"
  3. Validate that it downloads the deployment and runs the application properly.
  4. Let it run for long enough (~1 week maybe?) and see that it randomly goes offline.
  5. SSH into the device and confirm that the IoT edge agent is down.
  6. Try sudo docker restart edgeAgent and get the error mentioned above.

Context (Environment)

~$ uname -a
Linux MiWiFi-R4AC-srv 5.15.0-1058-raspi #61-Ubuntu SMP PREEMPT Mon Jul 1 07:22:01 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

~$ lscpu
Architecture:             aarch64
  CPU op-mode(s):         32-bit, 64-bit
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                ARM
  Model name:             Cortex-A72
    Model:                3
    Thread(s) per core:   1
    Core(s) per cluster:  4
    Socket(s):            -
    Cluster(s):           1
    Stepping:             r0p3
    CPU max MHz:          1800.0000
    CPU min MHz:          600.0000
    BogoMIPS:             108.00
    Flags:                fp asimd evtstrm crc32 cpuid
Caches (sum of all):
  L1d:                    128 KiB (4 instances)
  L1i:                    192 KiB (4 instances)
  L2:                     1 MiB (1 instance)
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Vulnerable
  Srbds:                  Not affected
  Tsx async abort:        Not affected

Device Information

  • Host OS [e.g. Ubuntu 22.04, Windows Server IoT 2019]: Ubuntu Core 22
  • Architecture [e.g. amd64, arm32, arm64]: arm64
  • Container OS [e.g. Linux containers, Windows containers]: Linux containers

Runtime Versions

  • aziot-edged [run iotedge version]: iotedge 1.5.0
  • Edge Agent [image tag (e.g. 1.0.0)]: mcr.microsoft.com/azureiotedge-agent:1.4
  • Edge Hub [image tag (e.g. 1.0.0)]: mcr.microsoft.com/azureiotedge-hub:1.4
  • Docker/Moby [run docker version]: 24.0.5
@vipeller vipeller self-assigned this Aug 1, 2024
@kaancfidan
Copy link
Author

This issue is preventing us to deploy Raspberry Pi devices.

@Muximize
Copy link

Is there a reason you use the older 1.4 agent and hub with the newer 1.5 daemon? Did you try with the latest versions?

aziot-edge:         1.5.10
azureiotedge-agent: 1.5.11
azureiotedge-hub:   1.5.11

@kaancfidan
Copy link
Author

kaancfidan commented Sep 27, 2024

Is there a reason you use the older 1.4 agent and hub with the newer 1.5 daemon? Did you try with the latest versions?


aziot-edge:         1.5.10

azureiotedge-agent: 1.5.11

azureiotedge-hub:   1.5.11

When the issue was fresh, 1.4 was the latest available in snap. I'll try upgrading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants