Empty or invalid .../containerd/daemon/io.containerd.grpc.v1.introspection/uuid file created occasionally #322

Open
lmbarros opened this issue Dec 12, 2022 · 6 comments

Comments

@lmbarros
Contributor

In some rare cases, the /mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid file created by containerd is empty (instead of containing a UUID). This causes some containerd operations to fail. Notably, the following command (which happens to be part of the current balenaEngine health check)

# balena-engine-containerd-ctr --address "/run/balena-engine/containerd/balena-engine-containerd.sock" version

will fail with

invalid UUID length: 0: unknown
@jellyfish-bot

[lmbarros] This has attached https://jel.ly.fish/e28ad781-ab1e-459d-b017-935900ccd76c

lmbarros added a commit to balena-os/meta-balena that referenced this issue Dec 15, 2022
In rare cases (believed to be caused by a non-atomic file creation and
writing operation in containerd), we end up with an empty file at
/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid.
This causes `ctr version` (and hence the health check) to fail. See
balena-os/balena-engine#322

This PR addresses this issue in two ways:

1. Before running `ctr version`, we check if the uuid file exists and is
   empty. If so, we remove it. (The subsequent execution of `ctr
   version` by the healthcheck will create the file again.)
2. After running `ctr version`, we check if the uuid file was really
   created and is not empty.

In both cases, when an empty uuid file is detected, we log the event and
create a "breadcrumb" file at `/mnt/data/engine-healthcheck/` to help us
confirm our hypothesis about the root cause.

Signed-off-by: Leandro Motta Barros <[email protected]>
Change-type: patch
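The two checks described in this commit message can be sketched in shell roughly as follows. This is a minimal illustration, not the actual meta-balena healthcheck script: the function names `remove_empty_uuid_file` and `uuid_file_ok` are hypothetical, and logging simply goes to stderr.

```shell
#!/bin/sh
# Hypothetical sketch of the pre-/post-checks around `ctr version`.
# Not the actual meta-balena healthcheck; all names are illustrative.

UUID_FILE="${UUID_FILE:-/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid}"

# Step 1: if the uuid file exists but is empty, remove it so that the
# next `ctr version` call lets containerd create it again.
remove_empty_uuid_file() {
    f="$1"
    # -e: the file exists; ! -s: its size is NOT greater than zero
    if [ -e "$f" ] && [ ! -s "$f" ]; then
        echo "empty uuid file detected at $f; removing it" >&2
        rm -f "$f"
    fi
}

# Step 2: after `ctr version`, confirm the file exists and is non-empty.
# Returns 0 if the file looks OK, 1 (and logs) otherwise.
uuid_file_ok() {
    f="$1"
    if [ -s "$f" ]; then
        return 0
    fi
    echo "uuid file at $f is missing or empty after ctr version" >&2
    return 1
}
```

In a health check these would wrap the existing command: call `remove_empty_uuid_file "$UUID_FILE"`, then run `balena-engine-containerd-ctr --address ... version` as before, then call `uuid_file_ok "$UUID_FILE"`.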
lmbarros added a commit to balena-os/meta-balena that referenced this issue Dec 16, 2022
In rare cases (believed to be caused by a non-atomic file creation and
writing operation in containerd), we end up with an empty file at
`/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid`.
This causes `ctr version` (and hence the health check) to fail. See
balena-os/balena-engine#322

This commit addresses this issue in two ways:

1. Before running `ctr version`, we check if the uuid file exists and is
   empty. If so, we remove it. (The subsequent execution of `ctr
   version` by the healthcheck will create the file again.)
2. After running `ctr version`, we check if the uuid file was really
   created and is not empty.

In both cases, when an empty uuid file is detected, we log the event to
help us confirm our hypothesis about the root cause.

Signed-off-by: Leandro Motta Barros <[email protected]>
Change-type: patch
lmbarros added a commit to balena-os/meta-balena that referenced this issue Dec 19, 2022
lmbarros added a commit to balena-os/meta-balena that referenced this issue Dec 20, 2022
lmbarros added a commit to balena-os/meta-balena that referenced this issue Dec 21, 2022
@lmbarros
Contributor Author

Saw a variation of this today, on two different devices. In both cases the uuid file was created, but with invalid contents. Interestingly, in both cases the contents of the file were the same:

# hexdump /mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000020 0000 0000                              
0000024

# sha1sum /mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid 
8696cf0f4655636cc93c566c1be2dad311da646c  /mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid
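hexdump prints offsets in hexadecimal by default, so the final offset 0000024 means the file is 0x24 = 36 bytes long, which is also the length of a UUID in its canonical textual form (32 hex digits plus 4 hyphens), yet every byte is NUL. The sketch below recreates such a file in a temporary directory (not at the real containerd path) and shows one way to detect the "all NUL" case; `is_all_nul` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: recreate a 36-byte, all-NUL file like the one in the hexdump
# output above, and detect that condition. Runs in a temp dir only.

# hexdump offsets are hexadecimal: 0x24 is 36 in decimal.
printf '%d\n' 0x24   # prints 36

# Returns 0 if the file is non-empty and contains only NUL bytes.
is_all_nul() {
    [ -s "$1" ] || return 1
    # Delete every NUL byte; if nothing is left, the file was all NULs.
    [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}

demo_dir=$(mktemp -d)
head -c 36 /dev/zero > "$demo_dir/uuid"
if is_all_nul "$demo_dir/uuid"; then
    echo "uuid file contains only NUL bytes"
fi
rm -rf "$demo_dir"
```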

@alanb128

Answers to questions posed about the devices referenced above, where the UUID file was created but with invalid contents:

Do you know when these two devices were first provisioned?

According to the balena dashboard, they were created in Nov 2022 and Jan 2023, which should reflect the real commissioning dates. Many other devices from this fleet were commissioned in the same time interval.

From what I see in the logs, all the OS update attempts on them were made in the last 2 or 3 days. Can you confirm that, or were there update attempts (successful or not) before that?

I cannot be 100% sure, as we do not track them, but I would say that any OS update attempts before that are very unlikely. However, there probably was a Supervisor update to 14.13.14 some time before the mentioned OS update.

Did you ever notice anything wrong with these devices before? Do you think they could have been in this state (containers restarting every 6 minutes) for long without you noticing?

I noticed repeated Supervisor restarts only just before running the OS updates. I was hoping that the OS update would fix it as well.

@lmbarros
Contributor Author

lmbarros commented Apr 10, 2024

The same user reported a device with the same symptoms (same invalid contents in the uuid file) running balenaOS v3.1.1 that has never gone through an OS or Supervisor update. According to stat and our database, the uuid file was created when the device was first provisioned and has not changed since then. This should rule out any relationship between the issue and HUPs.
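The file's size and timestamps can be read with stat. A small sketch, assuming GNU coreutils stat (BusyBox builds may support a different subset of format sequences); `show_file_times` is a hypothetical helper. Note that on Linux stat reliably reports modification and status-change times, while a true creation ("birth") time is not always available.

```shell
#!/bin/sh
# Print size and timestamps of a file (GNU coreutils stat assumed).
# %n = file name, %s = size in bytes,
# %y = last data modification, %z = last status change.
show_file_times() {
    stat -c '%n: size=%s modified=%y changed=%z' "$1"
}

f=/mnt/data/docker/containerd/daemon/io.containerd.grpc.v1.introspection/uuid
if [ -e "$f" ]; then
    show_file_times "$f"
fi
```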

@lmbarros lmbarros changed the title Empty .../containerd/daemon/io.containerd.grpc.v1.introspection/uuid file created occasionally Empty or invalid .../containerd/daemon/io.containerd.grpc.v1.introspection/uuid file created occasionally Apr 17, 2024
@lmbarros
Contributor Author

Interestingly, in both cases the contents of the file were the same

Well, all zeros actually... somehow I missed that. Not so interesting then.

lmbarros added a commit to balena-os/meta-balena that referenced this issue Apr 17, 2024
We have detected one more way in which the uuid file used by containerd
can get corrupted. This time, the file is not empty, but doesn't contain
a valid UUID either.

This commit thus extends the existing workaround to also handle this
case.

See balena-os/balena-engine#322

Signed-off-by: Leandro Motta Barros <[email protected]>
Change-type: patch
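Extending the workaround means telling a valid uuid file apart from an empty one and from one holding garbage such as NUL bytes. The following is only a sketch of such a check, not the code from the meta-balena PR. It assumes a valid file holds exactly the 36-character canonical UUID text (8-4-4-4-12 hex digits) with no trailing newline; `uuid_file_state` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical classifier for containerd's introspection uuid file.
# Prints one of: missing, empty, invalid, valid.
# Assumption: a valid file is exactly the 36-character canonical UUID
# text, with no trailing newline.

UUID_RE='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'

uuid_file_state() {
    f="$1"
    if [ ! -e "$f" ]; then
        echo missing
        return
    fi
    size=$(wc -c < "$f")
    if [ "$size" -eq 0 ]; then
        echo empty
    elif [ "$size" -ne 36 ]; then
        echo invalid
    elif [ "$(tr -d '\000' < "$f" | wc -c)" -ne 36 ]; then
        # Contains NUL bytes (e.g. the all-zeros case seen above).
        echo invalid
    elif grep -Eq "$UUID_RE" "$f"; then
        # 36 bytes, no NULs, and one 36-char line matching the pattern:
        # the whole file is a single canonical UUID.
        echo valid
    else
        echo invalid
    fi
}
```

Before running `ctr version`, something like `case $(uuid_file_state "$UUID_FILE") in empty|invalid) rm -f "$UUID_FILE" ;; esac` would then cover both the empty and the corrupted case.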
@lmbarros
Contributor Author

PR with a workaround for invalid uuid files: balena-os/meta-balena#3409. Needs more testing.

lmbarros added a commit to balena-os/meta-balena that referenced this issue Apr 26, 2024
alexgg pushed a commit to balena-os/meta-balena that referenced this issue May 15, 2024
lmbarros added a commit to balena-os/meta-balena that referenced this issue May 21, 2024
lmbarros added a commit to balena-os/meta-balena that referenced this issue May 27, 2024
alexgg pushed a commit to balena-os/meta-balena that referenced this issue Jun 2, 2024
lmbarros added a commit to balena-os/meta-balena that referenced this issue Jun 4, 2024
lmbarros added a commit to balena-os/meta-balena that referenced this issue Jun 5, 2024