"Faults" view should show all Terminating pods #2738

Closed
akatch opened this issue Jun 13, 2024 · 1 comment · Fixed by #2935

Comments


akatch commented Jun 13, 2024




Describe the bug
Enabling the "Toggle Faults" view shows some Terminating pods, but not all. This view should display all Terminating pods (and indeed all pods not in a Running and Ready state). It is unclear why some Terminating pods show up in this view while others do not. I did some brief digging in the code, and it is not entirely clear how k9s determines which pods are considered faulty; it's possible that only some Terminating pods meet its criteria.

Further investigation shows that some Terminating pods with Events such as Node Not Ready (which I would absolutely 100% expect to show up in Faults) do not show up in the Faults view. This is the case in the screenshots below.

To Reproduce
Steps to reproduce the behavior:

  1. View all pods for a namespace (:pods [namespace]) where many pods are Terminating.
  2. Enable the Faults view (ctrl+z by default).
  3. Observe that not all Terminating pods show up, even though all pods not in a Running/Ready state should appear in this view.

Expected behavior
All pods not in a Running/Ready state should appear when Faults view is enabled.

Screenshots
I have had to heavily sanitize these but hopefully they help demonstrate the issue.

A view of all pods, in particular many that are Terminating

The same namespace captured moments later in Fault view. No Terminating pods are seen.

Versions

  • OS: macOS 14.5 (Sonoma)
  • K9s: 0.32.4
  • K8s: 1.20.15, 1.24.13
gomesdigital added a commit to gomesdigital/k9s that referenced this issue Oct 27, 2024
@gomesdigital
Contributor

The reason is that a pod being in Terminating does not necessarily mean its containers are not ready, and container readiness is what k9s uses to classify faulty pods.

You can see that here:

k9s/internal/render/pod.go

Lines 168 to 177 in be1ec87

func (p Pod) diagnose(phase string, cr, ct int) error {
	if phase == Completed {
		return nil
	}
	if cr != ct || ct == 0 {
		return fmt.Errorf("container ready check failed: %d of %d", cr, ct)
	}

	return nil
}
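
To make the consequence concrete, here is a minimal standalone sketch of the same check, written as a free function instead of a method on Pod. The value of the Completed constant and the sample inputs are assumptions for illustration, not taken from the k9s source.

```go
package main

import "fmt"

// Completed mirrors the constant name used in the quoted snippet.
// Its string value here is an assumption.
const Completed = "Completed"

// diagnose reproduces the logic of the quoted snippet as a free function.
// cr is the number of ready containers, ct the total number of containers.
func diagnose(phase string, cr, ct int) error {
	if phase == Completed {
		return nil
	}
	if cr != ct || ct == 0 {
		return fmt.Errorf("container ready check failed: %d of %d", cr, ct)
	}
	return nil
}

func main() {
	// A Terminating pod whose containers all still report Ready passes the check.
	fmt.Println(diagnose("Terminating", 2, 2))
	// A Running pod with one unready container is flagged.
	fmt.Println(diagnose("Running", 1, 2))
}
```

The first call returns nil, so a pod like that would not be treated as a fault by this check, no matter how long it has been Terminating.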

When an error is returned, this is propagated to the filterToast() func:

func (t *TableData) filterToast() *RowEvents {

In my experience, we had an EC2 instance fail and take services down. The pods showed as Terminating but did not appear in the Faults view, because their container statuses were still Ready. So I agree with @akatch. Most of the time the container-ready/container-total check works, but there are cases where it doesn't apply. In addition, seeing all Terminating pods would help reveal pods that are stuck in Terminating. We've dealt with that problem extensively and had to search for those pods manually because they are filtered out of the Faults view.
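
One possible shape for that behavior is sketched below. It is a hypothetical variant, not the change that was actually merged: the name diagnoseStrict, the Terminating constant, and both constant values are assumptions made for this example.

```go
package main

import "fmt"

// Both constant values are assumed for this sketch.
const (
	Completed   = "Completed"
	Terminating = "Terminating"
)

// diagnoseStrict is a hypothetical variant of the readiness check that
// reports every Terminating pod as a fault before looking at containers.
func diagnoseStrict(phase string, cr, ct int) error {
	if phase == Completed {
		return nil
	}
	if phase == Terminating {
		return fmt.Errorf("pod is terminating")
	}
	if cr != ct || ct == 0 {
		return fmt.Errorf("container ready check failed: %d of %d", cr, ct)
	}
	return nil
}

func main() {
	// Flagged even though every container is still Ready.
	fmt.Println(diagnoseStrict(Terminating, 2, 2))
	// A healthy Running pod is still not a fault.
	fmt.Println(diagnoseStrict("Running", 2, 2))
}
```

A check like this would also surface pods that are terminating normally for a few seconds, which may or may not be desirable in a Faults view.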
