The driver reports incorrect number of allocatables count #1808

Closed
zerkms opened this issue Oct 25, 2023 · 10 comments · Fixed by #1843
Labels: kind/bug

Comments

@zerkms (Contributor) commented Oct 25, 2023

/kind bug

What happened?

I have a 1.26.4 cluster that runs 4 worker nodes, each of which is a t3a.xlarge. All 4 joined the cluster the same day (within a couple of hours). All have 2 network interfaces attached.

Yet all 4 report a different Allocatable.Count: 9, 10, 16, 26.

What you expected to happen?

I expect all of them to report 26.

How to reproduce it (as minimally and precisely as possible)?

I don't know 🤷

Anything else we need to know?:

Environment: aws

  • Kubernetes version (use kubectl version): 1.26.4
  • Driver version: 1.24.0
@k8s-ci-robot added the kind/bug label on Oct 25, 2023
@zerkms (Contributor, Author) commented Oct 26, 2023

Further analysis: the driver uses GetNumBlockDeviceMappings via the metadata service.

And the metadata service reports the wrong number of mapped devices!!

On the machine:

$ curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/|grep ebs| wc -l
16

Yet, aws cli:

$ aws ec2 describe-instance-attribute --instance-id xxx --attribute=blockDeviceMapping | jq '.BlockDeviceMappings | length'
5

(of which 4 are EBS volumes + 1 is the root block device)

And the AWS console UI shows the same 4+1 attached block devices.
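
For context on why a stale block-device-mapping listing skews the advertised limit: the node plugin subtracts whatever IMDS says is already attached from the instance's attachment budget. The sketch below is illustrative only, not the driver's actual code; the budget of 28 shared Nitro attachment slots, the struct, and the helper names are assumptions that mirror the functions mentioned above.

```go
// Illustrative sketch only (not the driver's actual code) of how a stale
// IMDS block-device-mapping count would skew the advertised allocatable
// count. The budget of 28 shared attachment slots and the field names are
// assumptions for the example.
package main

import "fmt"

const sharedAttachmentSlots = 28 // assumed Nitro budget shared by ENIs, EBS and instance store

// nodeMetadata holds values a node plugin might read once at startup.
type nodeMetadata struct {
	attachedENIs        int // from the IMDS network interface listing
	blockDeviceMappings int // from IMDS block-device-mapping (can be stale)
}

// allocatableCount: whatever IMDS reports as already attached is subtracted
// from the budget, so an inflated mapping count directly lowers the value
// published in CSINode.
func allocatableCount(md nodeMetadata) int {
	return sharedAttachmentSlots - md.attachedENIs - md.blockDeviceMappings
}

func main() {
	stale := nodeMetadata{attachedENIs: 2, blockDeviceMappings: 16} // what IMDS claims
	fresh := nodeMetadata{attachedENIs: 2, blockDeviceMappings: 5}  // what EC2 actually reports
	fmt.Println("stale IMDS:", allocatableCount(stale)) // 10
	fmt.Println("fresh IMDS:", allocatableCount(fresh)) // 21
}
```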

@zerkms (Contributor, Author) commented Oct 27, 2023

After a full VM shutdown (via the console), the metadata service finally reports the matching numbers. Yet csinode still shows 10 as the capacity.

Is that value only determined when the node registers and never refreshed? If so, how would one update it?

@ConnorJC3 (Contributor) commented:

> And the metadata service reports the wrong number of mapped devices!!

I'd recommend reporting this to AWS support (especially if it's reproducible); they should be able to get it to the appropriate team that works on IMDS. Unfortunately, we don't have a good way to know if/when the metadata service is wrong, so we trust its output.

> Is that value only determined when the node registers and never refreshed? If so, how would one update it?

Generally, restarting the EBS CSI Driver pod should cause Kubernetes to re-query the limit and update it. See if that fixes it; rebooting the node is usually equivalent. My best guess is that there was a short period during the node's startup when the wrong value was still being displayed.

@zerkms (Contributor, Author) commented Oct 30, 2023

> Generally, restarting the EBS CSI Driver pod should cause Kubernetes to re-query the limit and update it. See if that fixes it; rebooting the node is usually equivalent. My best guess is that there was a short period during the node's startup when the wrong value was still being displayed.

A full node shutdown made the HTTP metadata API report the correct numbers, but the reported allocatable numbers still make no sense to me:

$ kubectl get csinode -o custom-columns=NODE:.metadata.name,ALLOCATABLE:'.spec.drivers[0].allocatable.count'|grep node
node-A 9    (11 actually attached, including root)
node-B 26   (12 actually attached, including root)
node-C 16   (7 actually attached, including root)
node-D 15   (18 actually attached, including root)

The text in parentheses was added by me manually, the node names are redacted, and the reported numbers are untouched.

All 4 nodes are identical t3a.xlarge instances, run in the same AZ, and joined within 30 minutes (originally I said a couple of hours, but after checking more accurately they are much closer than that). Each node has 2 network interfaces (the default one, plus one extra attached).

The values don't change between EBS driver pod restarts: even if I drain the node, ensure no extra volumes are attached, and then restart the driver, it still reports the same numbers as above.

To me the numbers look random, and the only explanation I have from reading the code is that it's cached somewhere?

@jsafrane (Contributor) commented:

I'm hitting the same issue. When a node is shut down without draining it first (and thus without detaching all its volumes) and started later, the node gets a wrong attach limit in CSINode spec.drivers[].allocatable. I've identified two separate issues:

  1. The AWS metadata service keeps reporting volumes that were attached at node startup in http://169.254.169.254/latest/meta-data/block-device-mapping/, even though the volumes were detached later.

  2. Even if the AWS metadata service were correct, kubelet asks the CSI driver for its limits only at driver startup. If a volume is detached later and the metadata is updated, the CSI driver has no way to tell kubelet that the attach limit is now higher.

Therefore I think it's not really useful to read block-device-mapping at all: it may be totally wrong because of 1., or it may include Kubernetes volumes that will shortly be detached because of 2.

Would it be better to just pick the number of attachments from the instance type (GetMaxAttachments / GetDedicatedLimitForInstanceType) and reserve 1 attachment for the root disk?
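
A minimal sketch of that proposal, assuming the driver can look up a per-instance-type limit. The real GetMaxAttachments / GetDedicatedLimitForInstanceType take different inputs; the map contents, the shared budget of 28, and the function signatures below are placeholders, not the driver's actual API.

```go
// Sketch of the proposal above: derive the attach limit from the instance
// type alone and reserve one slot for the root disk, instead of trusting
// IMDS block-device-mapping. Placeholder data and names only.
package main

import "fmt"

// dedicatedEBSLimit returns a dedicated EBS attachment limit for instance
// types that have one, or 0 when the type uses a shared attachment budget.
func dedicatedEBSLimit(instanceType string) int {
	limits := map[string]int{ /* hypothetical per-type entries would go here */ }
	return limits[instanceType]
}

// sharedAttachmentSlots is the assumed shared budget for all other types.
func sharedAttachmentSlots(instanceType string) int { return 28 }

// attachLimit: instance-type limit, minus attached ENIs for shared-budget
// types, minus 1 reserved for the root disk.
func attachLimit(instanceType string, attachedENIs int) int {
	if dedicated := dedicatedEBSLimit(instanceType); dedicated > 0 {
		return dedicated - 1 // reserve the root disk
	}
	return sharedAttachmentSlots(instanceType) - attachedENIs - 1
}

func main() {
	// A t3a.xlarge with two ENIs would always advertise the same value,
	// no matter what block-device-mapping happens to report at startup.
	fmt.Println(attachLimit("t3a.xlarge", 2)) // 25
}
```

The point of the approach is that the result becomes a pure function of the instance type and ENI count, so it cannot drift between otherwise identical nodes the way the IMDS-derived values in this thread do.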

@jsafrane (Contributor) commented:

It's quite hard and error-prone to recover from this situation: the cluster admin must drain and shut down a node that has a misleading capacity count, then start it again. It's not enough to restart the CSI driver.

@jsafrane (Contributor) commented Feb 1, 2024

/reopen

Since #1843 was reverted.

@k8s-ci-robot reopened this on Feb 1, 2024
@k8s-ci-robot (Contributor) commented:

@jsafrane: Reopened this issue.

In response to this:

> /reopen
>
> Since #1843 was reverted.


@torredil (Member) commented Feb 9, 2024

Fixed by @jsafrane's contribution in #1919
/close

@k8s-ci-robot (Contributor) commented:

@torredil: Closing this issue.

In response to this:

> Fixed by @jsafrane's contribution in #1919
> /close

