The driver reports incorrect number of allocatables count #1808

Closed
zerkms opened this issue Oct 25, 2023 · 10 comments · Fixed by #1843
Labels: kind/bug

Comments

@zerkms (Contributor) commented Oct 25, 2023

/kind bug

What happened?

I have a 1.26.4 cluster that runs 4 worker nodes, each of which is a t3a.xlarge. All 4 joined the cluster the same day (within a couple of hours). All have 2 network interfaces attached.

Yet all 4 report a different Allocatable.Count: 9, 10, 16, 26.

What you expected to happen?

I expect all of them to report 26.

How to reproduce it (as minimally and precisely as possible)?

I don't know 🤷

Anything else we need to know?:

Environment: aws

  • Kubernetes version (use kubectl version): 1.26.4
  • Driver version: 1.24.0
@k8s-ci-robot added the kind/bug label on Oct 25, 2023
@zerkms (Contributor, Author) commented Oct 26, 2023

Further analysis: the driver uses GetNumBlockDeviceMappings via the metadata service.

And the metadata service reports the wrong number of mapped devices!!

On the machine:

$ curl -s http://169.254.169.254/latest/meta-data/block-device-mapping/|grep ebs| wc -l
16

Yet, aws cli:

$ aws ec2 describe-instance-attribute --instance-id xxx --attribute=blockDeviceMapping | jq '.BlockDeviceMappings | length'
5

(of which 4 are EBS volumes + 1 is the root block device)

And the AWS console UI shows the same 4+1 attached block devices.
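
For context on why a stale block-device-mapping listing skews the advertised limit: the node plugin subtracts whatever IMDS says is already attached from the instance's attachment budget. The sketch below is illustrative only, not the driver's actual code; the budget of 28 shared Nitro attachment slots, the struct, and the helper names are assumptions that mirror the functions mentioned above.

```go
// Illustrative sketch only (not the driver's actual code) of how a stale
// IMDS block-device-mapping count would skew the advertised allocatable
// count. The budget of 28 shared attachment slots and the field names are
// assumptions for the example.
package main

import "fmt"

const sharedAttachmentSlots = 28 // assumed Nitro budget shared by ENIs, EBS and instance store

// nodeMetadata holds values a node plugin might read once at startup.
type nodeMetadata struct {
	attachedENIs        int // from the IMDS network interface listing
	blockDeviceMappings int // from IMDS block-device-mapping (can be stale)
}

// allocatableCount: whatever IMDS reports as already attached is subtracted
// from the budget, so an inflated mapping count directly lowers the value
// published in CSINode.
func allocatableCount(md nodeMetadata) int {
	return sharedAttachmentSlots - md.attachedENIs - md.blockDeviceMappings
}

func main() {
	stale := nodeMetadata{attachedENIs: 2, blockDeviceMappings: 16} // what IMDS claims
	fresh := nodeMetadata{attachedENIs: 2, blockDeviceMappings: 5}  // what EC2 actually reports
	fmt.Println("stale IMDS:", allocatableCount(stale)) // 10
	fmt.Println("fresh IMDS:", allocatableCount(fresh)) // 21
}
```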

@zerkms (Contributor, Author) commented Oct 27, 2023

After a full VM shutdown (via the console), the metadata service finally reports the matching numbers. Yet csinode still shows 10 as the capacity.

Is that value only determined when the node registers and never refreshed? If so, how would one update it?

@ConnorJC3 (Contributor) commented:

> And the metadata service reports the wrong number of mapped devices!!

I'd recommend reporting this to AWS support (especially if it's reproducible); they should be able to get it to the appropriate team that works on IMDS. Unfortunately, we don't have a good way to know if/when the metadata service is wrong, so we trust its output.

> Is that value only determined when the node registers and never refreshed? If so, how would one update it?

Generally, restarting the EBS CSI Driver pod should cause Kubernetes to re-query the limit and update it. See if that fixes it; rebooting the node is usually equivalent. My best guess is that there was a short period during the node's startup when the wrong value was still being displayed.

@zerkms (Contributor, Author) commented Oct 30, 2023

> Generally, restarting the EBS CSI Driver pod should cause Kubernetes to re-query the limit and update it. See if that fixes it; rebooting the node is usually equivalent. My best guess is that there was a short period during the node's startup when the wrong value was still being displayed.

A full node shutdown made the HTTP metadata API report the correct numbers, but the reported allocatable numbers still make no sense to me:

$ kubectl get csinode -o custom-columns=NODE:.metadata.name,ALLOCATABLE:'.spec.drivers[0].allocatable.count'|grep node
node-A 9    (11 actually attached, including root)
node-B 26   (12 actually attached, including root)
node-C 16   (7 actually attached, including root)
node-D 15   (18 actually attached, including root)

The text in parentheses was added by me manually, the node names are redacted, and the reported numbers are untouched.

All 4 nodes are identical t3a.xlarge instances, run in the same AZ, and joined within 30 minutes (originally I said a couple of hours, but after checking more accurately they are much closer than that). Each node has 2 network interfaces (the default one, plus one extra attached).

The values don't change between EBS driver pod restarts: even if I drain the node, ensure no extra volumes are attached, and then restart the driver, it still reports the same numbers as above.

To me the numbers look random, and the only explanation I have from reading the code is that it's cached somewhere?

@jsafrane (Contributor) commented:

I'm hitting the same issue. When a node is shut down without draining it first (and thus without detaching all its volumes) and started later, the node gets a wrong attach limit in CSINode spec.drivers[].allocatable. I've identified two separate issues:

  1. The AWS metadata service keeps reporting volumes that were attached at node startup in http://169.254.169.254/latest/meta-data/block-device-mapping/, even though the volumes were detached later.

  2. Even if the AWS metadata service were correct, kubelet asks the CSI driver for its limits only at driver startup. If a volume is detached later and the metadata is updated, the CSI driver has no way to tell kubelet that the attach limit is now higher.

Therefore I think it's not really useful to read block-device-mapping at all: it may be totally wrong because of 1., or it may include Kubernetes volumes that will shortly be detached because of 2.

Would it be better to just pick the number of attachments from the instance type (GetMaxAttachments / GetDedicatedLimitForInstanceType) and reserve 1 attachment for the root disk?
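
A minimal sketch of that proposal, assuming the driver can look up a per-instance-type limit. The real GetMaxAttachments / GetDedicatedLimitForInstanceType take different inputs; the map contents, the shared budget of 28, and the function signatures below are placeholders, not the driver's actual API.

```go
// Sketch of the proposal above: derive the attach limit from the instance
// type alone and reserve one slot for the root disk, instead of trusting
// IMDS block-device-mapping. Placeholder data and names only.
package main

import "fmt"

// dedicatedEBSLimit returns a dedicated EBS attachment limit for instance
// types that have one, or 0 when the type uses a shared attachment budget.
func dedicatedEBSLimit(instanceType string) int {
	limits := map[string]int{ /* hypothetical per-type entries would go here */ }
	return limits[instanceType]
}

// sharedAttachmentSlots is the assumed shared budget for all other types.
func sharedAttachmentSlots(instanceType string) int { return 28 }

// attachLimit: instance-type limit, minus attached ENIs for shared-budget
// types, minus 1 reserved for the root disk.
func attachLimit(instanceType string, attachedENIs int) int {
	if dedicated := dedicatedEBSLimit(instanceType); dedicated > 0 {
		return dedicated - 1 // reserve the root disk
	}
	return sharedAttachmentSlots(instanceType) - attachedENIs - 1
}

func main() {
	// A t3a.xlarge with two ENIs would always advertise the same value,
	// no matter what block-device-mapping happens to report at startup.
	fmt.Println(attachLimit("t3a.xlarge", 2)) // 25
}
```

The point of the approach is that the result becomes a pure function of the instance type and ENI count, so it cannot drift between otherwise identical nodes the way the IMDS-derived values in this thread do.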

@jsafrane (Contributor) commented:

It's quite hard and error-prone to recover from this situation: the cluster admin must drain and shut down a node that has a misleading capacity count, then start it again. It's not enough to restart the CSI driver.

@jsafrane (Contributor) commented Feb 1, 2024

/reopen

Since #1843 was reverted.

@k8s-ci-robot reopened this on Feb 1, 2024
@k8s-ci-robot (Contributor) commented:

@jsafrane: Reopened this issue.

In response to this:

> /reopen
>
> Since #1843 was reverted.


@torredil (Member) commented Feb 9, 2024

Fixed by @jsafrane's contribution in #1919
/close

@k8s-ci-robot (Contributor) commented:

@torredil: Closing this issue.

In response to this:

> Fixed by @jsafrane's contribution in #1919
> /close

