fix issues with bandwidth plugin and htb introduction #1097

h0nIg · 2024-09-20T17:04:23Z

With the introduction of the htb changes through #921, 2 performance issues were introduced which we see in our production environments.

issue 1

the submitter of the htb changes did not knew about rate2quantum and subsequently about quantum issues which may arise with HTB (or any other qdisc like fq_codel).

https://github.com/containernetworking/plugins/blob/main/plugins/meta/bandwidth/ifb_creator.go#L163

The kernel is warning once unreasonable values are specified, because too high values cause subsequent issues (like txqueue drops once too much packets are dequeued).
https://github.com/torvalds/linux/blob/baeb9a7d8b60b021d907127509c44507539c15e5/net/sched/sch_htb.c#L2037
https://github.com/torvalds/linux/blob/baeb9a7d8b60b021d907127509c44507539c15e5/net/sched/sch_htb.c#L2052-L2055

issue 2

if the txqueuelen is unspecified, the kernel will decide about appropriate values. In our case the kernel will use reasonable defaults like txqueuelen=0 for this virtual device once the value is not specified. This means queueing is not to be done in case the receiving kernel code is busy and not picking up the packet fast enough. We are talking about kernel here and not the ring buffers nor /sys/class/net/eth0/queues/tx-*

Systems should drop early enough to ensure continuity and avoid working on full buffer. Bursting is OK, but HTB bursts beyond HW limits should still get limited, because this is causing latency for all containers running on the system for longer period of time. The "full buffer" can get caused by several reasons, one reason is the "unlimited HTB class" allowing unlimited rates / bursts for a single container which is causing unfairness on a system with containers running below their limited rates.

The following screenshot describes the old tbf setup which was compared to the htb setup (numtxqueues / numrxquees are the same for the tbf and the htb code and match the amount of vCPU).

Dequeue-ing from veth tx queues with 64 concurrent queues (queues equal to vCPU to concurrently execute kernel code) compared to a single txqueue without "concurrency". This was not relevant until the introduction of HTB which is dropping less (causing latency) and the introduction of txqueuelen=1000 causing latency as well.

https://gist.github.com/eqhmcow/939373#file-hfsc-shape-sh-L48-L50

It was intended to make txqueuelen equal between the veth interface, but numtxqueues / numrxqueues were not considered. Once you queue with different amount of queues between veth and ifb, it will cause kernel performance issues. Similar warnings exists in veth.c itself already, because the Host and container numtxqueues need to have at least same size counterparts:

https://github.com/torvalds/linux/blob/9d3684c24a5232c2d7ea8f8a3e60fe235e6a9867/drivers/net/veth.c#L1199-L1206

The usual veth creation may look like this, where numtxqueues of outer0 correlate to the numrxqueues of inner0, the same applies the other way around.

ip link add outer0 numrxqueues 16 numtxqueues 16 type veth peer inner0 numrxqueues 16 numtxqueues 16

The kernel is even suggesting to use numtxqueues for ifb devices, which is missing here: https://github.com/torvalds/linux/blob/e8fc317dfca9021f0ea9ed77061d8df677e47a9f/drivers/net/ifb.c#L396

The upstream netlink library does not offer setting numtxqueues / numrxqueues for ifb adapters yet (just supported for veth), therefore it is required to avoid setting qlen until the upstream library offers support and numtxqueues / numrxqueues match between veth and ifb.

h0nIg · 2024-09-20T21:45:50Z

ping @dcbw @MikeZappa87 @s1061123

you reviewed the causing PR #921, may i ask you to take a look into this please? any questions regarding the actual problem (despite that i still need to fix a test)?

s1061123 · 2024-09-30T15:47:08Z

@h0nIg , thank you for the PR. I just start the review the code and will be reply then.

s1061123 · 2024-10-01T06:22:33Z

Hi, @h0nIg . From the coding/fix point of view, it looks good to me. Thank you!

Please take care Github action error?

DCO

We need Signed-off-by field in commit logs. Please check following URL, modify commit message and push it again.
https://github.com/containernetworking/plugins/pull/1097/checks?check_run_id=30454599231

CI Test failure

CI integration test is failed. Could you please check that?
https://github.com/containernetworking/plugins/actions/runs/10966531920/job/30454704770?pr=1097

squeed · 2024-10-07T14:57:45Z

@h0nIg this is the last PR we'd like to see before we cut a release -- could you take a look?

h0nIg · 2024-10-09T08:09:24Z

@squeed i was on vacation and i will take a look into this within the next 1-3 days

h0nIg · 2024-10-10T07:52:19Z

honestly speaking: the whole HTB introduction is a mess (sorry, but that is reality).

The integration test reveals the same timing results, the introducer of the HTB change added debug output to make this visible. He knew that the "without qos" is not working (for whatever reason) and still asked for a merge of this.

2024/10/10 07:22:34 Runtime with qos limit 2.65 seconds
2024/10/10 07:22:34 Runtime without qos limit 2.65 seconds

I just changed it to "-1" which will end up with the linux defaults of "txqueuelen = 32". We have ran tests with this low value which did not manifest in measureable issues, but we know that any value > 0 might cause issues. I would first want to get this PR in and later maybe revert the whole HTB change.

h0nIg · 2024-10-10T07:53:19Z

@squeed ping, merge-able

squeed · 2024-10-10T09:07:17Z

@h0nIg should we just revert the whole change as-is? Is it worth trying to fix?

Thanks so much for your help on this; the bandwidth plugin is orphaned and nobody on the project has any expertise here.

h0nIg · 2024-10-11T10:29:45Z

issues are too severe, HTB requires a txqueue > 0 (indirectly through htb qdisc "direct_queue" and the like). Just set the txqueuelen == 0 and the tests are failing. Overall txqueue > 0 and numtxqueues / numrxqueues == 1 causes locking issues on interface level as mentioned above, therefore not worth to fix it and it is better to keep it simple.

full revert: #1105

Signed-off-by: h0nIg <[email protected]>

oOraph · 2024-10-11T13:28:03Z

Hi I just see this issue about a pr I made a while ago to add some option to exclude some subnets from traffic shapping in the bandwith plugin

First and foremost: sorry for any unmeasured inconvenience this could have caused. There seems to have been some side effect with the htb qdisc I did not know about. Then you were right reverting
Answering to this comment:
fix issues with bandwidth plugin and htb introduction #1097 (comment)

Mess -> your point of view, not a fact. the pr is small if we except the unrelated tests tidying/splitting, which was requested during review because the original file was becoming too big

I did not "know" the "without qos" test was not working:

the assertion is here to check it's ok:

plugins/integration/integration_linux_test.go

Line 199 in c29dc79

    
           Expect(runtimeWithLimit).To(BeNumerically(">", runtimeWithoutLimit+1000*time.Millisecond))

and as for the logs showing the same time it's a typo here:

plugins/integration/integration_linux_test.go

Line 197 in c29dc79

    
           log.Printf("Runtime without qos limit %.2f seconds", runtimeWithLimit.Seconds())

-> should have been

log.Printf("Runtime without qos limit %.2f seconds", runtimeWithoutLimit.Seconds())

we printed twice the same thing... so no need to blame people about bad intents.

Once corrected if you rerun it:

2024/10/11 13:08:16 Runtime with qos limit 2.65 seconds
2024/10/11 13:08:16 Runtime without qos limit 0.01 seconds

which is consistent with the assert test below the logs

cc @h0nIg

h0nIg · 2024-10-11T15:05:18Z

Hi I just see this issue about a pr I made a while ago to add some option to exclude some subnets from traffic shapping in the bandwith plugin

First and foremost: sorry for any unmeasured inconvenience this could have caused. There seems to have been some side effect with the htb qdisc I did not know about. Then you were right reverting

Answering to this comment:
fix issues with bandwidth plugin and htb introduction #1097 (comment)

Mess -> your point of view, not a fact. the pr is small if we except the unrelated tests tidying/splitting, which was requested during review because the original file was becoming too big

cc @h0nIg

not a problem, mistakes may happen and i didn't knew about such issues before as well.
it is not just limited to the integration test, i may need to summarize the problems: Missing quantum adjustments are causing issues, the kernel is warning the user already and there are comments in the code that the author does not understand them. There are better qdiscs than HTB which avoid latency issues, does not come with txqueue>0 locks (numtxqueues in combination with numrxqueues on the other end, see initial issue description) and qdisc locks because single qdisc needs to hold the state vs dedicated separated class per vCPU.

Overall i tried to get it fixed, but you can see that missing configurations in the upstream library, the need of txqueuelen>0 for HTB and further adjustments (queue per vCPU or similar) are required here. So it is better to go back to a working version instead which works without queueing

squeed · 2024-10-14T09:48:32Z

We have some other merged changes that people would like to see released. @oOraph; we will revert the htb changes for now, and you can fix them at your leisure.

h0nIg force-pushed the patch-1 branch 9 times, most recently from f936079 to c702643 Compare September 20, 2024 21:29

squeed mentioned this pull request Sep 26, 2024

Fix inadvertent txqueuelen being set to zero #1100

Merged

squeed requested a review from s1061123 September 30, 2024 15:06

h0nIg force-pushed the patch-1 branch 2 times, most recently from ffd2eef to be1a65d Compare October 10, 2024 07:31

h0nIg mentioned this pull request Oct 11, 2024

revert htb changes due to ongoing issues and missing maintainer #1105

Merged

h0nIg added 3 commits October 11, 2024 12:39

fix issues with quantum

cf33cbd

Signed-off-by: h0nIg <[email protected]>

revert txqueue due to kernel performance issues

874cd28

Signed-off-by: h0nIg <[email protected]>

fix tests

9975b5d

Signed-off-by: h0nIg <[email protected]>

h0nIg force-pushed the patch-1 branch from be1a65d to 9975b5d Compare October 11, 2024 10:39

h0nIg closed this Oct 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix issues with bandwidth plugin and htb introduction #1097

fix issues with bandwidth plugin and htb introduction #1097

h0nIg commented Sep 20, 2024 •

edited

Loading

h0nIg commented Sep 20, 2024

s1061123 commented Sep 30, 2024

s1061123 commented Oct 1, 2024

squeed commented Oct 7, 2024

h0nIg commented Oct 9, 2024

h0nIg commented Oct 10, 2024

h0nIg commented Oct 10, 2024

squeed commented Oct 10, 2024

h0nIg commented Oct 11, 2024 •

edited

Loading

oOraph commented Oct 11, 2024 •

edited

Loading

h0nIg commented Oct 11, 2024 •

edited

Loading

squeed commented Oct 14, 2024

fix issues with bandwidth plugin and htb introduction #1097

fix issues with bandwidth plugin and htb introduction #1097

Conversation

h0nIg commented Sep 20, 2024 • edited Loading

issue 1

issue 2

h0nIg commented Sep 20, 2024

s1061123 commented Sep 30, 2024

s1061123 commented Oct 1, 2024

DCO

CI Test failure

squeed commented Oct 7, 2024

h0nIg commented Oct 9, 2024

h0nIg commented Oct 10, 2024

h0nIg commented Oct 10, 2024

squeed commented Oct 10, 2024

h0nIg commented Oct 11, 2024 • edited Loading

oOraph commented Oct 11, 2024 • edited Loading

h0nIg commented Oct 11, 2024 • edited Loading

squeed commented Oct 14, 2024

h0nIg commented Sep 20, 2024 •

edited

Loading

h0nIg commented Oct 11, 2024 •

edited

Loading

oOraph commented Oct 11, 2024 •

edited

Loading

h0nIg commented Oct 11, 2024 •

edited

Loading