This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

ipset v7.2: Set cannot be destroyed: it is in use by a kernel component - proposed solution #3847

Closed
KevDBG opened this issue Aug 20, 2020 · 6 comments

@KevDBG
Contributor

KevDBG commented Aug 20, 2020

What you expected to happen?

weaveworks/weave-kube:2.6.4 start

What happened?

Related to issue #3828, I have the same problem in another context.
We are using Rancher RKE 2.4.5 and a K8s cluster v1.18-6-rancher1-1. When Rancher tries to start weaveworks/weave-kube:2.6.4, it gets stuck with the log "ipset v7.2: Set cannot be destroyed: it is in use by a kernel component".

How to reproduce it?

Indeed, I can reproduce the problem directly in the container. I launched all of these commands in one block (at the same time):

/ #     iptables -w -F WEAVE-KUBE-TEST
/ #     iptables -w -X WEAVE-KUBE-TEST
/ #     ipset destroy weave-kube-test
/ #     iptables -w -F WEAVE-KUBE-TEST 2>/dev/null || true
/ #     iptables -w -X WEAVE-KUBE-TEST 2>/dev/null || true
/ #     ipset destroy weave-kube-test 2>/dev/null || true
/ # 
/ #     ipset create weave-kube-test hash:ip
/ #     iptables -w -t filter -N WEAVE-KUBE-TEST
/ #     if ! iptables -w -A WEAVE-KUBE-TEST -m set --match-set weave-kube-test src -j DROP; then
>         NOT_EXIST=1
>     fi
/ #     iptables -w -F WEAVE-KUBE-TEST
/ #     iptables -w -X WEAVE-KUBE-TEST
/ #     ipset destroy weave-kube-test
ipset v7.2: Set cannot be destroyed: it is in use by a kernel component

In this case we still hit the issue (even though we use -w). I tried another approach, related to this comment -> #3816 (comment)

/ #     iptables -w -F WEAVE-KUBE-TEST
/ #     iptables -w -X WEAVE-KUBE-TEST
/ #     ipset destroy weave-kube-test
/ #     iptables -w -F WEAVE-KUBE-TEST 2>/dev/null || true
/ #     iptables -w -X WEAVE-KUBE-TEST 2>/dev/null || true
/ #     ipset destroy weave-kube-test 2>/dev/null || true
/ # 
/ #     ipset create weave-kube-test hash:ip
/ #     iptables -w -t filter -N WEAVE-KUBE-TEST
/ #     if ! iptables -w -A WEAVE-KUBE-TEST -m set --match-set weave-kube-test src -j DROP; then
>         NOT_EXIST=1
>     fi
/ #     iptables -w -F WEAVE-KUBE-TEST
/ #     iptables -w -X WEAVE-KUBE-TEST
/ #     sleep 1
/ #     ipset destroy weave-kube-test

In this case, it works. I think ipset destroy is launched too quickly after the iptables commands, while the iptables rule referencing the ipset is still present.
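Rather than a fixed sleep 1, one alternative is a small retry loop around ipset destroy, so the script waits only as long as the kernel actually needs. This is only a sketch; the retry helper and the set name are illustrative, not part of the Weave launch script:

```shell
# Illustrative retry helper (not from the Weave script): re-run a command
# up to N times, sleeping 1s between attempts, instead of a single
# unconditional "sleep 1".
retry() {
    tries=$1
    shift
    i=0
    until "$@"; do
        i=$((i + 1))
        if [ "$i" -ge "$tries" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Intended usage (requires root and an existing set):
# retry 5 ipset destroy weave-kube-test
```

This degrades gracefully: on machines where the destroy succeeds immediately there is no delay at all, and on racy machines it keeps retrying instead of failing after exactly one second.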

I will try to push a pull request with this kind of modification.

Anything else we need to know?

In my case the host is CentOS 8, so the script uses iptables-nft (not legacy iptables)

Versions:

Rancher RKE 2.4.5 and K8s cluster v1.18-6-rancher1-1
weaveworks/weave-kube:2.6.4

Logs:

ipset v7.2: Set cannot be destroyed: it is in use by a kernel component
@NeonSludge
Contributor

This, apparently, goes deeper than just the startup script:

  1. Spin up a fresh k8s cluster with kubeadm on CentOS 8.
  2. Install Weave Net (nftables mode).
  3. Start creating and deleting a namespace in a loop and watch logs from weave-npc instances.
  4. The same error sometimes occurs when weave-npc deletes default rules for the namespace that is being deleted:
DEBU: 2020/08/21 09:38:54.700858 removing rule for DefaultAllow in namespace: test2, chain: WEAVE-NPC-DEFAULT, [-m set --match-set weave-laMQDEB2[OfX;:}||_kdvSAYC dst -j ACCEPT -m comment --comment DefaultAllow ingress isolation for namespace: test2]
DEBU: 2020/08/21 09:38:54.704545 removing rule for DefaultAllow in namespace: test2, chain: WEAVE-NPC-EGRESS-DEFAULT, [-m set --match-set weave-0Y^EJ#}Y/U*xi5O28JnRXgs1z src -j WEAVE-NPC-EGRESS-ACCEPT -m comment --comment DefaultAllow egress isolation for namespace: test2]
DEBU: 2020/08/21 09:38:54.708591 removing rule for DefaultAllow in namespace: test2, chain: WEAVE-NPC-EGRESS-DEFAULT, [-m set --match-set weave-0Y^EJ#}Y/U*xi5O28JnRXgs1z src -j RETURN -m comment --comment DefaultAllow egress isolation for namespace: test2]
ERRO: 2020/08/21 09:38:54.721239 delete namespace: ipset [destroy weave-0Y^EJ#}Y/U*xi5O28JnRXgs1z] failed: ipset v7.2: Set cannot be destroyed: it is in use by a kernel component: exit status 1

This is not a bug in Weave itself because you can reproduce this on a fresh CentOS 8 machine with just a couple of iptables/ipset commands. The weird thing is that sometimes it works on a machine and doesn't work on an identical machine next to it. So this has nothing to do with kernel/netfilter/nftables versions. I'm currently trying to investigate this further. Right now it seems that either there is a race condition somewhere that gets triggered in some very specific circumstances or nftables rule manipulation is sometimes asynchronous in nature.
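One way to make the race observable is the kernel reference counter that `ipset list -t` prints for each set (the `References:` line): the destroy can only succeed once it drops to zero. Below is a sketch of polling that counter before destroying; the `refs_of` helper and the set name are mine, not from Weave, and the commented loop would need root to run:

```shell
# refs_of: extract the kernel reference count from "ipset list -t" output
# passed in as a string.
refs_of() {
    printf '%s\n' "$1" | awk '/^References:/ { print $2 }'
}

# Sketch of use (requires root; weave-kube-test is an illustrative set name):
# while [ "$(refs_of "$(ipset list -t weave-kube-test)")" != "0" ]; do
#     sleep 0.1
# done
# ipset destroy weave-kube-test
```

If the asynchronous-cleanup theory is right, this counter should briefly stay above zero after the referencing iptables rule is deleted on affected machines.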

@NeonSludge
Contributor

An update:

  1. This reproduces on Ubuntu 20.04 as well: the same error occurs if you try destroying an ipset right after deleting a rule that references it.
  2. This issue only reproduces on machines with more than one online CPU. Bringing CPUs offline fixes the problem; re-enabling them breaks ipset destruction again.

@bboreham
Contributor

Can you try with Weave Net 2.7.0 ? There were a few changes to iptables, though nothing I could pin specifically to your symptoms.

@KevDBG
Contributor Author

KevDBG commented Aug 26, 2020

Hello,

Thanks @NeonSludge for your tests and feedback.

Indeed, the problem is not linked to Weave, because if I run the same block of commands in one go on the host (not in the Weave pod), I get the same problem.

I tried on a fresh CentOS 8 install with 2 vCPUs. The problem is intermittent: sometimes it works and sometimes I get the message failed: ipset v7.2: Set cannot be destroyed: it is in use by a kernel component.

So I think the "iptables flush" command has been submitted but not yet finished when "ipset destroy" is executed (they run concurrently). Maybe it is due to threading (e.g. iptables on vCPU 1 and ipset on vCPU 2).

The sleep 1 is a workaround for this vCPU/multithreading timing effect. To find a proper solution, deeper testing would be necessary.

@bboreham We have the same problem with Weave Net 2.7.0, for the reason explained above. The sleep 1 is not perfect, but for now it works for me :-) (though I have to tweak my Weave deployment in Rancher RKE with a sed command, which is not great...).

bboreham added a commit that referenced this issue Jan 18, 2021
Otherwise the operation can sometimes fail - see #3847
bboreham pushed a commit that referenced this issue Jan 18, 2021
Sleep before ipset destroy in startup, otherwise the operation can
sometimes fail - see #3847
@bboreham bboreham added this to the 2.8.0 milestone Jan 18, 2021
@bboreham
Contributor

Closed by #3882

@lnedry

lnedry commented Jan 12, 2024

Instead of "sleep 1" use the "wait" command. The script will pause until the last operation, in this case "flush", has completed.
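For what it's worth, POSIX shell `wait` only blocks on the shell's own background jobs; the iptables calls in the startup script run in the foreground, so it is not obvious that `wait` would have anything to wait for here. A minimal illustration of what `wait` actually does (timing values are for demonstration only):

```shell
# "wait" blocks until the shell's *background* jobs have exited; it does not
# synchronize with kernel-side work left over from foreground commands that
# have already returned.
start=$(date +%s)
sleep 1 &          # a background job
wait               # blocks until the job exits (about 1 second here)
end=$(date +%s)
elapsed=$((end - start))
```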
