clean up kube-proxy stale-conntrack-entry handling, revert broken code #115299
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
}
-	detectLocal, _ := proxyutiliptables.NewDetectLocalByCIDR("10.0.0.0/8", ipt)
+	detectLocal, _ := proxyutiliptables.NewDetectLocalByCIDR(podCIDR, ipt)
:)
/lgtm (or: how we could have avoided hours/days of discussion if someone had just tested it 🙃). I don't know why I assumed this was running in production somewhere
LGTM label has been added. Git tree hash: 9190d8a7036fe56cc803748063e76ae49ebab7fc
/assign @thockin
you need approval for conntrack utils
Force-pushed bf1d500 to 265a3e6
(rebased for FakeExec vs *FakeExec changes in unit tests)
/lgtm
LGTM label has been added. Git tree hash: 0d91e7fb7b84ff652f56394b087a2984138f3ae0
Force-pushed 265a3e6 to 72bc080
	}
	return nil
}
This library is only used by the proxies:
pkg/proxy/iptables/proxier.go: err = conntrack.ClearEntriesForPortNAT(proxier.exec, endpointIP, nodePort, svcProto)
pkg/proxy/iptables/proxier.go: err = conntrack.ClearEntriesForNAT(proxier.exec, svcInfo.ClusterIP().String(), endpointIP, svcProto)
pkg/proxy/iptables/proxier.go: err := conntrack.ClearEntriesForNAT(proxier.exec, extIP, endpointIP, svcProto)
pkg/proxy/iptables/proxier.go: err := conntrack.ClearEntriesForNAT(proxier.exec, lbIP, endpointIP, svcProto)
pkg/proxy/iptables/proxier.go: if err := conntrack.ClearEntriesForIP(proxier.exec, svcIP, v1.ProtocolUDP); err != nil {
pkg/proxy/iptables/proxier.go: err := conntrack.ClearEntriesForPort(proxier.exec, nodePort, isIPv6, v1.ProtocolUDP)
pkg/proxy/ipvs/proxier.go: if err := conntrack.ClearEntriesForIP(proxier.exec, svcIP, v1.ProtocolUDP); err != nil {
pkg/proxy/ipvs/proxier.go: err := conntrack.ClearEntriesForPort(proxier.exec, nodePort, proxier.ipFamily == v1.IPv6Protocol, v1.ProtocolUDP)
pkg/proxy/ipvs/proxier.go: err = conntrack.ClearEntriesForPortNAT(proxier.exec, endpointIP, nodePort, svcProto)
pkg/proxy/ipvs/proxier.go: err = conntrack.ClearEntriesForNAT(proxier.exec, svcInfo.ClusterIP().String(), endpointIP, svcProto)
pkg/proxy/ipvs/proxier.go: err := conntrack.ClearEntriesForNAT(proxier.exec, extIP, endpointIP, svcProto)
pkg/proxy/ipvs/proxier.go:
I think it was previously also used by the kubelet hostport code, but now it makes sense to me to move this to pkg/proxy/util ... and that also puts it under the sig-network approvers umbrella
yes... the person who was working on the iptables/ipvs conntrack sync was going to do that I think. Do you want me to do that here? (We could also just fix the OWNERS file in the existing dir...)
yeah, fixing the owners thing is the point; he is waiting on this PR.
I think it is better if you can do it here
Force-pushed 72bc080 to 7e9e0ad
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

The full list of commands accepted by this bot can be found here. The pull request process is described here.
@@ -265,46 +235,42 @@ func TestDeleteConnections(t *testing.T) {
	}
}

-func TestClearConntrackForPortNAT(t *testing.T) {
+func TestClearUDPConntrackForPortNAT(t *testing.T) {
I don't understand this change: we remove one of the existing 2 test cases, but you append new outputs as if there are 3 test cases. Am I looking at it wrong?
fakeexec is just terrible... I tried to fix it once (kubernetes/utils#14)
anyway, as it was, the PR correctly reverted the test back to how it was before 70020, but yes, it had the wrong number of entries before; the test case code checks that all of the expected outputs are correct, but it doesn't check that there are no extra expected outputs. Presumably someone copy+pasted this from one of the other test cases where there were more expected commands and forgot to delete the extras.
anyway, repushed with a fix
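The pitfall described here can be shown with a stripped-down stand-in for FakeExec (this toy type is an illustration, not the real testingexec API): leftover scripted outputs are silently ignored unless the test also asserts on the call count.

```go
package main

import "fmt"

// fakeExec is a minimal stand-in for a scripted command fake: each run()
// pops the next scripted output, but nothing complains if scripted outputs
// are left unused when the test ends.
type fakeExec struct {
	scripted     []string
	CommandCalls int
}

func (f *fakeExec) run() string {
	out := f.scripted[f.CommandCalls]
	f.CommandCalls++
	return out
}

func main() {
	// Three outputs scripted, but the code under test only runs two commands.
	f := &fakeExec{scripted: []string{"ok", "ok", "ok"}}
	f.run()
	f.run()
	// Unless the test also asserts CommandCalls == len(scripted), the extra
	// scripted entry goes undetected:
	fmt.Println(f.CommandCalls == len(f.scripted)) // prints "false"
}
```

This is why asserting `fexec.CommandCalls` against the expected number of executed commands catches copy+paste mistakes that the per-output checks miss.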
@@ -177,29 +177,21 @@ func TestClearUDPConntrackForPort(t *testing.T) {
	}
}

-func TestDeleteConnections(t *testing.T) {
+func TestDeleteUDPConnections(t *testing.T) {
this is ok: we remove 3 test cases and we remove the 3 fake calls. In TestClearUDPConntrackForPortNAT it seems something went missing on the revert, no?
return netIP.To4() == nil
// Check the executed conntrack command
if tc.protocol == UDP {
	if fexec.CommandCalls != 2 {
2 times?
ok, it is explained below:
This commit did not actually work; in between when it was first written and tested, and when it merged, the code in pkg/proxy/endpoints.go was changed to only add UDP endpoints to the "stale endpoints"/"stale services" lists, and so checking for "either UDP or SCTP" rather than just UDP when processing those lists had no effect. This reverts most of commit aa8521d (but leaves the changes related to ipvs.IsRsGracefulTerminationNeeded() since that actually did have the effect it meant to have).
The APIs talked about "stale services" and "stale endpoints", but the thing that is actually "stale" is the conntrack entries, not the services/endpoints. Fix the names to indicate what they actually keep track of. Also, all three fields (2 in the endpoints update object and 1 in the service update object) are currently UDP-specific, but only the service one made that clear. Fix that too.
Rather than calling fp.deleteEndpointConnection() directly, set up the proxy to have syncProxyRules() call it, so that we are testing it in the way that it actually gets called. Squash the IPv4 and IPv6 unit tests together so we don't need to duplicate all that code. Fix a tiny bug in NewFakeProxier() found while doing this...
Now that the endpoint update fields have names that make it clear that they only contain UDP objects, it's obvious that the "protocol == UDP" checks in the iptables and ipvs proxiers were no-ops, so remove them.
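The "no-op check" argument can be illustrated with a toy version (all names here are hypothetical, not the real proxier code): when the producer of the list already filters to UDP, a consumer-side protocol guard can never fire and is dead code.

```go
package main

import "fmt"

type endpoint struct {
	ip       string
	protocol string
}

// deletedUDPEndpoints mimics the producer side: only UDP endpoints are ever
// appended to the "deleted endpoints" list.
func deletedUDPEndpoints(removed []endpoint) []endpoint {
	var out []endpoint
	for _, ep := range removed {
		if ep.protocol == "UDP" { // filtered at the producer...
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	removed := []endpoint{{"10.244.1.9", "UDP"}, {"10.244.2.3", "TCP"}}
	skipped := 0
	for _, ep := range deletedUDPEndpoints(removed) {
		if ep.protocol != "UDP" { // ...so this consumer-side guard is dead code
			skipped++
		}
	}
	fmt.Println(skipped) // prints "0"
}
```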
Force-pushed 7e9e0ad to 7696bcd
/lgtm
LGTM label has been added. Git tree hash: 3f99f3f8cd2ab08d00206bce260d57722adc61a0
/lgtm
// and then goes unready or changes its IP address.
// detectStaleConntrackEntries detects services that may be associated with stale conntrack entries.
// (See UpdateEndpointMapResult.DeletedUDPEndpoints and .NewlyActiveUDPServices.)
func detectStaleConntrackEntries(oldEndpointsMap, newEndpointsMap EndpointsMap, deletedUDPEndpoints *[]ServiceEndpoint, newlyActiveUDPServices *[]ServicePortName) {
This is fine as a continuation of what we've always done, but the REAL fix would be to load conntrack info and scan that for tuples that don't match and need to be discarded. It works as is, but if anything goes wrong, we lose the "event" and never come back to reconcile.
This is fine as a continuation of what we've always done, but the REAL fix would be to load conntrack info and scan that for tuples that don't match and need to be discarded.
do you mean doing a diff between the actual conntrack table and the expected conntrack table that kube-proxy would generate?
Sort of. Like "run conntrack -L and look for UDP records which correspond to a service endpoint, but that endpoint doesn't exist in that service", as opposed to reacting to a removal "event".
it would be interesting if someone could do the exploratory work to see how expensive it is to list the whole conntrack table ... some time ago I asked a kernel/netfilter developer to implement server-side filtering so you can dump only specific entries; he told me this would solve the problem: http://patchwork.ozlabs.org/project/netfilter-devel/patch/[email protected]/ but I never had time to follow up
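A rough sketch of the reconcile idea under discussion (types and names invented for illustration, not kube-proxy code): parse the listed conntrack table, keep UDP entries whose DNATed destination is still a valid endpoint of the service, and flag the rest for deletion.

```go
package main

import "fmt"

// entry is a simplified UDP conntrack record: the original destination
// (the Service IP) and the DNATed destination (the endpoint IP).
type entry struct {
	serviceIP  string
	endpointIP string
}

// staleEntries returns the entries whose endpoint is no longer part of the
// service they belong to; each would be a candidate for "conntrack -D".
func staleEntries(table []entry, endpoints map[string]map[string]bool) []entry {
	var stale []entry
	for _, e := range table {
		if !endpoints[e.serviceIP][e.endpointIP] {
			stale = append(stale, e)
		}
	}
	return stale
}

func main() {
	endpoints := map[string]map[string]bool{
		"10.96.0.10": {"10.244.1.5": true},
	}
	table := []entry{
		{"10.96.0.10", "10.244.1.5"}, // still a valid endpoint
		{"10.96.0.10", "10.244.1.9"}, // endpoint was removed: stale
	}
	fmt.Println(len(staleEntries(table, endpoints))) // prints "1"
}
```

Unlike the event-based approach, a periodic scan like this would pick up entries whose removal "event" was lost, at the cost of listing (and parsing) the whole table.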
What type of PR is this?
/kind cleanup
/sig network
What this PR does / why we need it:
As discussed in #108523 (comment), the "delete stale conntrack entries for UDP and SCTP connections" code in the iptables and ipvs proxiers actually only deletes stale conntrack entries for UDP connections; #74073 was mostly a no-op at the time it was merged, because the fix it originally contained got sniped by another change elsewhere in the code that silently broke it.
While we could re-fix things to work the way they were supposed to work post-#74073, I don't think that is the right answer at this point; no one has complained about the current behavior being broken in a while, and we had a long argument about the correct behavior in #108523 without any of the participants ever noticing that we were claiming kube-proxy did something that it does not do. It seems that either (a) people are happy with kube-proxy's current behavior, or (b) nobody is actually using SCTP Services in kubernetes any more.
Either way, given that we don't seem totally sure what the right behavior is, this PR doesn't change the current behavior. It just reverts the non-functional parts of #74073 and renames some struct fields and rewrites comments so as to make it clearer what the behavior is (and always has been).
I considered just changing conntrack.IsClearConntrackNeeded to only return true for UDP, and leaving the rest of the original PR in place (so as to put us in a better place for eventually "re"-enabling SCTP conntrack clearing). However, I suspect that we don't want a single "protocol needs conntrack cleanup" predicate anyway; we may want to do different protocols in different cases. (UDPStaleClusterIP, now DeletedUDPClusterIPs, was always UDP-specific, before and after #74073, and I think StaleServiceNames, now NewlyActiveUDPServices, probably ought to be UDP-specific as well.)

The last commit removes some "protocol == UDP" checks that are now "obviously unnecessary", but it broke the deleteEndpointConnections unit tests, because they were passing data to that function that would now be considered wrong. Hence "Make deleteEndpointConnection test use syncProxyRules", which makes it so that the test results are unchanged, by letting syncProxyRules decide whether to actually call deleteEndpointConnection or not. This is kind of a big ugly change and makes the "unit test" more of an integration test, so maybe it would be better instead to just remove the TCP/SCTP tests from that function? (Though as noted in #108523, if the test had been written this way in the first place, it would have caught the fact that #74073 was broken.)

Which issue(s) this PR fixes:
none, but related to #108523
Does this PR introduce a user-facing change?
/cc @uablrek @tssurya @thockin @aojea