-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace CTLB workaround with a config option #8139
Conversation
// BPFConnectTimeLoadBalancing when in BPF mode, controls whether Felix installs the connect-time load | ||
// balancer. The connect-time load balancer is required for the host to be able to reach Kubernetes services | ||
// and it improves the performance of pod-to-service connections.When set to TCP, connect time load balancing | ||
// is available only for services with TCP ports. [Default: Enabled] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo our default should be TCP
// is available only for services with TCP ports. [Default: Enabled] | ||
BPFConnectTimeLoadBalancing *BPFConnectTimeLBType `json:"bpfConnectTimeLoadBalancing,omitempty" validate:"omitempty,oneof=TCP Enabled Disabled"` | ||
// BPFHostNetworkedNATWithoutCTLB when in BPF mode, controls whether Felix does a NAT without CTLB. This along with BPFConnectTimeLoadBalancing | ||
// determines the CTLB behavior. [Default: Disabled] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default should be enabled
The end result is that DNS which is UDP does not break when backend dies, while TCP does not pay the perf hit.
felix/fv/infrastructure/topology.go
Outdated
@@ -289,6 +289,8 @@ func StartNNodeTopology(n int, opts TopologyOptions, infra DatastoreInfra) (tc T | |||
// host. So, disable CTLB handling for subsequent Felixes. | |||
if i > 0 { | |||
optsPerFelix[i].ExtraEnvVars["FELIX_BPFConnectTimeLoadBalancingEnabled"] = "false" | |||
optsPerFelix[i].ExtraEnvVars["FELIX_BPFConnectTimeLoadBalancing"] = string(api.BPFConnectTimeLBDisabled) | |||
optsPerFelix[i].ExtraEnvVars["FELIX_BPFHostNetworkedNATWithoutCTLB"] = string(api.BPFHostNetworkedNATEnabled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be also disabled. We do not want to install the workaround. CTLB is installed test-wide. We do not want to install it multiple times. That is why the option FELIX_BPFConnectTimeLoadBalancing=false
is overriden above. But this one should be kept as set by the test.
@@ -113,4 +113,12 @@ func setDefaults(fc *apiv3.FelixConfiguration) { | |||
disabled := apiv3.FloatingIPsDisabled | |||
fc.Spec.FloatingIPs = &disabled | |||
} | |||
if fc.Spec.BPFConnectTimeLoadBalancing == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the place where we also need to decide if the legacy BPFConnectTimeLoadBalancingEnabled
is set, right? If it is is and none of the new ones are set, just set BPFConnectTimeLoadBalancing=true
and BPFHostNetworkedNATWithoutCTLB=disabled
and then you do not need to worry about BPFConnectTimeLoadBalancingEnabled
in Felix. If the new one and the old one are set then we need to either raise a warning or fail validation or something, right? Once we deprecate the legacy option we would remove the code here.
if config.BPFConnTimeLB == string(apiv3.BPFConnectTimeLBDisabled) && | ||
config.BPFHostNetworkedNAT == string(apiv3.BPFHostNetworkedNATDisabled) { | ||
log.Warn("Access to services from host networked process wont work, forcing hostnetworked NAT to Enabled") | ||
config.BPFHostNetworkedNAT = string(apiv3.BPFHostNetworkedNATEnabled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to just give a warning instead of forcing it?
// The above cases are invalid configuration. Revert to CTLB enabled. | ||
if config.BPFHostNetworkedNAT == string(apiv3.BPFHostNetworkedNATEnabled) { | ||
if config.BPFConnTimeLB == string(apiv3.BPFConnectTimeLBEnabled) { | ||
log.Warn("Access to services may not work properly, reverting to default CTLB configuration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config.BPFHostNetworkedNAT == string(apiv3.BPFHostNetworkedNATEnabled
doesnot make sense, but also would not do much harm. I would revert here to Disabled instead of flipping the other to TCP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think flipping the ConnTimeLB
to TCP
keeps it line with the defaults.
Description
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one
docs-*
label.docs-pr-required
: This change requires a change to the documentation that has not been completed yet.docs-completed
: This change has all necessary documentation completed.docs-not-required
: This change has no user-facing impact and requires no docs.Every PR needs one
release-note-*
label.release-note-required
: This PR has user-facing changes. Most PRs should have this label.release-note-not-required
: This PR has no user-facing changes.Other optional labels:
cherry-pick-candidate
: This PR should be cherry-picked to an earlier release. For bug fixes only.needs-operator-pr
: This PR is related to install and requires a corresponding change to the operator.