Firewall CR migration #1626

Merged
merged 2 commits into from
Apr 17, 2023

Conversation

sugangli
Contributor

@sugangli sugangli commented Nov 30, 2021

This PR migrates the existing direct GCE firewall configuration to our firewall CR approach. Only L7 firewall changes are included here; L4 changes will come later. Major changes:

  1. Most of the new CR-related functions are in the second commit. Firewall CRUD operations are supported here.
  2. Add firewall informers so that ingress-gce can watch the CR status and propagate it to the related LB services/ingresses. This is mainly for surfacing the XPN error and its relevant gcloud command in the service/ingress events, but it also serves general service updates.

Here is the algorithm at a high level:

  • ingress-gce creates the CR with pending status.
  • Platform Firewall Controller (borg) configures the firewall based on the CRs and updates the status.
  • ingress-gce watches the status and enqueues the service/ingress if needed. It then uses the service/ingress ensure code path to read the firewall CR status and populate the event.
  3. Two flags are added: EnableFirewallCR and EnableFWControllerEnforcement. When the latter is false, ingress-gce configures firewall CRs and GCE firewalls at the same time, but Platform Firewall Controller (borg) will not enforce the CRs. When EnableFWControllerEnforcement is true, we simply drop the direct GCE firewall configuration. We use the cluster proto as the source of truth to turn this flag on or off.
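
The flag semantics described in the last item above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `plan` type and the `decide` function are hypothetical names, and treating "enforcement without the CR" as an error is an assumption.

```go
package main

import "fmt"

// plan lists which firewall write paths ingress-gce would take for a sync.
// Hypothetical type; it only mirrors the PR description.
type plan struct {
	WriteCR  bool // create/update the firewall CR
	WriteGCE bool // configure the GCE firewall directly
}

// decide maps the two flags from the description onto the write paths.
// Assumption: enforcement is only meaningful when the CR is enabled.
func decide(enableFirewallCR, enableFWControllerEnforcement bool) (plan, error) {
	if enableFWControllerEnforcement && !enableFirewallCR {
		return plan{}, fmt.Errorf("enforcement requires the firewall CR to be enabled")
	}
	return plan{
		WriteCR:  enableFirewallCR,
		WriteGCE: !enableFWControllerEnforcement,
	}, nil
}

func main() {
	// Walk through the three valid combinations of the two flags.
	for _, c := range [][2]bool{{false, false}, {true, false}, {true, true}} {
		p, err := decide(c[0], c[1])
		fmt.Printf("cr=%v enforce=%v -> %+v err=%v\n", c[0], c[1], p, err)
	}
}
```

With both flags on, only the CR path remains; with only EnableFirewallCR on, both paths run side by side.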

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 30, 2021
@k8s-ci-robot
Contributor

Hi @sugangli. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 30, 2021
@k8s-ci-robot
Contributor

Welcome @sugangli!

It looks like this is your first PR to kubernetes/ingress-gce 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-gce has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 30, 2021
@sugangli sugangli changed the title WIP: Keep the modified codes of firewall CR without updating the vendor WIP: Firewall CR migration Nov 30, 2021
@swetharepakula
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 8, 2022
@sugangli
Contributor Author

sugangli commented Apr 8, 2022

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 8, 2022
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 9, 2022
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 27, 2022
@sugangli sugangli force-pushed the l7-firewall-crd branch 2 times, most recently from 4602de8 to d0fe810 on August 25, 2022 00:08
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2022
@sugangli sugangli changed the title WIP: Firewall CR migration Firewall CR migration Aug 25, 2022
@sugangli sugangli force-pushed the l7-firewall-crd branch 3 times, most recently from 3fb41b5 to 46f25a7 on January 26, 2023 23:05
pkg/flags/flags.go Outdated (resolved)
Member

@swetharepakula swetharepakula left a comment

I think this mostly looks good to me. I have thought about the two flags, and I think the enforcement flag should be renamed to indicate a behavior in ingress-gce instead of a behavior on an external component.

Thanks Sugang!

pkg/context/context.go (resolved)
@@ -72,7 +74,7 @@ func newLoadBalancerController() *LoadBalancerController {
DefaultBackendSvcPort: test.DefaultBeSvcPort,
HealthCheckPath: "/",
}
-ctx := context.NewControllerContext(nil, kubeClient, backendConfigClient, nil, nil, nil, nil, fakeGCE, namer, "" /*kubeSystemUID*/, ctxConfig)
+ctx := context.NewControllerContext(nil, kubeClient, backendConfigClient, nil, firewallClient, nil, nil, nil, fakeGCE, namer, "" /*kubeSystemUID*/, ctxConfig)
Member

Trying to understand why this was necessary for the tests? Is there something that is depending on the firewallClient even when the flag is disabled?

Contributor Author

Updated.

cmd/glbc/main.go Outdated
@@ -279,7 +287,7 @@ func runControllers(ctx *ingctx.ControllerContext) {
})
}

-fwc := firewalls.NewFirewallController(ctx, flags.F.NodePortRanges.Values())
+fwc := firewalls.NewFirewallController(ctx, flags.F.NodePortRanges.Values(), flags.F.EnableFirewallCR, flags.F.EnableFWControllerEnforcement)
Member

should this pass in !flags.F.EnableFWControllerEnforcement since the field in the FirewallController is enforceCR?

Contributor Author

Per discussion, I renamed it to DisableFWEnforcement. Since they mean the same thing (disable ingress controller FW enforcement = enable FW controller enforcement), I renamed all of them without changing the logic.

@@ -56,20 +57,25 @@ type FirewallController struct {
translator *translator.Translator
nodeLister cache.Indexer
hasSynced func() bool
enableCR bool
enforceCR bool
Member

Similar to the flag, enforceCR is confusing to understand in the code. Maybe disableFWEnforcement? I am assuming that in this case disable will be easier to write the logic for than the other way around.

Contributor Author

See the comment above.

pkg/firewalls/controller.go Outdated (resolved)
pkg/firewalls/firewalls.go Outdated (resolved)
if err != nil {
// Create the CR if it is not found.
if api_errors.IsNotFound(err) {
klog.V(3).Infof("ensureFirewallCR Create CR :%+v", expectedFWCR)
Member

I think we should update the others to be like Creating or Created depending on whether the action we are logging has happened or not. If you think the error has enough context, I am fine with this as is.

pkg/firewalls/firewalls.go Outdated (resolved)
klog.V(3).Infof("ensureFirewallCR Create CR :%+v", expectedFW)
_, err = fw.Create(context.Background(), expectedFW, metav1.CreateOptions{})
}
return err
Member

I was wondering more whether the error by itself carries enough context, so that whoever consumes it can tell from the logs what caused it.
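
One way to give the returned error enough context on its own is to wrap it with the operation and the CR name. The following is a minimal sketch under assumptions, not the code in this PR: `ensureCR`, `createCR`, and the sentinel errors are hypothetical stand-ins for the real client calls and API errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors standing in for real API errors in this sketch.
var (
	errNotFound = errors.New("not found")
	errTimeout  = errors.New("timeout")
)

// createCR is a hypothetical stand-in for the client call that creates the CR.
func createCR(name string) error {
	if name == "" {
		return errors.New("empty name")
	}
	return nil
}

// ensureCR wraps any failure with the operation and the CR name, so the
// resulting log line is understandable without extra surrounding messages.
// getErr is the result of a preceding (hypothetical) Get call.
func ensureCR(name string, getErr error) error {
	if getErr == nil {
		return nil // CR already exists
	}
	if !errors.Is(getErr, errNotFound) {
		return fmt.Errorf("getting firewall CR %q: %w", name, getErr)
	}
	if err := createCR(name); err != nil {
		return fmt.Errorf("creating firewall CR %q: %w", name, err)
	}
	return nil
}

func main() {
	fmt.Println(ensureCR("", errNotFound))
	fmt.Println(ensureCR("fw-1", errTimeout))
}
```

Because the wrapping uses `%w`, callers can still match the underlying error with `errors.Is`.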

pkg/firewalls/fakes.go Outdated (resolved)
pkg/firewalls/controller.go (resolved)
@@ -116,13 +150,126 @@ func (fr *FirewallRules) Sync(nodeNames, additionalPorts, additionalRanges []str
return fr.updateFirewall(expectedFirewall)
}

// ensureFirewallCR creates/updates the firewall CR
// On CR update, it will read the conditions to see if there are errors updated by PFW controller.
Contributor

please use full name for 'PFW'

Contributor Author

Done.

@@ -119,6 +125,29 @@ func NewFirewallController(
},
})

if enableCR {
// FW CRs will be updated/deleted by the PFW controller or the user. Ingress controller need to watch such events
Contributor

please use full name for 'PFW'

Contributor Author

Done.

cmd/glbc/main.go Outdated
@@ -41,6 +41,7 @@ import (
"k8s.io/client-go/tools/leaderelection"
"k8s.io/client-go/tools/leaderelection/resourcelock"
"k8s.io/client-go/tools/record"
firewallclient "k8s.io/cloud-provider-gcp/crd/client/gcpfirewall/clientset/versioned"
Contributor

please rename to firewallcrclient

Contributor Author

Done.

cmd/glbc/main.go Outdated
@@ -133,6 +134,13 @@ func main() {
}
}

var firewallClient firewallclient.Interface
Contributor

firewallCRClient

here and further

Contributor

firewallClient would suggest this is provisioning resources on the GCP

Contributor Author

Done.

cmd/glbc/main.go Outdated
@@ -279,7 +287,12 @@ func runControllers(ctx *ingctx.ControllerContext) {
})
}

fwc := firewalls.NewFirewallController(ctx, flags.F.NodePortRanges.Values())
if !flags.F.EnableFirewallCR {
Contributor

I think we should panic here

Contributor Author

Agreed. Updated.

@@ -257,6 +259,8 @@ L7 load balancing. CSV values accepted. Example: -node-port-ranges=80,8080,400-5
flag.BoolVar(&F.EnableMultipleIGs, "enable-multiple-igs", false, "Enable using multiple unmanaged instance groups")
flag.IntVar(&F.MaxIGSize, "max-ig-size", 1000, "Max number of instances in Instance Group")
flag.DurationVar(&F.MetricsExportInterval, "metrics-export-interval", 10*time.Minute, `Period for calculating and exporting metrics related to state of managed objects.`)
flag.BoolVar(&F.EnableFirewallCR, "enable-firewall-cr", false, "Enable generating firewall CR")
Contributor

why not StringVar with 3 valid values?

Contributor Author

Technically I am fine either way. But StringVar does not validate the values, so we would need to check them ourselves. Also, the other feature flags use Bool, so this might be more consistent?
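
For reference, the validation concern can be handled by a small type implementing the standard library's `flag.Value`, so that invalid values are rejected at parse time. This is a sketch only: the flag name `firewall-cr-mode`, the `crMode` type, and the three mode names are hypothetical and not part of this PR.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// crMode is a hypothetical three-state replacement for the two bool flags.
type crMode string

const (
	crDisabled crMode = "disabled" // only direct GCE firewalls
	crDryRun   crMode = "dry-run"  // write CRs and GCE firewalls
	crEnforced crMode = "enforced" // write CRs only
)

// String implements flag.Value.
func (m *crMode) String() string {
	if m == nil {
		return ""
	}
	return string(*m)
}

// Set implements flag.Value and rejects anything outside the three modes.
func (m *crMode) Set(s string) error {
	switch crMode(s) {
	case crDisabled, crDryRun, crEnforced:
		*m = crMode(s)
		return nil
	}
	return fmt.Errorf("invalid mode %q", s)
}

// parse registers the flag on a private FlagSet and parses args.
func parse(args []string) (crMode, error) {
	mode := crDisabled // default when the flag is absent
	fs := flag.NewFlagSet("glbc", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	fs.Var(&mode, "firewall-cr-mode", "one of: disabled, dry-run, enforced")
	err := fs.Parse(args)
	return mode, err
}

func main() {
	fmt.Println(parse([]string{"-firewall-cr-mode=dry-run"}))
	fmt.Println(parse([]string{"-firewall-cr-mode=bogus"}))
}
```

The trade-off raised above still applies: this adds a type and a `Set` method, whereas two Bool flags stay consistent with the existing feature flags.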

if enableCR {
// FW CRs will be updated/deleted by the PFW controller or the user. Ingress controller need to watch such events
// and act accordingly.
ctx.FirewallInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
Contributor

ctx.FirewallInformer.AddEventHandler

What do we get updates on? Firewall CRs, right?
When the Platform Firewall Controller changes the CR?
Or when the customer changes something there?

Contributor Author

The PFW controller will update the status of the CR, and some statuses require the ingress controller to take action. For example, the ingress controller needs to populate the gcloud command for the user if there is an XPNPermission error.
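
As a rough illustration of that flow, the status check could look like the sketch below. Everything here is assumed for the example: the `condition` type, the `reasonXPNPermission` string, and the message format are hypothetical and do not reflect the real CR API.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for a CR status condition.
type condition struct {
	Type    string
	Reason  string
	Message string
}

// reasonXPNPermission is an assumed reason string, not the real API constant.
const reasonXPNPermission = "XPNPermissionError"

// eventFor returns the warning text to attach to the service/ingress, or ""
// when no condition requires action from the user.
func eventFor(conds []condition) string {
	for _, c := range conds {
		if c.Reason == reasonXPNPermission {
			return fmt.Sprintf("firewall not applied, manual step required: %s", c.Message)
		}
	}
	return ""
}

func main() {
	conds := []condition{
		{Type: "Ready", Reason: "Synced"},
		{Type: "Misconfigured", Reason: reasonXPNPermission, Message: "<gcloud command supplied by the controller>"},
	}
	fmt.Println(eventFor(conds))
}
```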

if name != curFW.Name {
return
}
fwc.queue.Enqueue(queueKey)
Contributor

what should be the queueKey?

I am confused why we always use the fake?

	// queueKey is a "fake" key which can be enqueued to a task queue.
	queueKey = &v1.Ingress{
		ObjectMeta: metav1.ObjectMeta{Name: "queueKey"},
	}

@swetharepakula ?

Contributor Author

Because we won't have more than one ingress firewall at any given time?

Member

I didn't catch this initially. Even if we will only have one, I think it makes sense either to have a constant for the name of the single firewall CR or to just enqueue the CR from cur in the update func.
My preference is for the latter, because I think it makes the most sense.

We should be checking that the firewall CR that we see is the one we expect before adding the queueKey.
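
A minimal sketch of that suggestion, using plain types instead of the real informer and task queue. The `firewallCR` and `handler` types and the slice-backed queue are hypothetical and only illustrate the check-then-enqueue order.

```go
package main

import "fmt"

// firewallCR is a minimal hypothetical stand-in for the firewall CR object.
type firewallCR struct{ Name string }

// handler enqueues update events only for the one CR this controller owns.
type handler struct {
	expectedName string
	queue        []string // stand-in for the controller's work queue
}

// onUpdate mirrors an informer UpdateFunc: it ignores foreign CRs and
// otherwise enqueues the CR's own name rather than a fixed fake key.
func (h *handler) onUpdate(_, cur firewallCR) {
	if cur.Name != h.expectedName {
		return
	}
	h.queue = append(h.queue, cur.Name)
}

func main() {
	h := &handler{expectedName: "ingress-fw"}
	h.onUpdate(firewallCR{Name: "other"}, firewallCR{Name: "other"})
	h.onUpdate(firewallCR{Name: "ingress-fw"}, firewallCR{Name: "ingress-fw"})
	fmt.Println(h.queue)
}
```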

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 3, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 6, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 21, 2023
@bowei
Member

bowei commented Mar 23, 2023

What is the status of this change?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2023
@sugangli
Contributor Author

I refactored the L7 firewall pool code path according to what @cezarygerard proposed and am waiting for his comment. We have a few more PRs to come after this, but I am only able to spend 20% of my time on this project.

@cezarygerard
Contributor

it's LGTM overall

thank you for applying my suggestions

let's just remove this comment

"//// if we had enum-like flag 'crMode' to guard the firewall CR creation the above code would look ~like:"
and I will tag the change

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 12, 2023
@sugangli
Contributor Author

/retest

Addressed the comments

Refactor firewall CR into a separate FW pool
@cezarygerard
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 17, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cezarygerard, sugangli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2023
@k8s-ci-robot k8s-ci-robot merged commit 7bab346 into kubernetes:master Apr 17, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
10 participants