Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix session affinity for canaries #7169

Closed
wants to merge 38 commits into from
Closed

Fix session affinity for canaries #7169

wants to merge 38 commits into from

Conversation

wasker
Copy link
Contributor

@wasker wasker commented May 27, 2021

What this PR does / why we need it:

If canary ingress is configured, session affinity settings no longer apply, which results in requests within the same session being served by both canary and non-canary endpoints. This affects scenarios when backwards-incompatible changes were deployed to canary, and in inconsistent results being served in general.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

The change makes canaries respect session affinity settings by default, as this makes more sense than the current behavior. Nonetheless, there's a way to restore original behavior by adding a special annotation on a non-canary ingress definition, if someone prefers that behavior for some reason.

Which issue/s this PR fixes

Fixes #3717.

How Has This Been Tested?

Used repro steps from the bug to simulate the issue before and after changes.

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@k8s-ci-robot
Copy link
Contributor

Welcome @wasker!

It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-nginx has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 27, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @wasker. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 27, 2021
@wasker
Copy link
Contributor Author

wasker commented Jun 2, 2021

@cmluciano @ElvinEfendi Can I get eyes on this PR, please?

@wasker
Copy link
Contributor Author

wasker commented Jun 7, 2021

@rikatz Could you help with unblocking this PR, please?

@bowei
Copy link
Member

bowei commented Jun 14, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 14, 2021
@wasker wasker mentioned this pull request Jun 14, 2021
8 tasks
@wasker
Copy link
Contributor Author

wasker commented Jun 14, 2021

/retest

@bowei
Copy link
Member

bowei commented Jun 16, 2021

/assign @rikatz

@rikatz
Copy link
Contributor

rikatz commented Jun 21, 2021

/kind bug
/priority important-soon

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jun 21, 2021
@wasker
Copy link
Contributor Author

wasker commented Jun 24, 2021

/retest

@wasker
Copy link
Contributor Author

wasker commented Jun 24, 2021

@rikatz FYI - after rebasing onto the latest from master, SSL test cases do not pass + there's an issue with a build. Is this known?

@rikatz
Copy link
Contributor

rikatz commented Jun 27, 2021

Hi @wasker

From a Go perspective your PR looks good to me. I've asked for @ElvinEfendi and @moonming to do a review in Lua part.

To be honest I'm not sure we should change the default behavior (even broken). I think so, but we need to be really careful warning users about that, like "after this release, it will start working as desired but the desired may not be your desired anymore" :)

Let's re-test your tests and I'll keep an eye here, maybe you need to rebase so the latest commits/tests are present in yours as well :)

EDIT: Just saw that you already rebased, let me check here

@rikatz
Copy link
Contributor

rikatz commented Jun 27, 2021

/retest

@rikatz
Copy link
Contributor

rikatz commented Jun 27, 2021

/triage accepted
/cc @moonming @ElvinEfendi

@k8s-ci-robot
Copy link
Contributor

@rikatz: The label(s) triage/approved cannot be applied, because the repository doesn't have them.

In response to this:

/triage approved
/cc @moonming @ElvinEfendi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rikatz
Copy link
Contributor

rikatz commented Jun 27, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jun 27, 2021
@wasker
Copy link
Contributor Author

wasker commented Jun 29, 2021

@rikatz I see this and this setup, and yet SSL tests are failing due to "unexpected error storing SSL certificate: could not create PEM certificate file /etc/ingress-controller/ssl/test-1624980835549457600.pem".

Am I missing something? This fails both in PR and locally.

wasker and others added 26 commits July 17, 2021 08:38
* Drop v1beta1 from ingress nginx

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix intorstr logic in controller

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* fixing admission

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* more intorstr fixing

* correct template rendering

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix e2e tests for v1 api

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix gofmt errors

* This is finally working...almost there...

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Re-add removed validation of AdmissionReview
Update the DaemonSet namespace references to use the `POD_NAMESPACE` environment variable in the same way that the Deployment does.
* Fix MaxWorkerOpenFiles calculation on high cores nodes

* Add e2e test for rlimit_nofile

* Fix doc for max-worker-open-files
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 17, 2021
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Jul 17, 2021

@wasker: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
pull-ingress-nginx-test b5507fb link /test pull-ingress-nginx-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wasker
Copy link
Contributor Author

wasker commented Jul 18, 2021

@rikatz I restarted the change here: #7371

@wasker wasker closed this Jul 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Session affinity doesn't work for canaries