Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of [NET-10544] bugfix: catalog sync fails repeatedly to deregister a service in some scenarios into release/1.5.x #4281

Merged

Conversation

hc-github-team-consul-core
Copy link
Collaborator

@hc-github-team-consul-core hc-github-team-consul-core commented Aug 30, 2024

Backport

This PR is auto-generated from #4266 to be assessed for backporting due to the inclusion of the label backport/1.5.x.

The below text is copied from the body of the original PR.


Changes proposed in this PR

  • Stop using blocking queries for listing services for potential reaping
    • This can hurt us when we use blocking queries and no change happens during the wait period, which causes the request to error and retry forever when we actually just needed the existing list of catalog services so that we could diff against the existing list of k8s services
  • Add a limit to the backoff-retry mechanism so that we don't infinitely retry using a bad client
    • This can hurt us when consul-server-connection-manager isn't able to get a valid IP, for example, and the client will be configured with <nil> for the host and retry forever

How I've tested this PR

  1. Install Consul to your k8s cluster with catalog sync enabled and this build for global.imageK8S
  2. Create a k8s Service selecting one or more Pods
  3. Delete the consul-server Pod
  4. Delete the k8s Service
  5. Observe that the corresponding Consul service is now deregistered where it was not before this change
values.yaml
global:
  name: consul
  datacenter: dc1

  imageK8S: consul-k8s-control-plane:dev

  tls:
    enabled: true
    enableAutoEncrypt: true
  
  acls:
    manageSystemACLs: true

  metrics:
    enabled: true

connectInject:
  enabled: false

syncCatalog:
  enabled: true
  metrics:
    enabled: true

prometheus:
  enabled: true

ui:
  enabled: true
  metrics:
    provider: "prometheus"
    baseURL: http://prometheus-server.prometheus

How I expect reviewers to test this PR

See above

Checklist


Overview of commits

@nathancoleman nathancoleman merged commit d12991b into release/1.5.x Aug 30, 2024
49 checks passed
@nathancoleman nathancoleman deleted the backport/resilient-sync/radically-select-weasel branch August 30, 2024 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants