Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Etcd configMap to provide consistency and controllability #812

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

anveshreddy18
Copy link
Contributor

@anveshreddy18 anveshreddy18 commented Jun 24, 2024

How to categorize this PR?

/area quality
/kind cleanup
/kind enhancement

What this PR does / why we need it:

This PR is to cleanup the Etcd ConfigMap to :

  • Rename the configMap with the prefix of {etcd.Name}-config instead of having etcd-bootstrap. This is for consistency with other resources that gets created as part of Etcd Reconciliation, like leases, role, rolebinding, etc and also have a proper name config for the configMap instead of bootstrap. This also involves cleanup logic for the existing configMap with older name.
  • Make snapshot-count configurable with the flag snapshotCount from spec.etcd section of etcd CR thus the operator is able to control the no.of transactions before committing a snapshot to disk if they want to.
  • Use proper URLs for initial-advertise-peer-urls and advertise-client-urls properties of the configMap and thereby fixing [Enhancement] Simplify etcd configuration and use proper URL instead of using @ as separator #476

Previously the fields initial-advertise-peer-urls and advertise-client-urls of the etcd ConfigMap were constructed using an @ separator in the format scheme@peerSvcName@namespace@port which is then parsed by the backup-restore container to construct the full URL to be used by the etcd container as its configuration.

This PR aims to modify that to have the fully qualified URL(s) for each etcd member in the ConfigMap itself which then backup-restore parses and sends only it's corresponding etcd member's URL(s) to the etcd process.

The following is the format in the ConfigMap for the fields initial-advertise-peer-urls & advertise-client-urls.

initial-advertise-peer-urls:
  etcd-main-0:
    - main in-cluster url
    - additional urls if required (e.g. live CPM)
  etcd-main-1:
    - main in-cluster url
    - additional urls if required (e.g. live CPM)
  etcd-main-2:
    - main in-cluster url
    - additional urls if required (e.g. live CPM)
advertise-client-urls:
  etcd-main-0:
    - main in-cluster url
    - additional urls if required (e.g. live CPM)
  etcd-main-1:
    - main in-cluster url
    - additional urls if required (e.g. live CPM)
  etcd-main-2:
    - main in-cluster url
    - additional urls if required (e.g. live CPM)

The additional urls above is used for e.g. in CPM in the context of Gardener use case. For now from code we only populate the main in-cluster URL, but in future we will modify the API to allow passing these additional urls from the Etcd Spec which will get populated here and be subsequently parsed and sent to etcd process.

Which issue(s) this PR fixes:
Fixes #717

Special notes for your reviewer:

I've used my personal docker image of backup-restore built on the etcdbr changes required for this to pass the tests. I will remove it once the backup-restore is released.

Release note:

Enhances Etcd configuration by organizing ConfigMap naming convention, enabling snapshot-count configuration, and rectifying URL issues for improved functionality and consistency

@anveshreddy18 anveshreddy18 requested a review from a team as a code owner June 24, 2024 11:05
@gardener-robot gardener-robot added needs/review Needs review area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/cleanup Something that is not needed anymore and can be cleaned up kind/enhancement Enhancement, improvement, extension size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Jun 24, 2024
@anveshreddy18 anveshreddy18 self-assigned this Jun 24, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 24, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jun 24, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 25, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jun 25, 2024
@gardener-robot gardener-robot added size/s Size of pull request is small (see gardener-robot robot/bots/size.py) and removed size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Sep 2, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 2, 2024
@anveshreddy18
Copy link
Contributor Author

/assign @shreyas-s-rao
/hold

@gardener-robot gardener-robot added the reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies label Sep 5, 2024
@renormalize
Copy link
Member

renormalize commented Sep 24, 2024

@anveshreddy18 can you please add gardener/etcd-backup-restore#715 merge and release as a prerequisite to this PR's description? Just so we don't accidentally merge this PR before the changes are etcd-backup-restore are released.

@anveshreddy18
Copy link
Contributor Author

@anveshreddy18 can you please add gardener/etcd-backup-restore#715 merge and release as a prerequisite to this PR's description? Just so we don't accidentally merge this PR before the changes are etcd-backup-restore are released.

Sure, I hope your comment itself is sufficient. But I can add that once this PR is reviewed as we may lose any comment that we put right now because of the reviews. And for now the do-not-merge label is present, I will make sure to keep that label till etcdbr PR is merged and released.

@shreyas-s-rao shreyas-s-rao added this to the v0.24.0 milestone Oct 8, 2024
@gardener-robot gardener-robot added size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) and removed size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Oct 8, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 8, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 8, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 15, 2024
@anveshreddy18
Copy link
Contributor Author

/test pull-etcd-druid-e2e-kind
/test pull-etcd-druid-e2e-kind-nondistroless-etcd

@ishan16696 ishan16696 self-assigned this Oct 17, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 22, 2024
@ishan16696
Copy link
Member

Hi @anveshreddy18 ,
thanks for PR, I have few suggestions:

  1. Can you please rebase this PR on latest master.
  2. Please update this doc: https://github.com/gardener/etcd-druid/blob/master/docs/operations/recovery-from-permanent-quorum-loss-in-etcd-cluster.md with new convention of etcd config map name.

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Oct 24, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 24, 2024
@anveshreddy18
Copy link
Contributor Author

@ishan16696,

  • I've now rebased on master
  • The doc has been updated already previously.

PTAL. Thanks

@ishan16696
Copy link
Member

ishan16696 commented Oct 24, 2024

The doc has been updated already previously.

I'm not sure what you are referring to, doc of recovery from permanent quorum loss is not updated with new name of config map as I'm still able to find the etcd-bootstrap naming convention written in steps of doc.

Sorry for being not clear, this is what I'm referring to:

kubectl get sts etcd-main -o jsonpath='{.spec.template.spec.volumes[?(@.name=="etcd-config-file")].configMap.name}'

this above command was written for etcd-bootstrap, will it work with new convention as well if not then please update this accordingly.

@anveshreddy18
Copy link
Contributor Author

anveshreddy18 commented Oct 25, 2024

above command was written for etcd-bootstrap, will it work with new convention as well

It works for the new config map name as well as we fetch the name from the .spec.volumes. I've also tested this and it works i.e gives the name of the new config map which is currently in use.

@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 25, 2024
Copy link

gardener-prow bot commented Oct 25, 2024

@anveshreddy18: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-druid-e2e-kind 7b3e5fb link true /test pull-etcd-druid-e2e-kind
pull-etcd-druid-e2e-kind-nondistroless-etcd 7b3e5fb link true /test pull-etcd-druid-e2e-kind-nondistroless-etcd

Full PR test history. Your PR dashboard. Command help for this repository.
Please help us cut down on flakes by linking this test failure to an open flake report or filing a new flake report if you can't find an existing one. Also see our testing guideline for how to avoid and hunt flakes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/cleanup Something that is not needed anymore and can be cleaned up kind/enhancement Enhancement, improvement, extension needs/changes Needs (more) changes needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/review Needs review reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies size/m Size of pull request is medium (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

☂️ Etcd ConfigMap changes
8 participants