This repository has been archived by the owner on Jul 16, 2024. It is now read-only.

Add renew certificates subcommand #147

Closed
wants to merge 2 commits into from

Conversation

Contributor

@pytimer pytimer commented Dec 22, 2019

The etcdadm certs renew subcommand renews certificates using the local CA. It renews all etcd certificates except the CA certificate.

Certificates created by etcdadm currently expire after one year by default, and the cluster stops working once they expire. I create Kubernetes clusters with kubeadm, which has the same problem; however, kubeadm has a renew command to help with certificate rotation, so I think etcdadm should have one too.

Other people have run into this as well (see #56), so I am sending this PR to add a renew subcommand.

I'm not sure whether I have missed something; if I have, please tell me.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 22, 2019
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 22, 2019
@k8s-ci-robot
Contributor

Hi @pytimer. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 22, 2019
Contributor

@chuckha chuckha left a comment

I think I'd want a little more documentation about how to actually use this. Where do I run it? On the node with the CA? Do I run it on every node? Can I run it on any node? Do I have to copy certs over to other nodes after I run this?

@pytimer
Contributor Author

pytimer commented Jan 14, 2020

Run this command on the node whose certificates have expired, with the CA present on that node. Before running it, copy the CA to the node; the renew command will then renew the etcd certificates on that node.

@chuckha
Contributor

chuckha commented Jan 14, 2020

Run this command on the node whose certificates have expired, with the CA present on that node. Before running it, copy the CA to the node; the renew command will then renew the etcd certificates on that node.

My questions were mostly to help guide some documentation for this feature. Perhaps a new section in the README? Or maybe even in the command help?

@pytimer
Contributor Author

pytimer commented Jan 15, 2020

@chuckha I updated the README.md in this PR; please review it. Thanks.

Contributor

@chuckha chuckha left a comment

Thanks for adding this. I added a suggestion to potentially clear up which certs are being renewed. I don't think we need to write down how to copy certs around, as it's already listed.

Worked for me, but I'm surprised at the lack of cluster level health check/info through etcdadm.

@@ -0,0 +1,46 @@
package cmd
Contributor

New file needs license header

Contributor Author

I will add the license header later.

@@ -60,6 +60,22 @@ On the machine being removed, run
etcdadm reset
```

### Renewal certificates
Contributor

This took me a second to understand. We aren't renewing the CA, we're renewing the certs generated with the CA.

this should be a little more friendly, how about something like this:

Certificates are set to expire after <1 year?> by default.
To renew certificates, first ensure every node in the cluster has the CA root certificate. Then, on each node in the cluster, run etcdadm certs renew.

Contributor

+1 to @chuckha's suggestion.

I think it might help to explicitly mention that there are two certificates: peer and server.

Contributor Author

I will update the document.

Contributor

Thank you @pytimer!

@chuckha
Contributor

chuckha commented Jan 15, 2020

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 15, 2020
certs/certs.go Outdated
// GetDefaultCertList returns all of the certificates etcdadm requires to function.
func GetDefaultCertList() []string {
return []string{
//constants.EtcdCACertAndKeyBaseName,
Contributor

Remove commented code?

Contributor Author

OK, I will remove it.

certs/certs.go Outdated
}
}

// RenewUsingLocalCA executes certificate renewal using local certificate authorities for generating new certs.
Contributor

Suggested change
// RenewUsingLocalCA executes certificate renewal using local certificate authorities for generating new certs.
// RenewUsingLocalCA replaces certificates with new certificates signed by the CA.

func GetDefaultCertList() []string {
return []string{
//constants.EtcdCACertAndKeyBaseName,
constants.EtcdServerCertAndKeyBaseName,
Contributor

How does replacing a certificate affect the client that uses the certificates?

  1. etcd peer and server certificates: etcd will check and, if needed, reload the certificate on every connection.
  2. etcdctl client certificate: this is not a long-running process, and presumably etcdctl reads certificates when it starts. If it fails, the user can always rerun it.
  3. kube-apiserver: I'm not sure if it will reload its etcd client certificate. Does it need to be restarted?

Contributor Author

  1. As far as I know, etcd needs to be restarted.

  2. etcdctl reads the certificates every time it runs, so I don't think renewal affects it.

  3. If the kube-apiserver client certificate changes, I think kube-apiserver may need to be restarted. I haven't tested this case, so I'm not sure. I will test it.

Contributor

Thank you!

If kube-apiserver does need to be restarted, let's document that. Perhaps even write a message to the user.

}

// extract the certificate config
cfg := certToConfig(cert)
Contributor

For etcdadm certs renew this means that the source of truth is the certificate on disk. However, for etcdadm init and etcdadm join, the source of truth is etcdadm.

I think that the source of truth should be consistent across all commands.

In fact, if etcdadm certs renew uses etcdadm as the source of truth, then it can just create new certificates using the existing functions (CreateEtcdServerCertAndKeyFiles, CreateEtcdPeerCertAndKeyFiles, CreateEtcdctlClientCertAndKeyFiles, and CreateAPIServerEtcdClientCertAndKeyFiles).

WDYT?

Contributor Author

Do you mean that the certificates on disk should not be the source of truth?

I understand your point. But if we use the createXXX functions to recreate the certificates, does that include the CA certificate?

Contributor

Do you mean that the certificates on disk should not be the source of truth?

That's right; I believe that the certificates on disk should not be the source of truth.

When etcdadm creates certificates the first time, it determines their properties. When it renews them, it should set those properties the same way. It should not pay attention to the properties in the certificates on disk.

I understand your point. But if we use the createXXX functions to recreate the certificates, does that include the CA certificate?

I believe each of the functions I mentioned reads the CA from disk.

Contributor Author

I agree that we should not treat the certificates on disk as the source of truth. But if we recreate the certificates, should the command still be called renew, or should it change to etcdadm certs recreate?

Contributor

Great question! I think renew is correct, based on the RFC:

This subcomponent is used to describe the following elements related to certificate renewal. Certificate renewal means the issuance of a new certificate to the subscriber without changing the subscriber or other participant's public key or any other information in the certificate:
-- https://tools.ietf.org/html/rfc3647 *

However, if I'm reading the definition correctly, etcdadm must generate a new certificate using its current key on disk. That means etcdadm needs to read the key from disk, but not the certificate.

(I continue to believe that etcdadm should set the certificate properties, rather than read them from the certificate on disk.)

* I found this via https://security.stackexchange.com/questions/150907/x-509-certificate-renew-vs-rekey
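
The renew/rekey distinction from the RFC can be sketched with Go's standard crypto/x509 package. This is only an illustration under stated assumptions, not the code in this PR: the helper names (newCA, issue, compare), the CA, the serial numbers, and the host name are all hypothetical, and everything is generated in memory instead of being read from disk. A renewal signs a new certificate over the existing public key, while a rekey signs one over a freshly generated key.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a throwaway self-signed CA, standing in for the CA on disk.
func newCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(10 * 365 * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// issue signs a one-year certificate for pub with the given subject and SANs.
func issue(ca *x509.Certificate, caKey *ecdsa.PrivateKey, pub interface{}, serial int64, subject pkix.Name, dnsNames []string) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      subject,
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, pub, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// compare reports whether a renewed and a rekeyed certificate keep the old public key.
func compare() (bool, bool, error) {
	ca, caKey, err := newCA()
	if err != nil {
		return false, false, err
	}
	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false, false, err
	}
	old, err := issue(ca, caKey, &leafKey.PublicKey, 2, pkix.Name{CommonName: "etcd-server"}, []string{"etcd-0.example.internal"})
	if err != nil {
		return false, false, err
	}
	// Renew: same subject, same SANs, same public key; only serial and validity change.
	renewed, err := issue(ca, caKey, old.PublicKey, 3, old.Subject, old.DNSNames)
	if err != nil {
		return false, false, err
	}
	// Rekey: same subject and SANs, but a newly generated key pair.
	freshKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false, false, err
	}
	rekeyed, err := issue(ca, caKey, &freshKey.PublicKey, 4, old.Subject, old.DNSNames)
	if err != nil {
		return false, false, err
	}
	oldPub := old.PublicKey.(*ecdsa.PublicKey)
	return oldPub.Equal(renewed.PublicKey), oldPub.Equal(rekeyed.PublicKey), nil
}

func main() {
	renewKeeps, rekeyKeeps, err := compare()
	if err != nil {
		panic(err)
	}
	fmt.Println("renewed cert keeps the old public key:", renewKeeps)
	fmt.Println("rekeyed cert keeps the old public key:", rekeyKeeps)
}
```

Under this reading, a renewal only needs the private key that is already on disk, whereas a rekey also has to write a new key file next to the new certificate.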

Contributor Author

I think this renew implementation is more like rekey, but it also reuses the certificate properties. If I understand correctly, you mean that etcdadm should not use the properties of the old certificates on disk?

Contributor

I agree; this PR right now implements "rekey" according to the RFC.

I don't know whether we want "renew" or "rekey;" I need to learn the trade-offs and use cases.

I decided to check what kubeadm does. Looks like kubeadm does in fact read the properties of the existing certificates:

Note: alpha certs renew uses the existing certificates as the authoritative source for attributes (Common Name, Organization, SAN, etc.) instead of the kubeadm-config ConfigMap. It is strongly recommended to keep them both in sync.
-- https://v1-17.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#renew-certificates-with-the-kubernetes-certificates-api

I wonder why that is.

Contributor Author

@pytimer pytimer Jan 18, 2020

I see that kubeadm reads the certificate properties. I think this may be because etcdadm/kubeadm init or join can add custom SANs; if we don't read the properties, we might lose some SANs.

Contributor

I think this may be because etcdadm/kubeadm init or join can add custom SANs; if we don't read the properties, we might lose some SANs.

That's a great point.
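
To make the custom-SAN argument concrete, here is a minimal sketch of treating the existing certificate as the source of its own properties. It is not the PR's certToConfig: the certProps type and the propsFromCert, selfSigned, and demoProps helpers are hypothetical names, and the certificate is generated in memory with an invented custom SAN instead of being loaded from disk.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// certProps is a hypothetical holder for the attributes a renewal would copy.
type certProps struct {
	CommonName   string
	Organization []string
	DNSNames     []string
	IPs          []net.IP
	Usages       []x509.ExtKeyUsage
}

// propsFromCert reads the attributes straight from a parsed certificate,
// so SANs added at init or join time survive a renewal.
func propsFromCert(cert *x509.Certificate) certProps {
	return certProps{
		CommonName:   cert.Subject.CommonName,
		Organization: cert.Subject.Organization,
		DNSNames:     cert.DNSNames,
		IPs:          cert.IPAddresses,
		Usages:       cert.ExtKeyUsage,
	}
}

// selfSigned builds a throwaway certificate that stands in for one on disk.
func selfSigned(cn string, dnsNames []string, ips []net.IP) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     dnsNames,
		IPAddresses:  ips,
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// demoProps creates a certificate with one default and one custom SAN and extracts its properties.
func demoProps() (certProps, error) {
	cert, err := selfSigned("etcd-0",
		[]string{"etcd-0", "custom.example.internal"},
		[]net.IP{net.ParseIP("10.0.0.5")})
	if err != nil {
		return certProps{}, err
	}
	return propsFromCert(cert), nil
}

func main() {
	props, err := demoProps()
	if err != nil {
		panic(err)
	}
	fmt.Println("CN:", props.CommonName)
	fmt.Println("DNS SANs:", props.DNSNames)
	fmt.Println("IP SANs:", props.IPs)
}
```

The trade-off discussed in this thread still applies: anything wrong in the old certificate is carried into the new one, because nothing is recomputed from the tool's own configuration.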

@dlipovetsky
Contributor

Thanks for helping me understand your approach @pytimer . I'm ok with etcdadm reading the properties from the certs on disk.

The only question left is #147 (comment). Otherwise this looks good to merge.

@pytimer
Contributor Author

pytimer commented Feb 8, 2020

The only question left is #147 (comment). Otherwise this looks good to merge.

Sorry for the delay; I have been busy and did not have a local Kubernetes cluster. I will test this and reply soon.

@pytimer
Contributor Author

pytimer commented Feb 9, 2020

@dlipovetsky I tested the kube-apiserver case from #147 (comment): kube-apiserver needs to be restarted to load the new etcd client certificates.

I updated the documentation and the code to warn users about this. The code is: https://github.com/kubernetes-sigs/etcdadm/pull/147/files#diff-1094222d3259d85ad53cc70f8bcc0816R54

cmd/certs.go Outdated
log.Fatalf("Certificates %s can't be renewed.\n", name)
}
}
log.Println("Your etcd certificates has renew successfully!")
Contributor

Suggested change
log.Println("Your etcd certificates has renew successfully!")
log.Println("The etcd certificates have been renewed successfully!")

cmd/certs.go Outdated
}
}
log.Println("Your etcd certificates has renew successfully!")
log.Warnln("After renew etcd certificates, you need to restart client(kube-apiserver) which use apiserver-etcd-client certificate.")
Contributor

@dlipovetsky dlipovetsky Feb 9, 2020

Suggested change
log.Warnln("After renew etcd certificates, you need to restart client(kube-apiserver) which use apiserver-etcd-client certificate.")
log.Warnln("If kube-apiserver is running, restart it so that it uses the renewed etcd client certificate.")

@dlipovetsky
Contributor

dlipovetsky commented Feb 9, 2020

Worked for me, but I'm surprised at the lack of cluster level health check/info through etcdadm.

@chuckha What did you want etcdadm to do?

Also, what do you think of the PR as it is right now?

@dlipovetsky
Contributor

@dlipovetsky I tested the kube-apiserver case from #147 (comment): kube-apiserver needs to be restarted to load the new etcd client certificates.

I updated the documentation and the code to warn users about this. The code is: https://github.com/kubernetes-sigs/etcdadm/pull/147/files#diff-1094222d3259d85ad53cc70f8bcc0816R54

I know I said earlier that kube-apiserver might need to be restarted to reload its etcd client cert. But is that what you see in the logs? The reason I ask is: it looks like kube-apiserver should reload the certificates automatically. That's because kube-apiserver uses the etcd client's TLS config, which implements certificate reloading.
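
For readers unfamiliar with how a TLS client can reload its certificate without a restart, the following is a minimal sketch using the standard library's tls.Config.GetClientCertificate hook, which is consulted when a server requests a client certificate during a handshake. It is not taken from the etcd client or kube-apiserver source; reloadingClientTLS, writeKeyPair, currentSerial, and demoReload are hypothetical helpers, and the key pairs are throwaway self-signed ones written to a temporary directory.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"path/filepath"
	"time"
)

// reloadingClientTLS returns a config that reads the key pair from disk
// each time a server asks for a client certificate.
func reloadingClientTLS(certFile, keyFile string) *tls.Config {
	return &tls.Config{
		GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
			cert, err := tls.LoadX509KeyPair(certFile, keyFile)
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}
}

// writeKeyPair writes a throwaway self-signed client certificate and key into dir.
func writeKeyPair(dir string, serial int64) (string, string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", "", err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: "demo-client"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return "", "", err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return "", "", err
	}
	certFile := filepath.Join(dir, "client.crt")
	keyFile := filepath.Join(dir, "client.key")
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	if err := os.WriteFile(certFile, certPEM, 0o644); err != nil {
		return "", "", err
	}
	if err := os.WriteFile(keyFile, keyPEM, 0o600); err != nil {
		return "", "", err
	}
	return certFile, keyFile, nil
}

// currentSerial asks the config for its client certificate and returns the serial number.
func currentSerial(cfg *tls.Config) (string, error) {
	cert, err := cfg.GetClientCertificate(nil) // the sketch's callback ignores its argument
	if err != nil {
		return "", err
	}
	leaf, err := x509.ParseCertificate(cert.Certificate[0])
	if err != nil {
		return "", err
	}
	return leaf.SerialNumber.String(), nil
}

// demoReload replaces the files on disk and shows that the same config picks up the change.
func demoReload() (string, string, error) {
	dir, err := os.MkdirTemp("", "reload-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	certFile, keyFile, err := writeKeyPair(dir, 1)
	if err != nil {
		return "", "", err
	}
	cfg := reloadingClientTLS(certFile, keyFile)
	before, err := currentSerial(cfg)
	if err != nil {
		return "", "", err
	}
	if _, _, err := writeKeyPair(dir, 2); err != nil { // simulate a renewal
		return "", "", err
	}
	after, err := currentSerial(cfg)
	return before, after, err
}

func main() {
	before, after, err := demoReload()
	if err != nil {
		panic(err)
	}
	fmt.Println("serial before renewal:", before)
	fmt.Println("serial after renewal:", after)
}
```

A config built this way only helps new handshakes; connections that are already established keep the certificate they negotiated with.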

@pytimer
Contributor Author

pytimer commented Feb 10, 2020

I know I said earlier that kube-apiserver might need to be restarted to reload its etcd client cert. But is that what you see in the logs? The reason I ask is: it looks like kube-apiserver should reload the certificates automatically. That's because kube-apiserver uses the etcd client's TLS config, which implements certificate reloading.

I see the error remote error: tls: bad certificate in the kube-apiserver log. I'm not sure whether this error is caused by renewing the certificates; I will continue testing. If anyone knows, please point it out.

@dlipovetsky
Contributor

I see the error remote error: tls: bad certificate in the kube-apiserver log. I'm not sure whether this error is caused by renewing the certificates; I will continue testing. If anyone knows, please point it out.

I'll look this week, too. It's not a blocking question, but it's important to know. Depending on what we find, we may want to open a kube-apiserver issue.

justinsb added a commit to justinsb/etcdadm that referenced this pull request Aug 28, 2020
Make PKI library richer, including in-memory keypairs
justinsb added a commit to justinsb/etcdadm that referenced this pull request Aug 29, 2020
Make PKI library richer, including in-memory keypairs
justinsb added a commit to justinsb/etcdadm that referenced this pull request Aug 29, 2020
Make PKI library richer, including in-memory keypairs
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 9, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 9, 2020
@Avatat

Avatat commented Oct 9, 2020

I'm sorry. I will check this feature next week.

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

justinsb added a commit to justinsb/etcdadm that referenced this pull request Dec 4, 2020
Make PKI library richer, including in-memory keypairs
@ntaylor1781

Can we reopen this? This is a needed feature. I think #179 is a great long term goal, but it will be able to utilize this work

@dlipovetsky
Contributor

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Feb 1, 2021
@k8s-ci-robot
Contributor

@dlipovetsky: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pytimer
To complete the pull request process, please assign justinsb after the PR has been reviewed.
You can assign the PR to them by writing /assign @justinsb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ntaylor1781

A comment on kube-apiserver. It will pick up the changes to the ETCD client cert, and automatically use it on new connections.

@pytimer
Contributor Author

pytimer commented Feb 2, 2021

A comment on kube-apiserver. It will pick up the changes to the ETCD client cert, and automatically use it on new connections.

I use k8s v1.10.0 and found that kube-apiserver does not automatically pick up the new client cert; it needs to be restarted. That is why I added the warning message shown after the certificates are renewed.

@ntaylor1781

Good point. The functionality was added in the etcd client version 3.2.0, though there was a bug where it did not work properly when addressed with an IP until 3.2.19. K8S version 1.11 and 1.10 should work if addressed with a domain. Version 1.12 and up will always reload.

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ntaylor1781

Can we re-open this and get it reviewed?

@Avatat

Avatat commented May 17, 2021

Finally, I tried it, and the renew feature works well!

The only problem I have is CSR mode:

# etcdadm certs renew --csr-only
INFO[0000] [renew] Renew etcd server certificate.       
FATA[0000] [renew] Error renew certificate server: failed to write new csr server: server CSR existed but it could not be loaded properly: couldn't load the certificate request /etc/etcd/pki/server.csr: failed to read file: open /etc/etcd/pki/server.csr: no such file or directory

But there is no server.csr file:

# ls
apiserver-etcd-client.crt  apiserver-etcd-client.key  ca.crt  ca.key  etcdctl-etcd-client.crt  etcdctl-etcd-client.key	peer.crt  peer.key  server.crt	server.key

If you want to use this feature, clone @pytimer's repo, build from the cert branch, and use the built binary:

git clone -b cert https://github.com/pytimer/etcdadm.git
cd etcdadm
make container-build
./etcdadm certs

@sfgroups-k8s

Hi,
Is there any plan to merge this cert renewal code https://github.com/pytimer/etcdadm/tree/cert into etcdadm?

Thanks

ntaylor1781 pushed a commit to ntaylor1781/etcdadm that referenced this pull request Dec 2, 2021
This is continuing the work from kubernetes-retired#147.
Last concern was due to issues with CSR generation. This was due to trying to write the
CSR and key to the certificate directory. When calling TryLoadCSRAndKeyFromDisk it registered
the key existed, and tried to load both. This caused the error stating it couldn't load the
CSR.

As with kubeadm, this needed a directory to output the csr specifically. I added a new argument
of csr-dir (with a default of the local directory). I split the RenewUsingLocalCA function into two
functions. One for the CSRs, and one for the full certificates. The new RenewCSRUsingLocalCA
function uses the new csr-dir path to output the CSRs. This ensures that the key of the certificate being
renewed is not overwritten and does not cause the process to fail.
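
The idea of writing CSRs to their own directory can be sketched as follows. This is an illustration only and not the code from the referenced commit: writeCSR and demoCSR are hypothetical helpers, the directory layout is invented, and the key is generated in memory. The CSR is built with x509.CreateCertificateRequest and written under a separate csr directory, so nothing in the certificate directory is created or overwritten.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"os"
	"path/filepath"
)

// writeCSR signs a certificate request with an existing key and writes it
// to csrDir/<name>.csr, leaving the certificate directory untouched.
func writeCSR(csrDir, name string, key *ecdsa.PrivateKey, cn string, dnsNames []string) (string, error) {
	tmpl := &x509.CertificateRequest{
		Subject:  pkix.Name{CommonName: cn},
		DNSNames: dnsNames,
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return "", err
	}
	if err := os.MkdirAll(csrDir, 0o755); err != nil {
		return "", err
	}
	path := filepath.Join(csrDir, name+".csr")
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	if err := os.WriteFile(path, csrPEM, 0o644); err != nil {
		return "", err
	}
	return path, nil
}

// demoCSR writes a CSR into a dedicated directory, reads it back, and reports
// the subject plus how many files ended up in the (separate) pki directory.
func demoCSR() (string, int, error) {
	root, err := os.MkdirTemp("", "csr-demo")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(root)
	pkiDir := filepath.Join(root, "pki") // stands in for the certificate directory
	if err := os.MkdirAll(pkiDir, 0o755); err != nil {
		return "", 0, err
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", 0, err
	}
	path, err := writeCSR(filepath.Join(root, "csr"), "server", key, "etcd-server", []string{"etcd-0.example.internal"})
	if err != nil {
		return "", 0, err
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	block, _ := pem.Decode(raw)
	if block == nil {
		return "", 0, fmt.Errorf("no PEM block in %s", path)
	}
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return "", 0, err
	}
	if err := csr.CheckSignature(); err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(pkiDir)
	if err != nil {
		return "", 0, err
	}
	return csr.Subject.CommonName, len(entries), nil
}

func main() {
	cn, pkiFiles, err := demoCSR()
	if err != nil {
		panic(err)
	}
	fmt.Println("CSR subject:", cn)
	fmt.Println("files written to the pki directory:", pkiFiles)
}
```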
This was referenced Dec 2, 2021
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

8 participants