Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] *: watch tls certificate changes #45

Closed
wants to merge 2 commits into from

Conversation

s-urbaniak
Copy link
Collaborator

@s-urbaniak s-urbaniak commented Jun 17, 2019

TODOs:

  • test on real cluster
  • reload based on a ticker

/cc @brancz @metalmatze @paulfantom

@s-urbaniak
Copy link
Collaborator Author

Testing on an openshift-cluster this looks fine. Noticeable is that those certs are chmod'ed every ~1minute:

$ kubectl -n openshift-monitoring logs -f prometheus-k8s-1 -c kube-rbac-proxy
I0617 14:16:38.672469       1 main.go:146] Reading config file: /etc/kube-rbac-proxy/config.yaml
I0617 14:16:38.673793       1 main.go:238] Reading certificate files
I0617 14:16:38.673815       1 reloader.go:129] reloading key and certificate
I0617 14:16:39.674416       1 main.go:271] Starting TCP socket on 0.0.0.0:9092
I0617 14:16:39.674673       1 main.go:278] Listening securely on 0.0.0.0:9092
I0617 14:16:39.674742       1 reloader.go:74] watching for key/cert changes
I0617 14:16:40.041179       1 reloader.go:87] watcher event: "/etc/tls/private/tls.crt": CHMOD
I0617 14:16:40.041231       1 reloader.go:129] reloading key and certificate
I0617 14:16:41.041977       1 reloader.go:87] watcher event: "/etc/tls/private/tls.key": CHMOD
I0617 14:16:41.042015       1 reloader.go:129] reloading key and certificate
I0617 14:16:42.048058       1 reloader.go:87] watcher event: "/etc/tls/private/tls.crt": CHMOD
I0617 14:16:42.048103       1 reloader.go:129] reloading key and certificate
I0617 14:16:43.048631       1 reloader.go:87] watcher event: "/etc/tls/private/tls.key": CHMOD
I0617 14:16:43.048667       1 reloader.go:129] reloading key and certificate
I0617 14:16:44.049182       1 reloader.go:87] watcher event: "/etc/tls/private/tls.crt": CHMOD
I0617 14:16:44.049211       1 reloader.go:129] reloading key and certificate
I0617 14:16:45.049702       1 reloader.go:87] watcher event: "/etc/tls/private/tls.key": CHMOD
I0617 14:16:45.049741       1 reloader.go:129] reloading key and certificate
I0617 14:17:56.806738       1 reloader.go:87] watcher event: "/etc/tls/private/tls.crt": CHMOD
I0617 14:17:56.806825       1 reloader.go:129] reloading key and certificate
I0617 14:17:57.807360       1 reloader.go:87] watcher event: "/etc/tls/private/tls.key": CHMOD
I0617 14:17:57.807395       1 reloader.go:129] reloading key and certificate

I think we can ignore CHMOD events, but having prometheus config reloader as a reference, I see it is not filtered there too.

@s-urbaniak s-urbaniak changed the title *: watch tls certificate changes [WIP] *: watch tls certificate changes Jun 17, 2019
@s-urbaniak
Copy link
Collaborator Author

setting as WIP, as I am losing fsnotify events after REMOVE events. Viper has a reference implementation for this: spf13/viper#415

@brancz
Copy link
Owner

brancz commented Jun 18, 2019

Let me know when it's ready for review. It may very well be that the config watchers are not 100% correct today. Let's not worry about those right now, and just do it properly here :)

@s-urbaniak
Copy link
Collaborator Author

@brancz agreed 👍 I scanned the prometheus config reloader and I believe it is defunct right now (at least for fsnotify events). I think viper's method is reasonable and that is what I am porting over.

I am finalizing the tests and verifying the design with @lucab. Stay tuned :-)

@s-urbaniak
Copy link
Collaborator Author

@brancz this is ready for an initial pass. We have the following situation:

Simply watching the cert/key paths won't work, because in k8s land an atomic configmap/secret swap is done using a double symlink approach:

..dir_rev1
..data -> ..dir1
tls.crt -> ..data/tls.crt
tls.key -> ..data/tls.key

k8s atomically swaps tls.crt/tls.key by creating dir_rev2 and overwriting the ..data symlink:

..dir_rev2
..data -> ..dir_rev2
tls.crt -> ..data/tls.crt
tls.key -> ..data/tls.key

Since golang's fsnotify library always evaluates symlinks as per https://github.com/fsnotify/fsnotify/blob/c2828203cd70a50dcccfb2761f8b1f8ceef9a8e9/kqueue.go#L192 we only get a REMOVE event for ..dir_rev1/tls.crt and ..dir_rev1/tls.key. As those targets are gone afterwards, we don't receive any follow-up events.

viper's approach linked above does

  1. evaluate the symlink
  2. watch for changes on the whole directory
  3. recognize if the symlink target changed
  4. if so, reload the content

This is a sound approach, but tracking the symlink target evaluation implies state tracking which we can avoid by simply watching the intermediate symlink ..data.

Whenever k8s updates the configmap, a CREATE event is triggered on the ..data symlink, signalling us the necessary event to reload the certificates. We don't need to track state in the watcher.

The only problem is, that we cannot fsnotify on ..data, as golang's fsnotify library evaluates symlinks ;-)

As a workaround, we watch for events in the whole directory (as viper does), and filter out events relating to ..data.

This implementation also includes the possibility to track changes on the certificates directly in case they are simple files and not symlinks for brevity.

case <-ctx.Done():
return nil

case event := <-watcher.Events:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we, in addition should add a ticker anyways. I am not sure we have 100% guarantees to catch all file system events, cc @lucab.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: for this very case, why not just read the files once per minute and apply if they changed? For other cases I agree that reloading as quickly as possible when available makes perfect sense, but here we have a lot of time to apply cert rotation anyways no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good yes, this will simplify a lot of things and we don't have to think about edge cases 👍 let me implement a ticker-only based approach here and reuse the implemented logic here for prometheus-config-reloader.

@s-urbaniak s-urbaniak mentioned this pull request Jun 20, 2019
1 task
s-urbaniak pushed a commit to s-urbaniak/kube-rbac-proxy that referenced this pull request May 10, 2021
@ibihim
Copy link
Collaborator

ibihim commented May 9, 2022

Is this still WIP or can we close it?

@s-urbaniak s-urbaniak closed this May 16, 2022
@s-urbaniak
Copy link
Collaborator Author

this is historic and can be closed

ibihim pushed a commit to ibihim/kube-rbac-proxy that referenced this pull request Oct 20, 2022
Bug 1933599: bump k8s.io/apiserver to 1.20.4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants