If server-acl-init fails to pass out server tokens to every server it will not retry when re-run #825

Closed
lkysow opened this issue Nov 1, 2021 · 0 comments · Fixed by #832
Labels
area/acls Related to ACLs type/bug Something isn't working

lkysow commented Nov 1, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

When server-acl-init runs, it creates server ACL tokens and passes them out to the servers (https://github.com/hashicorp/consul-k8s/blob/main/control-plane/subcommand/server-acl-init/servers.go#L113-L156). If this step fails, e.g. because one of the servers is stuck in legacy ACL mode, the server-acl-init Job eventually times out (5m). Because of the current Job configuration, a new Pod is then created. That Pod sees that a bootstrap ACL token already exists, assumes that all servers have been correctly bootstrapped, and skips the server bootstrap step. The servers that never received a token are therefore not retried.

The logs would look something like:

First run:

2021-10-28T20:46:25.581Z [INFO]  Retrying in 1s
2021-10-28T20:46:26.610Z [INFO]  Success: bootstrapping ACLs - PUT /v1/acl/bootstrap
2021-10-28T20:46:26.624Z [INFO]  Success: writing bootstrap Secret "consul-bootstrap-acl-token"
2021-10-28T20:46:26.648Z [INFO]  Success: creating agent policy - PUT /v1/acl/policy
2021-10-28T20:46:26.659Z [INFO]  Success: creating server token for consul-server-0.consul-server.default.svc - PUT /v1/acl/token
2021-10-28T20:46:26.663Z [INFO]  Success: updating server token for consul-server-0.consul-server.default.svc - PUT /v1/agent/token/agent
2021-10-28T20:46:26.746Z [INFO]  Success: creating server token for consul-server-1.consul-server.default.svc - PUT /v1/acl/token
2021-10-28T20:46:26.749Z [INFO]  Success: updating server token for consul-server-1.consul-server.default.svc - PUT /v1/agent/token/agent
2021-10-28T20:46:26.843Z [ERROR] Failure: creating server token for consul-server-2.consul-server.default.svc - PUT /v1/acl/token: err="Unexpected response code: 500 (The ACL system is currently in legacy mode.)"
2021-10-28T20:46:26.843Z [INFO]  Retrying in 1s
2021-10-28T20:46:27.846Z [ERROR] Failure: creating server token for consul-server-2.consul-server.default.svc - PUT /v1/acl/token: err="Unexpected response code: 500 (The ACL system is currently in legacy mode.)"
2021-10-28T20:46:27.846Z [INFO]  Retrying in 1s
...
2021-10-28T20:53:54.841Z [ERROR] reached command timeout

Second run:

2021-10-28T20:54:00.363Z [INFO]  ACLs already bootstrapped - retrieved bootstrap token from Secret "consul-bootstrap-acl-token"
2021-10-28T20:54:00.763Z [INFO]  Success: calling /agent/self to get datacenter
2021-10-28T20:54:00.764Z [INFO]  Current datacenter: datacenter=dc1 primaryDC=dc1
2021-10-28T20:54:00.766Z [INFO]  Policy "agent-token" already exists, updating
2021-10-28T20:54:00.774Z [INFO]  Success: creating agent policy - PUT /v1/acl/policy
2021-10-28T20:54:00.782Z [INFO]  Success: creating cross-namespace-policy policy
2021-10-28T20:54:00.869Z [INFO]  Success: creating client-token policy
2021-10-28T20:54:00.879Z [INFO]  Success: creating token for policy client-token
2021-10-28T20:54:00.887Z [INFO]  Success: writing Secret for token client-token
2021-10-28T20:54:00.894Z [INFO]  Success: creating anonymous token policy - PUT /v1/acl/policy
2021-10-28T20:54:00.903Z [INFO]  Success: updating anonymous token with policy
2021-10-28T20:54:00.963Z [INFO]  Success: getting consul-connect-injector-authmethod-svc-account ServiceAccount
2021-10-28T20:54:00.976Z [INFO]  Success: getting consul-connect-injector-authmethod-svc-account-token-abcd Secret
2021-10-28T20:54:00.987Z [INFO]  Success: creating auth method consul-k8s-auth-method
2021-10-28T20:54:00.989Z [INFO]  Success: listing binding rules for auth method consul-k8s-auth-method
2021-10-28T20:54:00.996Z [INFO]  Success: creating acl binding rule for consul-k8s-auth-method
2021-10-28T20:54:01.006Z [INFO]  Success: creating connect-inject-token policy
2021-10-28T20:54:01.069Z [INFO]  Success: creating token for policy connect-inject-token
2021-10-28T20:54:01.075Z [INFO]  Success: writing Secret for token connect-inject-token
2021-10-28T20:54:01.170Z [INFO]  Success: creating controller-token policy
2021-10-28T20:54:01.180Z [INFO]  Success: creating token for policy controller-token
2021-10-28T20:54:01.187Z [INFO]  Success: writing Secret for token controller-token
2021-10-28T20:54:01.187Z [INFO]  server-acl-init completed successfully
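
To make the control flow behind these two runs easier to follow, here is a minimal Go sketch of the behavior as described in this issue. It is not the consul-k8s code: the `cluster` type, `runFlawed`, `setServerTokens`, and `errTimeout` are hypothetical names, the servers and the Secret are simulated in memory, and the step where consul-server-2 leaves legacy mode between runs is an assumption added for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical in-memory stand-in for the Consul servers and the
// Kubernetes Secret. None of these names come from consul-k8s.
type cluster struct {
	bootstrapSecret bool            // true once the bootstrap token Secret is written
	legacyMode      map[string]bool // servers still in legacy ACL mode
	serverTokens    map[string]bool // servers that have received a server token
}

var errTimeout = errors.New("reached command timeout")

// setServerTokens hands a token to each server in order and gives up at the
// first server that rejects the request. The real command retries until the
// timeout; the sketch collapses that into a single error.
func (c *cluster) setServerTokens(servers []string) error {
	for _, s := range servers {
		if c.legacyMode[s] {
			return errTimeout
		}
		c.serverTokens[s] = true
	}
	return nil
}

// runFlawed models the behavior described in the issue: when the bootstrap
// Secret already exists, the server token step is skipped entirely.
func (c *cluster) runFlawed(servers []string) error {
	if !c.bootstrapSecret {
		c.bootstrapSecret = true // bootstrap succeeded and the Secret was written
		if err := c.setServerTokens(servers); err != nil {
			return err
		}
	}
	// The remaining policies and tokens would be created here.
	return nil
}

func main() {
	servers := []string{"consul-server-0", "consul-server-1", "consul-server-2"}
	c := &cluster{
		legacyMode:   map[string]bool{"consul-server-2": true},
		serverTokens: map[string]bool{},
	}

	fmt.Println("first run: ", c.runFlawed(servers))

	// Assumption for illustration: the server has left legacy mode by now.
	delete(c.legacyMode, "consul-server-2")

	fmt.Println("second run:", c.runFlawed(servers))
	fmt.Println("consul-server-2 has token:", c.serverTokens["consul-server-2"])
}
```

In this model the second run reports success even though consul-server-2 never received a token, which mirrors the "server-acl-init completed successfully" line in the second log.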