Community Note
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place. Thank you!
Please do not leave "+1" or other comments that do not add relevant new information or questions. They generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
When server-acl-init runs, it creates server ACL tokens and passes them out to the servers (https://github.com/hashicorp/consul-k8s/blob/main/control-plane/subcommand/server-acl-init/servers.go#L113-L156). If this fails, e.g. because one of the servers is stuck in legacy ACL mode, the server-acl-init Job will eventually time out (5m). Because of the current Job configuration, a new Pod is then created. That Pod sees that a bootstrap ACL token already exists, assumes that all servers have been correctly bootstrapped, and skips the server bootstrap step.
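To make the failure mode concrete, here is a minimal sketch of the decision described above. It is not the actual consul-k8s code: the types `server` and `clusterState` and both functions are hypothetical, in-memory stand-ins. The first function models the reported behavior (the existence of the bootstrap Secret is treated as proof that every server has its token). The second shows one possible alternative, which is only an assumption about how a fix might look: check each server individually.

```go
package main

import "fmt"

// server is a hypothetical stand-in for one Consul server Pod.
type server struct {
	name     string
	tokenSet bool // true once the server's agent token was created and applied
}

// clusterState is a hypothetical snapshot of what a retry Pod could observe.
type clusterState struct {
	bootstrapSecretExists bool
	servers               []server
}

// pendingServers returns the names of servers that still lack a token.
func pendingServers(servers []server) []string {
	var pending []string
	for _, s := range servers {
		if !s.tokenSet {
			pending = append(pending, s.name)
		}
	}
	return pending
}

// serversToBootstrapNaive models the behavior reported in this issue:
// if the bootstrap Secret exists, the server bootstrap step is skipped entirely.
func serversToBootstrapNaive(c clusterState) []string {
	if c.bootstrapSecretExists {
		return nil
	}
	return pendingServers(c.servers)
}

// serversToBootstrapChecked is one possible alternative (an assumption, not
// the maintainers' fix): ignore the Secret and verify every server.
func serversToBootstrapChecked(c clusterState) []string {
	return pendingServers(c.servers)
}

func main() {
	// State left behind by a first run that timed out on the third server.
	afterFirstRun := clusterState{
		bootstrapSecretExists: true,
		servers: []server{
			{name: "consul-server-0", tokenSet: true},
			{name: "consul-server-1", tokenSet: true},
			{name: "consul-server-2", tokenSet: false},
		},
	}
	fmt.Println("naive:  ", serversToBootstrapNaive(afterFirstRun))
	fmt.Println("checked:", serversToBootstrapChecked(afterFirstRun))
}
```

With the state from the first run, the naive variant reports nothing left to do, while the per-server check still returns `consul-server-2`.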
The logs would look something like:
First run:
2021-10-28T20:46:25.581Z [INFO] Retrying in 1s
2021-10-28T20:46:26.610Z [INFO] Success: bootstrapping ACLs - PUT /v1/acl/bootstrap
2021-10-28T20:46:26.624Z [INFO] Success: writing bootstrap Secret "consul-bootstrap-acl-token"
2021-10-28T20:46:26.648Z [INFO] Success: creating agent policy - PUT /v1/acl/policy
2021-10-28T20:46:26.659Z [INFO] Success: creating server token for consul-server-0.consul-server.default.svc - PUT /v1/acl/token
2021-10-28T20:46:26.663Z [INFO] Success: updating server token for consul-server-0.consul-server.default.svc - PUT /v1/agent/token/agent
2021-10-28T20:46:26.746Z [INFO] Success: creating server token for consul-server-1.consul-server.default.svc - PUT /v1/acl/token
2021-10-28T20:46:26.749Z [INFO] Success: updating server token for consul-server-1.consul-server.default.svc - PUT /v1/agent/token/agent
2021-10-28T20:46:26.843Z [ERROR] Failure: creating server token for consul-server-2.consul-server.default.svc - PUT /v1/acl/token: err="Unexpected response code: 500 (The ACL system is currently in legacy mode.)"
2021-10-28T20:46:26.843Z [INFO] Retrying in 1s
2021-10-28T20:46:27.846Z [ERROR] Failure: creating server token for consul-server-2.consul-server.default.svc - PUT /v1/acl/token: err="Unexpected response code: 500 (The ACL system is currently in legacy mode.)"
2021-10-28T20:46:27.846Z [INFO] Retrying in 1s
...
2021-10-28T20:53:54.841Z [ERROR] reached command timeout
Second run: