
Helm + VaultBackend + ACLs | Bootstrap ACLs and store the bootstrapToken and the replicationToken in Vault #1176

Closed
Sushobhan123 opened this issue Apr 17, 2022 · 9 comments · Fixed by #1920
Labels
type/enhancement New feature or request waiting-reply Waiting on the issue creator for a response before taking further action

Comments

@Sushobhan123

Sushobhan123 commented Apr 17, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Is your feature request related to a problem? Please describe.

Based on the current documentation, figuring out the proper way to bootstrap ACLs when installing Consul through Helm for a multi-DC federated setup is difficult.
This feature would remove the overhead of understanding the ACL bootstrap process from operators and delegate it to consul-k8s.

Speaking from personal experience, setting up the multi-DC Consul service mesh without ACLs took a lot of time. With the recently added features the setup became easier, and I have reached a stable configuration without ACLs.
I do appreciate the great effort that went into making deployments via the Helm charts easier to configure. 👏👏👏

However, now that I am trying to enable ACLs with the Consul Helm charts, the documentation is not clear enough. Documentation on the purpose of the bootstrap token and the replication token, along with some information on how they are used under the hood (that is, what happens while bootstrapping multiple DCs in a federated mesh, and where the bootstrap and replication tokens come into play), would really help.

For now, though, if the feature described below were available, operators would not have to fret about ACL bootstrapping and could simply move on to using ACLs directly. Operators would just pass the secret name and key of the replication token in the values.yaml for all secondary DCs. As for the bootstrap token, once the setup is complete, operators could simply fetch it from Vault to further configure the ACLs.

Feature Description

In a primary DC, if Vault backend secret names and keys for the bootstrapToken and the replicationToken are provided but their contents are empty, bootstrap the ACLs as if the tokens had not been provided, and write the generated tokens to the corresponding Vault secret paths.
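The proposed behavior boils down to one decision per token. Sketched in Go for illustration (the type and function names here are hypothetical, not actual consul-k8s code):

```go
package main

import "fmt"

// tokenAction is a hypothetical type describing what server-acl-init
// should do for a token secret configured in Vault.
type tokenAction int

const (
	// A non-empty value was found at the secret path: use it as-is.
	reuseExisting tokenAction = iota
	// The secret exists but its value is empty: bootstrap as if no
	// secret were provided, then write the generated token back.
	generateAndWrite
)

// decideTokenAction captures the proposed semantics for one token.
func decideTokenAction(vaultValue string) tokenAction {
	if vaultValue == "" {
		return generateAndWrite
	}
	return reuseExisting
}

func main() {
	fmt.Println(decideTokenAction("") == generateAndWrite)        // true
	fmt.Println(decideTokenAction("some-token") == reuseExisting) // true
}
```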

Details:
For the primary DC, if the following is provided in the values.yaml for the Helm chart installation:
Note: Only showing fields as required in the context.

global:
  secretsBackend:
    vault:
      enabled: true
      manageSystemACLsRole: consul-server-acl-init
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: consul/data/secrets/bootstrap-token
      secretKey: token
    createReplicationToken: true
    replicationToken:
      secretName: consul/data/secrets/replication-token
      secretKey: token

but the values of the referenced secrets are empty strings, i.e. the secrets were created with the following commands:

vault kv put consul/secrets/bootstrap-token token=""
vault kv put consul/secrets/replication-token token=""

then the server-acl-init job should bootstrap the ACLs as if the secret names and keys were not provided at all.


To rephrase, the server-acl-init job should treat the above config the same as the following:

global:
  secretsBackend:
    vault:
      enabled: true
      manageSystemACLsRole: consul-server-acl-init
  acls:
    manageSystemACLs: true
    createReplicationToken: true

Once bootstrapping is done for the primary DC, the bootstrapToken and the replicationToken generated during the process should be written to Vault at the provided secret paths. That is, the equivalent of the following should be executed:

vault kv put consul/secrets/bootstrap-token token="${bootstrap_token}"
vault kv put consul/secrets/replication-token token="${replication_token}"

This also means that the role for manageSystemACLsRole: consul-server-acl-init needs write permission on the two Vault secret paths in question. However, write permission is only required in the primary DC; read permission is enough for secondary DCs.


DC1

vault policy write consul-server-acl-init-dc1-policy - <<-EOF
  path "consul/data/secrets/bootstrap-token" {
    capabilities = ["read", "update"]
  }
  path "consul/data/secrets/replication-token" {
    capabilities = ["read", "update"]
  }
EOF

vault write auth/kubernetes-dc1/role/consul-server-acl-init \
  bound_service_account_names=consul-server-acl-init \
  bound_service_account_namespaces=consul \
  policies=consul-server-acl-init-dc1-policy \
  ttl=1h

DC2

vault policy write consul-server-acl-init-dc2-policy - <<-EOF
  path "consul/data/secrets/replication-token" {
    capabilities = ["read"]
  }
EOF

vault write auth/kubernetes-dc2/role/consul-server-acl-init \
  bound_service_account_names=consul-server-acl-init \
  bound_service_account_namespaces=consul \
  policies=consul-server-acl-init-dc2-policy \
  ttl=1h

vault write auth/kubernetes-dc2/role/consul-server \
  bound_service_account_names=consul-server \
  bound_service_account_namespaces=consul \
  policies=consul-server-acl-init-dc2-policy \
  ttl=1h

Contributions

If the feature seems valid and is approved, I'll be glad to contribute and raise a PR.

@Sushobhan123 Sushobhan123 added the type/enhancement New feature or request label Apr 17, 2022
@t-eckert
Contributor

@Sushobhan123, thank you for this suggestion. I think it sounds great. @david-yu, what do you think about the feature?

It reminds me of the work done for automating gossip encryption. If you do end up implementing this solution, these two pull requests may be helpful as a reference: #738, #772.

@david-yu
Contributor

david-yu commented Apr 18, 2022

Hi @Sushobhan123, we were considering adding some Consul K8s CLI enhancements to make the process of quickly bootstrapping federated services even simpler.

Traditionally we have not introduced write or update access for roles that involve Consul K8s making API calls to the Vault secrets backend, because creating and updating such secrets should be handled by an operator with escalated privileges, not by a long-lived process.

I'm also open to seeing what others who watch Consul K8s issues think. Given the sensitivity of these secrets in Vault, we believe automation for bootstrapping secrets like the bootstrap and replication tokens would be better suited to the CLI, to improve the UX of setting up Consul K8s with Vault.

@Sushobhan123
Author

@t-eckert Thank you for the references. I see that in #738 the work was first done using curl, and then rewritten in Go in #772.

Just for clarity, what I meant by "the equivalent of the vault kv put commands should be executed" is that the equivalent code should be written in Go in control-plane/subcommand/server-acl-init/command.go.
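For illustration, a minimal Go sketch of that write-back against Vault's KV v2 HTTP API, using only the standard library. The function names, paths, and the use of raw HTTP are assumptions for the sketch; a real implementation would presumably reuse the Vault Go client already used elsewhere in consul-k8s:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// kvV2Payload wraps a secret in the envelope the Vault KV v2 HTTP API
// expects: {"data": {"<key>": "<value>"}}.
func kvV2Payload(key, value string) ([]byte, error) {
	return json.Marshal(map[string]map[string]string{
		"data": {key: value},
	})
}

// writeTokenToVault sketches the write the job would perform after
// bootstrapping. vaultAddr, vaultToken, and secretPath are assumptions;
// secretPath is the full API path, e.g. "consul/data/secrets/bootstrap-token".
func writeTokenToVault(vaultAddr, vaultToken, secretPath, key, token string) error {
	body, err := kvV2Payload(key, token)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, vaultAddr+"/v1/"+secretPath, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("X-Vault-Token", vaultToken)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("vault write failed: %s", resp.Status)
	}
	return nil
}

func main() {
	payload, _ := kvV2Payload("token", "example-bootstrap-token")
	fmt.Println(string(payload)) // {"data":{"token":"example-bootstrap-token"}}
}
```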

we were considering on adding some Consul K8s CLI enhancements

@david-yu Is the sentence below a correct paraphrasing of the above?
"We were considering writing the code using the Vault Go client, not curl or the vault CLI."

creating and updating such secrets should be handled by an operator with escalated privileges and not a long lived process

@david-yu Given that the server-acl-init job is not a long-lived process, if the functionality is added in control-plane/subcommand/server-acl-init/command.go (or a new file under control-plane/subcommand/server-acl-init/, if that is preferred) to handle the Vault writes implicitly, shouldn't that be enough?
Or do you mean introducing a new subcommand when you say "adding some Consul K8s CLI enhancements"?
Please share your thoughts.

P.S.: I'll wait for a go-ahead from you before starting any work on this.
Also, if the team is already working on this, do let me know and I'll skip it.

@kholisrag

By the way, I got an error when trying to bootstrap the Consul server ACLs using Vault as the backend, with the latest Helm chart, 0.43.0.
Maybe the issue is related to this feature.

global:
  secretsBackend:
    vault:
      enabled: true
      consulServerRole: "consul-server"
      consulClientRole: ""
      manageSystemACLsRole: "consul-server-acl-init"
      agentAnnotations: null
      consulCARole: "consul-ca"
      ca:
        secretName: "vault-ca-cert"
        secretKey: "tls.crt"
      connectCA:
        address: "https://vault-active:8200"
        authMethodPath: "kubernetes"
        rootPKIPath: "connect_root"
        intermediatePKIPath: "connect_inter"
        additionalConfig: |
          {}
  acls:
    bootstrapToken:
      secretName: secret/data/consul/bootstrap-token
      secretKey: token
    createReplicationToken: false
    manageSystemACLs: true
    partitionToken:
      secretName: secret/data/consul/partition-token
      secretKey: token
    replicationToken:
      secretName: secret/data/consul/replication-token
      secretKey: token

It produces an error like:

2022-04-26T06:12:29.888Z [ERROR] Failure: calling /agent/self to get datacenter: err="Unexpected response code: 403 (ACL not found)"
2022-04-26T06:12:29.888Z [INFO]  Retrying in 1s
2022-04-26T06:12:30.889Z [ERROR] Failure: calling /agent/self to get datacenter: err="Unexpected response code: 403 (ACL not found)"
2022-04-26T06:12:30.889Z [INFO]  Retrying in 1s
2022-04-26T06:12:31.892Z [ERROR] Failure: calling /agent/self to get datacenter: err="Unexpected response code: 403 (ACL not found)"

I guess it's because global.acls.manageSystemACLs is still not supported when using Vault as a backend, as @Sushobhan123 explained?

Anyway, is there any workaround to bootstrap the Consul ACLs when using Vault as the secrets backend?

@kschoche
Contributor

Hi @kholisrag! With the latest release of consul-k8s, 0.43.0, we fully support ACLs being stored in Vault, so this should work!

I do not see an obvious error in the values.yaml you attached, but there are other things in play, such as ensuring that the Vault roles and policies are set up correctly, which are missing here. It should work, though!
If you'd like, please do file a bug and we'll help get you up and running!

@kholisrag

kholisrag commented Apr 26, 2022

Yes, it's supported; what is not supported is bootstrapping the Consul server ACLs on a fresh install.
For a Consul server whose ACLs are already bootstrapped, the Vault backend works perfectly.

In my case, to fix this, I modified the consul-server-acl-init job to upload the bootstrapped ACL token to Vault, using a Helm post-renderer and kustomize.

@david-yu
Contributor

Hi @kholisrag, it does look like what you are looking for is related to this feature request. We currently require you to bootstrap the token manually, as described here: https://www.consul.io/docs/k8s/installation/vault/data-integration/bootstrap-token. The WAN federation workflow with the secrets backend is also fully documented here: https://www.consul.io/docs/k8s/installation/vault/wan-federation
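That manual flow amounts to generating the bootstrap token yourself and storing it at the path your Helm values reference. A rough sketch (the path is an example; see the linked docs for the authoritative steps):

```shell
# Generate a bootstrap token yourself (any UUID works):
bootstrap_token="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"

# Store it at the KV v2 path the Helm values reference. Note the CLI path
# omits the /data/ segment that appears in policies and values.yaml.
# (Guarded so the line is skipped when the vault CLI is not on PATH.)
command -v vault >/dev/null 2>&1 &&
  vault kv put consul/secrets/bootstrap-token token="${bootstrap_token}"

echo "bootstrap token prepared: ${bootstrap_token}"
```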

We were hoping to investigate the CLI side further, as opposed to building a Kubernetes job as @Sushobhan123 described to create both the ACL bootstrap and replication tokens, since that requires an operator to explicitly execute commands that create secure tokens. We would need to look more closely at the security impact of running such a process in a Kubernetes job before deciding to go in that direction.

@david-yu
Contributor

@kholisrag Do you have more details on how you were able to do this? A gist or GitHub repo would be useful to glance at.

In my case, to fix this, I modified the consul-server-acl-init job to upload the bootstrapped ACL token to Vault, using a Helm post-renderer and kustomize.

@jmurret jmurret added the waiting-reply Waiting on the issue creator for a response before taking further action label May 18, 2022
@pglass
Contributor

pglass commented Feb 17, 2023

Hi @Sushobhan123, I've implemented a fix for this in #1920 (starting with just the bootstrap token). It's been a little while, but I'm interested in any feedback you might have on that. Thank you!
