This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

Vault service isn't registered in consul. UI not available via vault.service.consul #228

Closed
queglay opened this issue Nov 15, 2020 · 13 comments

Comments

@queglay
Contributor

queglay commented Nov 15, 2020

I've been having trouble accessing the UI via https://vault.service.consul/ui in a private subnet.

I may be wrong, but I believe neither the HA examples nor the private subnet example register the vault service with Consul. So unless you are using an ELB, you won't have the vault.service.consul FQDN available to reach the Vault UI.

I am very new to Vault, but I'd imagine this would be problematic for others trying to use a redirect for a private cluster, OIDC, or use the dnsmasq scripts.

I tried to search for where this configuration block for the service registration would be specified in the repo, but couldn't find it:

service_registration "consul"

https://www.vaultproject.io/docs/configuration/service-registration/consul

Is this being specified anywhere or should it be by default? Or is there anywhere in the repo that the vault service is getting registered with consul and I'm missing it?

I also opened a thread on hashicorp discuss yesterday before I realised this might be extending to other functions in this repository as well. https://discuss.hashicorp.com/t/why-might-a-consul-client-not-be-able-to-access-vault-ui-at-https-vault-service-consul-ui/17660

Thanks if anyone can provide any clues!

@queglay
Contributor Author

queglay commented Nov 16, 2020

Although I probably shouldn't do it this way, I tested appending a service_registration block to the config in the run-vault script, and so far so good: the dig command returns an answer, and the web browser resolves this address. I'm not sure whether more is required to eliminate the HTTPS certificate warnings, though; I'd like to figure that out.

vault_storage_backend=$(cat <<EOF
# Consul-backed stanza; its type is set by the consul_storage_type variable
$consul_storage_type "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
  scheme  = "http"
  service = "vault"
}
# HA settings
cluster_addr  = "https://$instance_ip_address:$cluster_port"
api_addr      = "$api_addr"

# Added for testing: explicit registration of the vault service in Consul
service_registration "consul" {
  address = "127.0.0.1:8500"
  service = "vault"
  scheme  = "http"
}
EOF
)
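
One way to confirm whether the vault service actually ended up in the Consul catalog after a change like this is to look for it in the output of `consul catalog services`. The following is only a minimal sketch: the helper name `service_registered` is hypothetical, and it assumes the `consul` CLI is on the PATH and can reach the local agent.

```bash
# Hypothetical helper: return 0 if the named service appears in the
# local Consul catalog, non-zero otherwise.
# Assumes the `consul` CLI is installed and can talk to the local agent.
service_registered() {
  name="$1"
  # `consul catalog services` prints one service name per line;
  # -x requires a whole-line match so "vault" does not match "vault-foo".
  consul catalog services | grep -qx -- "$name"
}

# Example usage (left commented out so that sourcing this file has no side effects):
# if service_registered vault; then
#   echo "vault is registered in Consul"
# else
#   echo "vault is NOT registered in Consul"
# fi
```

Note that a catalog entry only shows that the service is registered. It does not by itself show that DNS lookups for vault.service.consul will return an answer.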

@brikis98
Collaborator

This is probably the same cause as #223.

@brikis98
Collaborator

IIRC, using Consul as a backend and specifying a service name should result in Vault being registered with Consul: https://github.com/hashicorp/terraform-aws-vault/blob/master/modules/run-vault/run-vault#L323. But perhaps some behavior changed to break that?

@brikis98
Collaborator

The service registration docs even say:

When Consul is configured as the storage backend, Vault implicitly uses Consul for service registration, so the service_registration stanza is not needed.

So there must be some other issue going on...

@brikis98
Collaborator

If someone has time to dig into this and figure out what is going on, a PR is very welcome!

@queglay
Contributor Author

queglay commented Nov 16, 2020

I'm a bit hesitant to make a PR of this approach because I also read somewhere in the HashiCorp docs (I cannot remember where) that when Consul is used as a storage backend, the same cluster should not be used for service discovery, since that load can then affect Vault throughput. So what I've done above to get it working might not be optimal, though perhaps it is fine for small-scale operations? I don't know, but if it were indeed an acceptable solution, I would add a comment with that warning.

@brikis98
Collaborator

Yeah, I mean a PR that fixes whatever made service registration stop working. As I wrote above, adding a service_registration stanza is probably not the right solution for that PR.

@queglay
Contributor Author

queglay commented Nov 17, 2020

I should add that the Vault version I am using to build the AMIs is v1.5.5:

    "vault_version": "1.5.5",
    "consul_module_version": "v0.8.0",
    "consul_version": "1.8.4",

@queglay
Contributor Author

queglay commented Dec 20, 2020

This may be related to Ubuntu 18 only (my Vault cluster is using Ubuntu 18). I encountered something with a client I tried to get going (hashicorp/terraform-aws-consul#198) that made me wonder whether the cause is that dig vault.service.consul defaults to the resolver at 127.0.0.53 and produces no result, whereas querying @localhost directly returns answers:

ubuntu@ip-10-4-2-183:~$ dig vault.service.consul

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> vault.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 20160
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;vault.service.consul.          IN      A

;; Query time: 3 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Dec 20 01:24:46 UTC 2020
;; MSG SIZE  rcvd: 49

ubuntu@ip-10-4-2-183:~$ dig @localhost vault.service.consul

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> @localhost vault.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39014
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vault.service.consul.          IN      A

;; ANSWER SECTION:
vault.service.consul.   0       IN      A       10.4.2.183
vault.service.consul.   0       IN      A       10.4.2.46
vault.service.consul.   0       IN      A       10.4.1.247

;; Query time: 456 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Dec 20 02:03:10 UTC 2020
;; MSG SIZE  rcvd: 97
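
The two transcripts above differ in exactly two places worth comparing: the `status:` field in the header and the `;; SERVER:` line. As a minimal sketch (the function names `dig_status` and `dig_server` are hypothetical helpers, not part of any module in this repository), those fields can be pulled out of dig output so that the same query can be compared across resolvers:

```bash
# Hypothetical helper: print the status field (e.g. NOERROR, NXDOMAIN)
# from dig output supplied on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: print which resolver answered, taken from the
# ";; SERVER:" line of dig output supplied on stdin.
dig_server() {
  sed -n 's/^;; SERVER: \(.*\)$/\1/p' | head -n 1
}

# Example usage (commented out; requires dig and a reachable resolver):
# for server in 127.0.0.53 127.0.0.1; do
#   echo "$server -> $(dig "@$server" vault.service.consul | dig_status)"
# done
```

If the default resolver reports NXDOMAIN while the local one reports NOERROR, as in the output above, the service is registered and the difference lies in how queries for the consul domain are being forwarded.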

@brikis98
Collaborator

brikis98 commented Jan 6, 2021

Yes, we're seeing Ubuntu 18 specific issues here with the DNS code. See #223.

@brikis98
Collaborator

This should've been fixed in #232 and released in https://github.com/hashicorp/terraform-aws-vault/releases/tag/v0.14.2.

@queglay
Contributor Author

queglay commented May 16, 2021

I updated and tested today, but found that a brand-new Vault cluster (before being initialised), while showing Consul services and nodes, returned nothing via dig.

Before I was using:
vault-module-version: v0.13.11 vault-version: 1.5.5 consul-module-version: v0.8.0 consul-version: 1.8.4

And I tested today:
vault-module-version: v0.15.1 vault-version: 1.6.1 consul-module-version: v0.8.0 consul-version: 1.9.2

What else could I check to further diagnose the problem?

admin:~/environment/firehawk/vault-init (bump-versions) $ ssh [email protected]
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1048-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sun May 16 14:44:01 UTC 2021

  System load:  0.0               Processes:           96
  Usage of /:   4.8% of 48.41GB   Users logged in:     0
  Memory usage: 32%               IP address for eth0: 10.1.0.19
  Swap usage:   0%




New release '20.04.2 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


Last login: Sun May 16 14:37:23 2021 from 172.31.0.64
ubuntu@ip-10-1-0-19:~$ consul catalog services
consul
vault
ubuntu@ip-10-1-0-19:~$ dig vault.service.consul

; <<>> DiG 9.11.3-1ubuntu1.15-Ubuntu <<>> vault.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 3005
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;vault.service.consul.          IN      A

;; Query time: 2 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun May 16 14:44:19 UTC 2021
;; MSG SIZE  rcvd: 49


ubuntu@ip-10-1-0-19:~$ dig @localhost vault.service.consul

; <<>> DiG 9.11.3-1ubuntu1.15-Ubuntu <<>> @localhost vault.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 45282
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vault.service.consul.          IN      A

;; AUTHORITY SECTION:
consul.                 0       IN      SOA     ns.consul. hostmaster.consul. 1621176586 3600 600 86400 0

;; Query time: 24 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun May 16 14:49:46 UTC 2021
;; MSG SIZE  rcvd: 99

ubuntu@ip-10-1-0-19:~$ vault status
Key                      Value
---                      -----
Recovery Seal Type       awskms
Initialized              false
Sealed                   true
Total Recovery Shares    0
Threshold                0
Unseal Progress          0/0
Unseal Nonce             n/a
Version                  1.6.1
Storage Type             s3
HA Enabled               true
ubuntu@ip-10-1-0-19:~$ 

I should note that updating did allow my infra to function normally with an existing Vault configuration (s3 backend). The problem only became evident when testing from scratch.
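
One further check that may help with a case like this, where `consul catalog services` lists vault but DNS returns NXDOMAIN: the catalog lists services regardless of health, while Consul's DNS interface leaves out instances whose health checks are failing. Whether that applies to an uninitialised, sealed Vault as shown above is an assumption and is not confirmed in this thread. The sketch below asks the Consul HTTP API for passing instances only. The function name `has_passing_instance` is hypothetical, and the agent address is assumed to be 127.0.0.1:8500 as in the config earlier in the thread.

```bash
# Hypothetical helper: return 0 if Consul reports at least one passing
# instance of the service, 1 if it reports none, 2 if the API call fails.
# Assumes curl is installed and the agent's HTTP API listens on
# 127.0.0.1:8500 unless a different base URL is given as the second argument.
has_passing_instance() {
  name="$1"
  addr="${2:-http://127.0.0.1:8500}"
  # ?passing restricts the result to instances whose checks are all passing.
  body=$(curl -fsS "$addr/v1/health/service/$name?passing") || return 2
  # The endpoint returns a JSON array; an empty array means no passing instances.
  [ "$(printf '%s' "$body" | tr -d '[:space:]')" != "[]" ]
}

# Example usage (commented out; requires a running Consul agent):
# if has_passing_instance vault; then
#   echo "at least one healthy vault instance"
# else
#   echo "no healthy vault instance (or the API call failed)"
# fi
```

If this reports no passing instances while the catalog still lists vault, the health checks registered for the service are the next thing to look at.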

@queglay
Contributor Author

queglay commented May 17, 2021

It was my mistake. It appears that a reboot was required after installing dnsmasq, which I didn't know about. I added a request here to improve the installer so that this can be avoided (hopefully only some services need to be restarted):

hashicorp/terraform-aws-consul#224
