
2019.2.2 Hosts with IPv4 and IPv6 can't talk to IPv4 master #55214

Closed
darkpixel opened this issue Nov 5, 2019 · 25 comments
Labels: info-needed waiting for more info
@darkpixel (Contributor)

Description of Issue

The majority of my hosts, including my master, are IPv4-only.
I upgraded one host that has both IPv4 and IPv6 to 2019.2.2.
It can't communicate with the master, and it appears to be getting an IPv6 address out of thin air.

Steps to Reproduce Issue

salt-call --master salt.example.tld -l debug state.sls sshkeys

[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/beacons.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/beacons.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/engine.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/engine.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/f_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/f_defaults.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/reactor.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: ingest-02.mspids.net
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/beacons.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/beacons.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/engine.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/engine.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/f_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/f_defaults.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/reactor.conf
cat: /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq: No such file or directory
[DEBUG   ] Connecting to master. Attempt 1 of 1
[ERROR   ] DNS lookup or connection check of '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' failed.
[ERROR   ] Master hostname: '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' not found or not responsive. Retrying in 30 seconds
^C
Exiting gracefully on Ctrl-c

The thing I can't figure out is that the IPv6 address listed (7361:6c74:2e6d:7870:7269:6d65:2e6e:6574) is nowhere to be found.
It's not on the minion. There are no IPv6 DNS records for the master.

Versions Report

Salt Version:
           Salt: 2019.2.2
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.5.3
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.24.0
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.13 (default, Sep 26 2018, 18:42:22)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.2.1
 
System Versions:
           dist: debian 9.11 
         locale: UTF-8
        machine: x86_64
        release: 4.9.0-9-amd64
         system: Linux
        version: debian 9.11 

Maybe something to do with changes in #54762?

@jeffvoskamp

(7361:6c74:2e6d:7870:7269:6d65:2e6e:6574)
is an "ipv6 dump" of "salt.mxtrime.net"
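
This observation is easy to check: the eight hextets are just the hex-encoded ASCII bytes of the 16-character master hostname, and packing that hostname string as a raw 16-byte IPv6 address reproduces the logged address exactly. A minimal sketch (the arithmetic below is certain; where inside Salt the bytes get misinterpreted is not established in this thread):

```python
import socket

# The mysterious address from the minion log, and the master hostname.
addr = "7361:6c74:2e6d:7870:7269:6d65:2e6e:6574"
host = "salt.mxprime.net"

# Reading the hextets as hex-encoded ASCII recovers the hostname...
decoded = bytes.fromhex(addr.replace(":", "")).decode("ascii")
print(decoded)  # -> salt.mxprime.net

# ...and packing the 16-byte hostname string as a raw in6_addr yields
# exactly the "address" from the logs.
packed = socket.inet_ntop(socket.AF_INET6, host.encode("ascii"))
print(packed)   # -> 7361:6c74:2e6d:7870:7269:6d65:2e6e:6574
```

Note this only works because the hostname happens to be exactly 16 bytes, the size of an IPv6 address.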

@darkpixel (Contributor, Author)

Is that a typo in your response? That's not what was specified on the command line, nor does that string exist anywhere on the minion.

salt-call --master salt.mxprime.net -l debug state.sls sshkeys

The minion was communicating with the master right up until I ran apt-get update && apt-get upgrade and received 2019.2.2. Neither host has a firewall enabled.

I can telnet from the salt minion to the master on ports 4505 and 4506 and receive data.

# telnet salt.mxprime.net 4505
Trying 107.170.241.41...
Connected to salt.mxprime.net.
Escape character is '^]'.
�^]
telnet> quit
Connection closed.
# telnet salt.mxprime.net 4506
Trying 107.170.241.41...
Connected to salt.mxprime.net.
Escape character is '^]'.
�^]
telnet> quit
Connection closed.
# salt-call -l debug --master salt.mxprime.net state.sls sshkeys
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/beacons.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/beacons.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/engine.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/engine.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/f_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/f_defaults.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/reactor.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: i2
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/beacons.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/beacons.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/engine.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/engine.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/f_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/f_defaults.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/reactor.conf
cat: /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq: No such file or directory
[DEBUG   ] Connecting to master. Attempt 1 of 1
[ERROR   ] DNS lookup or connection check of '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' failed.
[ERROR   ] Master hostname: '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' not found or not responsive. Retrying in 30 seconds
^C
Exiting gracefully on Ctrl-c
root@ingest-02:~# 

@Ch3LL (Contributor) commented Nov 6, 2019

looks like salt is resolving the dns name to that ipv6 address: [ERROR ] Master hostname: '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' not found or not responsive. Retrying in 30 seconds

do you have ipv6 set in your config by chance?

@Ch3LL Ch3LL added the info-needed waiting for more info label Nov 6, 2019
@Ch3LL Ch3LL added this to the Blocked milestone Nov 6, 2019
@darkpixel (Contributor, Author)

Yes, that's what it looks like, but...

[ERROR   ] DNS lookup or connection check of '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' failed.
[ERROR   ] Master hostname: '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' not found or not responsive. Retrying in 30 seconds
[ERROR   ] DNS lookup or connection check of '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' failed.
[ERROR   ] Master hostname: '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' not found or not responsive. Retrying in 30 seconds
[ERROR   ] DNS lookup or connection check of '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' failed.
[ERROR   ] Master hostname: '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' not found or not responsive. Retrying in 30 seconds
[ERROR   ] DNS lookup or connection check of '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' failed.
[ERROR   ] Master hostname: '7361:6c74:2e6d:7870:7269:6d65:2e6e:6574' not found or not responsive. Retrying in 30 seconds
^C
Exiting gracefully on Ctrl-c
root@ingest-02:~# host salt.mxprime.net
salt.mxprime.net has address 107.170.241.41
root@ingest-02:~# dig -t AAAA salt.mxprime.net

; <<>> DiG 9.10.3-P4-Debian <<>> -t AAAA salt.mxprime.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41963
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;salt.mxprime.net.		IN	AAAA

;; AUTHORITY SECTION:
mxprime.net.		300	IN	SOA	ns.mxprime.net. aaron.heyaaron.com. 2013031461 86400 600 1209600 300

;; Query time: 87 msec
;; SERVER: 67.207.67.3#53(67.207.67.3)
;; WHEN: Wed Nov 06 21:34:32 UTC 2019
;; MSG SIZE  rcvd: 102

root@ingest-02:~#

No IPv6 address for the salt server in DNS. I can't figure out where it's coming from. And I can connect to both ports with telnet.

@Ch3LL (Contributor) commented Dec 11, 2019

can you share a sanitized version of your config?

@darkpixel (Contributor, Author)

The sshkeys state I was applying in that example is pretty simple:

include:
  - openssh
  - openssh.config
rootkeys_valid:
  ssh_auth.present:
    - user: root
    - enc: ssh-rsa
    - names:
      {% for key in pillar['rootkeys'] %}
      - {{ key }}
      {% endfor %}
{% if pillar.get('rootkeys_absent', []) %}
rootkeys_invalid:
  ssh_auth.absent:
    - user: root
    - enc: ssh-rsa
    - names:
      {% for key in pillar['rootkeys_absent'] %}
      - {{ key }}
      {% endfor %}
{% endif %}

The openssh and openssh.config includes are the saltstack-formulas/openssh-formula repo.

I ended up wiping the box, disabling IPv6 and re-provisioning it since it was under salt anyways... ;)

So I no longer have an environment to test with or confirm that the issue was resolved. I'd say it can be closed unless someone else can reproduce it.

@darkpixel (Contributor, Author)

I was able to duplicate it on a different machine talking to the same master and gather additional information.

My salt master does have an IPv6 address. It is NOT published in DNS.

salt-call -l debug --master salt.mxprime.net test.ping
[DEBUG   ] Missing configuration file: /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/beacons.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/beacons.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/engine.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/engine.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/f_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/f_defaults.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/reactor.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: --redacted--
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Missing configuration file: /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/beacons.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/beacons.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/engine.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/engine.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/f_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/f_defaults.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/reactor.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/reactor.conf
[INFO    ] Executing command '/sbin/zpool list -H -o name,size' in directory '/root'
[DEBUG   ] stdout: rpool	7.25T
[DEBUG   ] output: rpool	7.25T
[DEBUG   ] Connecting to master. Attempt 1 of 1
[DEBUG   ] Master URI: tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506
[DEBUG   ] Initializing new AsyncAuth for (u'/etc/salt/pki/minion', u'--redacted--', u'tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506')
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (2665)
[DEBUG   ] Setting zmq_reconnect_ivl to '2665ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/minion', u'--redacted--', u'tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506', 'clear')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506
[DEBUG   ] Trying to connect to: tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506
[DEBUG   ] salt.crypt.get_rsa_pub_key: Loading public key
^C
Exiting gracefully on Ctrl-c

This IPv6 address:

[DEBUG   ] Trying to connect to: tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506

doesn't appear to exist on either the minion or the master.
I'm not sure where Salt is getting it.

Minion:

root@uslogdcnas03:~# salt --versions-report
Salt Version:
           Salt: 2019.2.1
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.5.3
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.24.0
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: 1.3.7
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.13 (default, Sep 26 2018, 18:42:22)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.2.1
 
System Versions:
           dist: debian 9.9 
         locale: UTF-8
        machine: x86_64
        release: 4.15.18-5-pve
         system: Linux
        version: debian 9.9 

Master:

Salt Version:
           Salt: 2019.2.0
 
Dependency Versions:
           cffi: 1.12.3
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.6.9 (default, Oct 24 2019, 01:18:01)
   python-gnupg: Not Installed
         PyYAML: 5.1
          PyZMQ: 18.1.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.1
 
System Versions:
           dist:   
         locale: UTF-8
        machine: amd64
        release: 12.0-RELEASE-p10
         system: FreeBSD
        version: Not Installed

I have 70 hosts connected to this master without any problems.
There is a mix of Debian 9, Debian 10, Ubuntu 18.04, and a bunch of FreeBSD boxes.

I can't find anything unusual about this particular minion that separates it from the rest. It's one of 16 identical boxes (hardware and software) that run a Debian 9 install and are entirely configured by Salt, with the only differences being their IPv4 addresses. They are all behind firewalls that are also configured by Salt, and the only differences between the firewalls are their private and public IPv4 addresses. They are all connected via Comcast.

Anything else I can do to debug?

@Ch3LL (Contributor) commented Dec 19, 2019

Apologies, I was not clear. Can I see your sanitized master and minion configs?

@darkpixel (Contributor, Author)

Salt master config:

[root@salt /usr/local/etc/salt]# grep -v '^#' /usr/local/etc/salt/master | grep -v '^$'
interface: 0.0.0.0
ipv6: False
publish_port: 4505
verify_env: True
timeout: 30
show_timeout: True
color: True
strip_colors: False
cli_summary: True
enable_gpu_grains: True
job_cache: True
minion_data_cache: True
worker_threads: 3
state_top: top.sls
state_output: changes
file_roots:
  base:
    - /srv/salt/base
    - /srv/salt/files
    - /srv/salt/formulas/*
top_file_merging_strategy: same
default_top: base
hash_type: sha256
fileserver_backend:
  - roots
  - git
gitfs_ssl_verify: True
gitfs_root: /srv/salt/formulas
pillar_roots:
  base:
    - /srv/salt/pillar
pillar_opts: False
pillar_safe_render_error: True
log_file: /var/log/salt/master
log_level: info
log_level_logfile: info
[root@salt /usr/local/etc/salt]# 

Salt minion config:

root@uslogdcnas03:~# grep -rhv '^#' /etc/salt/minion.d/ | grep -v '^$'
master: "salt.mxprime.net"
root@uslogdcnas03:~# 

@darkpixel (Contributor, Author)

I just double-checked both the master and minion and made sure they had ipv6: false. I restarted the master and minion and then re-ran salt-call and got the same results. It's sorta like it's ignoring the configuration option.

@darkpixel (Contributor, Author)

I've tried deleting everything under /etc/salt/*, /var/cache/salt/*, and /run/salt/* and it still returns this mysterious IPv6 address. ping6 returns connect: Network is unreachable because the minion box doesn't use IPv6 and has no IPv6 routes to the internet.

I've also tried clearing the DNS cache and double-checked from several boxes around the internet that there is indeed no IPv6 record for my salt master. I even went as far as grepping for the IPv6 address under /etc, /var, and /usr. Nothing found.

I've looked on the upstream DNS server and no DNS query ever comes in.
Running wireshark on the salt minion shows the same query going out every time I run salt-call: it looks up AAAA records for its own name and gets a response that there are no AAAA records.

In this screenshot .240 is the minion, and .254 is the DNS server:

Screenshot from 2020-01-07 11-24-30

Showing there are no AAAA records:
Screenshot from 2020-01-07 11-28-38

running strace salt-call -l debug --master salt.mxprime.net test.ping returns nothing interesting.

I'm completely stumped as to where that address is coming from.

@darkpixel (Contributor, Author)

I had a bunch of changes I needed to get to this box since it's in production. The OS is entirely configured by salt and the data resides on a different drive, so I removed salt, purged the packages, nuked /etc/salt, did a bunch of funky find /var -type f -iname '*salt*' -exec rm {} \; type stuff, and reinstalled the package. Still no luck. I wiped the box and reinstalled and can't repro the issue.

@darkpixel (Contributor, Author)

Found a third box with this issue running salt 2019.2.1.
I completely purged salt from the system and reinstalled. Same issue.
I completely purged salt and ran find / -iname '*salt*' and noticed /usr/lib/python2.7/dist-packages/salt/ was still hanging around with a bunch of files in it. I rm -rf'd that directory and reinstalled salt. Problem solved.
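
Stale files under dist-packages surviving a package purge would explain old code still running after a "reinstall". A quick way to confirm which on-disk copy of a package the interpreter would actually import (shown with a stdlib module so it runs anywhere; on an affected minion you would pass "salt" instead):

```python
import importlib.util

def module_origin(name):
    """Return the filesystem path a module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# On an affected minion: print(module_origin("salt"))
# If that points at a directory the package manager no longer owns
# (e.g. a leftover /usr/lib/python2.7/dist-packages/salt/), it's stale.
print(module_origin("json"))  # stdlib example, runnable anywhere
```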

@darkpixel (Contributor, Author)

Well...ran into another box with the issue.

root@USCMSPRITW01:~# salt-call --versions-report
Salt Version:
           Salt: 2019.2.5
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.27.0
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.17 (default, Apr 15 2020, 17:20:14)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5
 
System Versions:
           dist: LinuxMint 19.1 tessa
         locale: UTF-8
        machine: x86_64
        release: 4.15.0-50-generic
         system: Linux
        version: LinuxMint 19.1 tessa
 
root@USCMSPRITW01:~# salt-call -l debug --master salt.mxprime.net test.ping
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: USCMSPRITW01
[DEBUG   ] Configuration file path: /etc/salt/minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Connecting to master. Attempt 1 of 1
[DEBUG   ] Master URI: tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506
[DEBUG   ] Initializing new AsyncAuth for (u'/etc/salt/pki/minion', u'USCMSPRITW01', u'tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506')
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (9596)
[DEBUG   ] Setting zmq_reconnect_ivl to '9596ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for (u'/etc/salt/pki/minion', u'USCMSPRITW01', u'tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506', 'clear')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506
[DEBUG   ] Trying to connect to: tcp://[7361:6c74:2e6d:7870:7269:6d65:2e6e:6574]:4506
[DEBUG   ] salt.crypt.get_rsa_pub_key: Loading public key
[DEBUG   ] SaltReqTimeoutError, retrying. (1/7)
^C
Exiting gracefully on Ctrl-c
root@USCMSPRITW01:~# host salt.mxprime.net
salt.mxprime.net has address 107.170.241.41
root@USCMSPRITW01:~# 

There is no ipv6 connectivity at this location.
The salt minion has ipv6: false. Same with the master.
/etc/gai.conf has IPv6 set as a lower precedence, and there isn't an IPv6 entry in DNS or in /etc/hosts.

@darkpixel darkpixel reopened this May 18, 2020
@darkpixel (Contributor, Author)

I've got another box with this issue. It was working fine for years. It doesn't appear to be dependent on location (multiple boxes across different states), client (it's occurring on 3 different client networks that are all wildly different), or internet connection (one site is Comcast, another is Charter, a third is Wave Broadband).

DNS has no IPv6 record for salt.mxprime.net.
IPv6 is disabled in the minion config.
The hosts file is clear of any shenanigans.
Changing DNS resolvers doesn't fix it.
Pinging by name works.
Dig and nslookup work properly.
The host command works properly.
The nsswitch config is normal (only files and dns are listed for hosts).
Strangely, removing all entries from /etc/resolv.conf still lets it find this bizarre IPv6 address, even though every other service on the box falls over and dies.
Completely purging salt, nuking /etc/salt and /var/cache/salt then reinstalling doesn't fix it.
Purging salt again, wildly trying find / -type f -iname '*salt*' -exec rm {} \;, then reinstalling doesn't fix it. I've also made sure to remove all the salt files from under /usr/lib/python before reinstalling.
I've tried "preferring" IPv4 in /etc/gai.conf: precedence ::ffff:0:0/96 100
I tried installing from the saltstack repos instead of the OS packages. I tried salt-bootstrap.

The box receives most of its traffic from a port forward on the firewall, so I disabled it and ran wireshark so I could more easily filter through the traffic. Similar to the screenshot I posted above, it does an AAAA query for salt.mxprime.net, gets NO answers back, doesn't even bother doing an A query, and then magically generates an IPv6 address out of thin air.

The strange thing about the IPv6 address is that it doesn't seem to matter which box, what OS version, what random DNS server it's pointing to or any of the other info...it always returns that same IPv6 address.

If I run salt-call -l info --master 107.170.241.41 test.ping it works fine. I just can't use the domain name.
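
The workaround above works because a literal IP skips name resolution entirely. To see what the system resolver hands back to a Python process (which is what Salt ultimately calls), independent of Salt itself, something like this can be run on the minion; a hostname lookup needs working DNS, while a literal passes straight through:

```python
import socket

def resolve(host, port=4506):
    """Return the addresses the system resolver yields for host, in order."""
    return [info[4][0] for info in
            socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)]

# A literal address bypasses DNS, mirroring the --master 107.170.241.41 workaround:
print(resolve("107.170.241.41"))  # -> ['107.170.241.41']

# On the affected minion, compare against the hostname (requires network):
# print(resolve("salt.mxprime.net"))
```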

I'm debating what I should try next because I'm pretty much stumped at the moment.

I haven't tried switching versions. The box was running 2019.2.3 and I upgraded it to 2019.2.5 and it didn't fix the problem. I'm debating bumping up to 2020.x.

I haven't tried spinning up a completely unrelated DNS zone hosted on a separate provider with different records pointing at 107.170.241.41 to see if it's possibly some bizarre DNS issue....although the packet capture showed nothing in the response.

Every other program can communicate using salt.mxprime.net with no issue. I spun up a temp webserver and curled down a file. I ran netcat and connected to it with no problems.

I'm completely freaking stumped as to where this IPv6 address is coming from...because only salt seems to be affected on this box...

...but here's the real kicker. For fun, I ran salt-call -l info --master dilbert.com test.ping.
It looked up the correct IPv4 address for dilbert.com.

The current box that is having issues is scheduled to be decommissioned within a month, so I'm open to aggressive troubleshooting. I mean I already nuked every file on the box with 'salt' anywhere in the name, so... ;)

@krionbsd krionbsd self-assigned this May 27, 2020
@krionbsd (Contributor)

Does the following work for you?

salt-call --master IPv4 -l debug state.sls sshkeys

I have a strong feeling the stub resolver on the IPv6-enabled minion tries to resolve the master hostname and connect over IPv6 first; since IPv6 is disabled on the master, it fails. To check, you might run ssh master -vvv from the minion and look at what type of IP gets connected.

btw, did you check

dig salt.example.tld @8.8.8.8 any

for AAAA records?

Another thing I'd try is to fire up tcpdump to check exactly what's going on with minion

@darkpixel (Contributor, Author)

Does the following work for you?
salt-call --master IPv4 -l debug state.sls sshkeys

Yeah--if I use the IPv4 address directly, it has no problems communicating with the master.

btw, did you check
dig salt.example.tld @8.8.8.8 any

# dig salt.mxprime.net @8.8.8.8 any +short
107.170.241.41
# 

I have a strong feeling your stub resolver on minion with IPv6 tries to resolve master hostname, gets the FQDN from it and tries to connect to IPv6 first, as IPv6 is disabled on master it fails.

The minion has no IPv6 connectivity.

# ip -6 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 fe80::84bb:d5ff:fef7:a11a/64 scope link 
       valid_lft forever preferred_lft forever
# ip -6 route
fe80::/64 dev eth0  proto kernel  metric 256 
# ssh [email protected] -vvvv
OpenSSH_6.7p1 Debian-5+deb8u8, OpenSSL 1.0.1t  3 May 2016
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 4: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to salt.mxprime.net [107.170.241.41] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
<snip>
The authenticity of host 'salt.mxprime.net (107.170.241.41)' can't be established.
ECDSA key fingerprint is 15:38:ec:fc:96:54:7e:12:43:6c:ed:00:41:0f:30:f2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'salt.mxprime.net,107.170.241.41' (ECDSA) to the list of known hosts.
<snip>
Last login: Wed May 27 04:29:43 2020 from 208.70.52.16
[root@salt ~]# debug2: channel 0: rcvd eof
<snip>
Connection to salt.mxprime.net closed.
Transferred: sent 3696, received 2968 bytes, in 1.1 seconds
Bytes per second: sent 3391.9, received 2723.8
debug1: Exit status 0
# 

Every other tool on the box definitely connects via IPv4.

Another thing I'd try is to fire up tcpdump to check exactly what's going on with minion

I already did. If you look at my previous comments, I posted a screenshot of the DNS traffic from the minion. It definitely requests an AAAA record, but it doesn't get an answer back. It never queries for an A record that I can see.

@krionbsd (Contributor)

ok, what if you use interface: $IPv4 in your master.conf?

@darkpixel (Contributor, Author)

I tried setting that on the master as well as the source_interface: ip.add.re.ss on the minion. Same result.

@krionbsd (Contributor)

That's absolutely weird, as is this custom IPv6 address the minion tries to connect to. Do you use Active Directory or multiple IPs for the same hostname? Are you able to check the DNS zone?

@darkpixel (Contributor, Author)

I manage the zone mxprime.net. I have a DNS server running BIND. It's an extremely simple zonefile...

$ORIGIN mxprime.net.
$TTL 15m
@       IN      SOA     ns.mxprime.net. aaron.heyaaron.com. (2013031462 1d 10m 2w 5m)

@                       IN      NS      ns.mxprime.net.
@			IN	NS	puck.nether.net.
ns		IN	A	107.170.241.41
salt		IN	A	107.170.241.41
@ 			IN	A	80.77.87.242
www			IN	CNAME	mxprime.net.

No IPv6.

The minion is on a private network that has AD, but it's not pointed at an AD DNS server.
The minion looks up DNS from the router (which is running dnsmasq) which in turn looks up from one of the big resolvers out there (1.1.1.1, 1.0.0.1, 8.8.4.4, and 8.8.8.8).

I've even tried setting resolv.conf to look directly at my BIND server...it still gets that IPv6 address.
The strange part is that I NEVER see a DNS packet go out requesting an 'A' record. I do see a quad-A request though...and it gets a response from DNS that there are no 'AAAA' records for salt.mxprime.net.

You're right...this is baffling.

@sagetherage sagetherage removed this from the Blocked milestone Aug 12, 2020
@sagetherage sagetherage added needs-triage and removed info-needed waiting for more info labels Aug 12, 2020
@AxisNL commented Sep 17, 2020

I have the exact same problem after I removed IPv6 from a working setup (I moved the VMs to another provider that does not provide me with an IPv6 range):

2020-09-17 17:16:48,021 [salt.utils.network:1878][ERROR ][26450] DNS lookup or connection check of 'salt' failed.
2020-09-17 17:16:48,021 [salt.minion :161 ][ERROR ][26450] Master hostname: 'salt' not found or not responsive. Retrying in 30 seconds
2020-09-17 17:16:48,475 [salt.utils.network:1878][ERROR ][25516] DNS lookup or connection check of 'salt' failed.
2020-09-17 17:16:48,475 [salt.minion :161 ][ERROR ][25516] Master hostname: 'salt' not found or not responsive. Retrying in 30 seconds
2020-09-17 17:16:48,479 [salt.utils.network:1878][ERROR ][25075] DNS lookup or connection check of 'salt' failed.
2020-09-17 17:16:48,480 [salt.minion :161 ][ERROR ][25075] Master hostname: 'salt' not found or not responsive. Retrying in 30 seconds

17:17:02.192547 IP 192.168.128.9.47652 > 192.168.128.2.domain: 3222+ AAAA? salt.hongens.local. (36)
17:17:02.192902 IP 192.168.128.2.domain > 192.168.128.9.47652: 3222* 0/1/0 (93)
17:17:02.192976 IP 192.168.128.9.34751 > 192.168.128.2.domain: 7883+ AAAA? salt. (22)
17:17:02.193145 IP 192.168.128.2.domain > 192.168.128.9.34751: 7883 NXDomain 0/1/0 (97)
17:17:03.366325 IP 192.168.128.9.42575 > 192.168.128.2.domain: 40968+ AAAA? salt.hongens.local. (36)
17:17:03.366645 IP 192.168.128.2.domain > 192.168.128.9.42575: 40968* 0/1/0 (93)
17:17:03.366689 IP 192.168.128.9.51429 > 192.168.128.2.domain: 51213+ AAAA? salt. (22)
17:17:03.366837 IP 192.168.128.2.domain > 192.168.128.9.51429: 51213 NXDomain 0/1/0 (97)
17:17:11.602516 IP 192.168.128.9.37433 > 192.168.128.2.domain: 50817+ AAAA? salt.hongens.local. (36)
17:17:11.602874 IP 192.168.128.2.domain > 192.168.128.9.37433: 50817* 0/1/0 (93)
17:17:11.602943 IP 192.168.128.9.56586 > 192.168.128.2.domain: 31882+ AAAA? salt. (22)
17:17:11.603149 IP 192.168.128.2.domain > 192.168.128.9.56586: 31882 NXDomain 0/1/0 (97)

@sagetherage sagetherage assigned Ch3LL and unassigned krionbsd Sep 22, 2020
@Ch3LL (Contributor) commented Sep 23, 2020

Can the issue be reproduced on the latest version of salt?

@Ch3LL Ch3LL added info-needed waiting for more info and removed needs-triage labels Sep 23, 2020
@Ch3LL Ch3LL added this to the Blocked milestone Sep 23, 2020
@AxisNL commented Oct 3, 2020

I'm running salt-3000.3-1.el7, which is the latest version in my CentOS 7 repos.

@darkpixel (Contributor, Author)

I'm running salt 3000.x across most of my infrastructure now and haven't run into this in a while. I never found the root cause, but it hasn't happened to me in over a year. Unless someone else is having this problem, I'm going to close this ticket as I can no longer reproduce it to help troubleshoot. Let me know if I need to re-open it.
