Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Salt-minion pillars don't always refresh when they should #57180

Closed
edgan opened this issue May 9, 2020 · 3 comments · Fixed by #61189
Closed

[BUG] Salt-minion pillars don't always refresh when they should #57180

edgan opened this issue May 9, 2020 · 3 comments · Fixed by #61189
Labels
Bug broken, incorrect, or confusing behavior Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged
Milestone

Comments

@edgan
Copy link
Contributor

edgan commented May 9, 2020

Description
Salt-minion pillars don't always refresh when they should.

This is trickery to reproduce and seems to involve a race condition. The steps to reproduce have to be executed very fast.

I am pretty sure from testing that this is caused by changes in this PR, #54942 . I tried reverted the patches and the issue went away.

Steps to Reproduce the behavior
Locally:

cd salt-pillars
mkdir test
echo "base:
  '*':
    - test" > top.sls
echo 'test' > test/init.sls ; git add test/init.sls ; git commit -m 'Testing' ; git push

Master:

/usr/bin/salt-run git_pillar.update ; /usr/bin/salt-run fileserver.update ; salt `hostname -f` state.apply base
test [email protected]:org/salt-grains:
    None
test [email protected]:org/salt-pillars:
    True
True
fqdn:
    Data failed to compile:
----------
    Pillar failed to render with the following messages:
----------
    SLS 'test' does not render to a dictionary

Locally:

echo 'test: a' > test/init.sls ; git add test/init.sls ; git commit -m 'Testing' ; git push

Master:

/usr/bin/salt-run git_pillar.update ; /usr/bin/salt-run fileserver.update ; salt `hostname -f` state.apply base
test [email protected]:org/salt-grains:
    None
test [email protected]:org/salt-pillars:
    True
True
fqdn:
    Data failed to compile:
----------
    Pillar failed to render with the following messages:
----------
    SLS 'test' does not render to a dictionary
salt `hostname -f` saltutil.refresh_pillar
fqdn:
    True
salt]$ salt `hostname -f` state.apply base
fqdn:
----------
          ID: hosts_file
    Function: file.managed
        Name: /etc/hosts
      Result: True
     Comment: File /etc/hosts is in the correct state
     Started: 17:16:12.353157
    Duration: 52.312 ms
     Changes:   

Summary for fqdn
-------------
Succeeded: 1
Failed:     0
-------------
Total states run:     1
Total run time:    1.230 s

Expected behavior
The error on the first state.apply is expected. There is a pillar syntax error. The error on the second state.apply is not expected, because the syntax has been fixed. The git_pillar.update has pulled in new pillars as we can see by the True for salt-pillars. When we then refresh_pillar for that minion we get the correct result we should have gotten on the second state.apply.

Versions Report

Salt Version:
           Salt: 3000.2
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: unknown
       dateutil: 2.6.1
      docker-py: 2.5.1
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.10
        libgit2: 0.26.0
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: 1.4.4
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.4.7
         pygit2: 0.26.2
         Python: 3.6.9 (default, Apr 18 2020, 01:56:04)
   python-gnupg: 0.4.1
         PyYAML: 3.12
          PyZMQ: 16.0.2
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5
 
System Versions:
           dist: Ubuntu 18.04 bionic
         locale: UTF-8
        machine: x86_64
        release: 5.3.0-1017-aws
         system: Linux
        version: Ubuntu 18.04 bionic

@edgan edgan added the Bug broken, incorrect, or confusing behavior label May 9, 2020
@edgan
Copy link
Contributor Author

edgan commented May 9, 2020

I have seen this go both ways, false positives and false negatives.Which makes sense, because it is a caching problem. What makes it really dangerous is you could merge bad pillars, even with testing, because you got a false positive.

@garethgreenaway garethgreenaway added this to the Blocked milestone May 21, 2020
@garethgreenaway garethgreenaway added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label May 21, 2020
@garethgreenaway
Copy link
Contributor

@saltstack/team-core Any ideas on this one?

@colttt
Copy link

colttt commented Jun 22, 2021

I guess it is somehow related to #31907 .. I've the same issue with salt, edit a pillar, push to git, did a refresh (also restart the minion) but it sill doesn't show the (new) pillar value..

salt -V
Salt Version:
          Salt: 3003
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.7.3
     docker-py: Not Installed
         gitdb: 2.0.5
     gitpython: 2.1.11
        Jinja2: 2.10
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.5.6
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.6.1
        pygit2: Not Installed
        Python: 3.7.3 (default, Jan 22 2021, 20:04:44)
  python-gnupg: Not Installed
        PyYAML: 3.13
         PyZMQ: 17.1.2
         smmap: 2.0.5
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.1
 
System Versions:
          dist: debian 10 buster
        locale: UTF-8
       machine: x86_64
       release: 4.19.0-16-amd64
        system: Linux
       version: Debian GNU/Linux 10 buster

when I run the salt minion in debug mode I got this:
refresh pillar:

[INFO    ] User sudo_BOFH Executing command saltutil.refresh_pillar with jid 20210622082604472251
[DEBUG   ] Command details {'fun': 'saltutil.refresh_pillar', 'arg': [], 'tgt': 'salt-minion-1.bofh.home.de', 'jid': '20210622082604472251', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_BOFH'}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082604472251 added
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[INFO    ] Starting a new job 20210622082604472251 with PID 25374
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] LazyLoaded saltutil.refresh_pillar
[DEBUG   ] LazyLoaded event.fire
[DEBUG   ] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pub.ipc
[DEBUG   ] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pull.ipc
[DEBUG   ] Sending event: tag = pillar_refresh; data = {'_stamp': '2021-06-22T08:26:04.556245'}
[DEBUG   ] Minion of 'salt-test' is handling event tag 'pillar_refresh'
[DEBUG   ] Refreshing modules. Notify=False
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Closing IPCMessageClient instance
[DEBUG   ] Minion return retry timer set to 9 seconds (randomized)
[INFO    ] Returning information for job: 20210622082604472251
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] minion return: {'success': True, 'return': True, 'retcode': 0, 'jid': '20210622082604472251', 'fun': 'saltutil.refresh_pillar', 'fun_args': []}
[DEBUG   ] Override  __utils__: <module 'salt.loaded.int.grains.zfs' from '/usr/lib/python3/dist-packages/salt/grains/zfs.py'>
[DEBUG   ] /etc/resolv.conf: The domain and search keywords are mutually exclusive.
[DEBUG   ] Unable to resolve address fe80::250:56ff:fe8c:1936: [Errno 1] Unknown host
[DEBUG   ] Elapsed time getting FQDNs: 0.10307979583740234 seconds
[DEBUG   ] LazyLoaded zfs.is_supported
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Refreshing pillar.
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Re-using AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] LazyLoaded config.merge
[DEBUG   ] LazyLoaded mine.update
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082604472251 cleaned up
[DEBUG   ] salt.crypt.get_rsa_key: Loading private key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] Refreshing matchers.
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Refreshing beacons.
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pub.ipc
[DEBUG   ] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pull.ipc
[DEBUG   ] Sending event: tag = /salt/minion/minion_pillar_refresh_complete; data = {'complete': True, '_stamp': '2021-06-22T08:26:05.363962'}
[DEBUG   ] Closing IPCMessageClient instance
[DEBUG   ] Minion of 'salt-test' is handling event tag '/salt/minion/minion_pillar_refresh_complete'

ask for pillars: (pillar.items)

[DEBUG   ] LazyLoaded glob_match.match
[INFO    ] User sudo_BOFH Executing command pillar.item with jid 20210622082832819002
[DEBUG   ] Command details {'fun': 'pillar.item', 'arg': [], 'tgt': 'salt-minion-1*', 'jid': '20210622082832819002', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_BOFH'}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082832819002 added
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[INFO    ] Starting a new job 20210622082832819002 with PID 25467
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] LazyLoaded pillar.item
[DEBUG   ] Minion return retry timer set to 8 seconds (randomized)
[INFO    ] Returning information for job: 20210622082832819002
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] minion return: {'success': True, 'return': {}, 'retcode': 0, 'jid': '20210622082832819002', 'fun': 'pillar.item', 'fun_args': []}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082832819002 cleaned up

ask for specific pillar:

[INFO    ] User sudo_BOFH Executing command pillar.items with jid 20210622083204983233
[DEBUG   ] Command details {'fun': 'pillar.items', 'arg': ['roles'], 'tgt': 'salt-minion-1*', 'jid': '20210622083204983233', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_BOFH'}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622083204983233 added
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[INFO    ] Starting a new job 20210622083204983233 with PID 25579
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] LazyLoaded pillar.items
[DEBUG   ] Minion return retry timer set to 6 seconds (randomized)
[INFO    ] Returning information for job: 20210622083204983233
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] minion return: {'success': True, 'return': {'roles': ''}, 'retcode': 0, 'jid': '20210622083204983233', 'fun': 'pillar.items', 'fun_args': ['roles']}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622083204983233 cleaned up

actual behavior:

salt "salt-minion-1*" pillar.items roles
salt-minion-1.bofh.home.de:
    ----------
    roles:

expected behavior:

salt "salt-minion-1*" pillar.items roles
salt-minion-1.bofh.home.de:
    ----------
    roles: webserver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants