Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ZeroMQ Transport Error on Timeout/Exception #66006

Closed
cianyleow opened this issue Feb 6, 2024 · 2 comments · Fixed by #66011
Closed

[BUG] ZeroMQ Transport Error on Timeout/Exception #66006

cianyleow opened this issue Feb 6, 2024 · 2 comments · Fixed by #66011
Labels
Bug broken, incorrect, or confusing behavior

Comments

@cianyleow
Copy link
Contributor

cianyleow commented Feb 6, 2024

Description
When sending a message via the ZeroMQ transport in 3006.5, a flurry of ERROR logs are seen if the request times out or an exception occurs during execution.

user@salt-minion:/# salt-call test.ping -lerror
11:54:03.919 [salt.minion      ]:  284 [ 2127] [ERROR   ] No master could be reached or all masters denied the minion's connection attempt.
11:54:03.919 [tornado.application]:  284 [ 2127] [ERROR   ] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0xffff921de480>, <salt.ext.tornado.concurrent.Future object at 0xffff95340e10>)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 603, in _send_recv
    recv = yield self.socket.recv()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1056, in run
    value = future.result()
            ^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
zmq.eventloop.future.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 605, in _send_recv
    future.set_exception(exc)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
    self.set_exc_info(
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
    self._set_done()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
    for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
    ret = callback()
          ^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
    future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 612, in _send_recv
    future.set_exception(exc)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
    self.set_exc_info(
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
    self._set_done()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
    for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
11:54:03.923 [tornado.application]:  284 [ 2127] [ERROR   ] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0xffff921dede0>, <salt.ext.tornado.concurrent.Future object at 0xffff922119d0>)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 601, in _send_recv
    yield self.socket.send(message)
          ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
    ret = callback()
          ^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
    future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1070, in run
    yielded = self.gen.send(value)
              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 612, in _send_recv
    future.set_exception(exc)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
    self.set_exc_info(
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
    self._set_done()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
    for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
11:54:03.925 [tornado.application]:  284 [ 2127] [ERROR   ] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0xffff921df6a0>, <salt.ext.tornado.concurrent.Future object at 0xffff92212650>)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 601, in _send_recv
    yield self.socket.send(message)
          ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'send'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
    ret = callback()
          ^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
    future.result()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/gen.py", line 1070, in run
    yielded = self.gen.send(value)
              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/salt/transport/zeromq.py", line 612, in _send_recv
    future.set_exception(exc)
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
    self.set_exc_info(
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
    self._set_done()
  File "/usr/lib/python3/dist-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
    for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
Unable to sign_in to master: Attempt to authenticate with the salt master failed with timeout error

These messages are not indicative of a functional issue - the response is received by the Salt Minion and it returns as expected (note the last line displaying the clear "timeout error") - but are part of the completion of the coroutine that the message sending/receiving now runs in.

The root cause of these messages is the fact that the error handling code does not check if the future is done (meaning it will have already yielded to the calling code, hence the lack of impact, apart from logging) before attempting to set the exception.

Setup
This was produced in a local environment with a code change to reliably replicate the situation:

diff --git a/salt/channel/server.py b/salt/channel/server.py
index b6d51fef08..17527fde83 100644
--- a/salt/channel/server.py
+++ b/salt/channel/server.py
@@ -9,6 +9,7 @@ import hashlib
 import logging
 import os
 import shutil
+import time

 import salt.crypt
 import salt.ext.tornado.gen
@@ -149,6 +150,7 @@ class ReqServerChannel:
         # intercept the "_auth" commands, since the main daemon shouldn't know
         # anything about our key auth
         if payload["enc"] == "clear" and payload.get("load", {}).get("cmd") == "_auth":
+            time.sleep(30)
             raise salt.ext.tornado.gen.Return(
                 self._auth(payload["load"], sign_messages)
             )

I introduced a 30 second sleep in the channel/server implementation to artificially create a timeout when signing into the Salt Master.

Steps to Reproduce the behavior

  • Salt Master with the above code change
  • Salt Minion configured with auth_timeout: 1 (this is not necessary, just speeds up demonstration of the issue)

Expected behavior
When an exception occurs but the future is already done (meaning the calling process received a response - whatever it may be), the coroutine does not attempt to complete that future again and end up dumping a bunch of spurious ERROR logs.

Versions Report
Note This report was taken with code at the v3006.5 tag (0472fd3) but shows up as 3007.0 below for some

salt --versions-report reason.
Salt Version:
          Salt: 3007.0

Python Version:
        Python: 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0]

Dependency Versions:
          cffi: 1.15.1
      cherrypy: Not Installed
      dateutil: 2.8.2
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
  looseversion: 1.0.3
      M2Crypto: 0.38.0
          Mako: Not Installed
       msgpack: 1.0.3
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 23.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.11.0
        pygit2: Not Installed
  python-gnupg: 0.4.9
        PyYAML: 6.0
         PyZMQ: 24.0.1
        relenv: Not Installed
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: debian 12.2 bookworm
        locale: utf-8
       machine: aarch64
       release: 6.4.16-linuxkit
        system: Linux
       version: Debian GNU/Linux 12.2 bookworm
@cianyleow cianyleow added Bug broken, incorrect, or confusing behavior needs-triage labels Feb 6, 2024
@perfecto25
Copy link
Contributor

noticing similar error trying to start minion on rocky9 vagrant instance

2024-02-06 14:54:34,285 [tornado.application:640 ][ERROR   ][1134] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f88ee3d5820>, <salt.ext.tornado.concurrent.Future object at 0x7f88ee7f5ca0>)
Traceback (most recent call last):
  File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 601, in _send_recv
    yield self.socket.send(message)
AttributeError: 'NoneType' object has no attribute 'send'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
    ret = callback()
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
    future.result()
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/gen.py", line 1070, in run
    yielded = self.gen.send(value)
  File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 612, in _send_recv
    future.set_exception(exc)
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
    self.set_exc_info(
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
    self._set_done()
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
    for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
2024-02-06 14:54:34,288 [tornado.application:640 ][ERROR   ][1134] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f88ee3dd040>, <salt.ext.tornado.concurrent.Future object at 0x7f88ee3d4d90>)
Traceback (most recent call last):
  File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 601, in _send_recv
    yield self.socket.send(message)
AttributeError: 'NoneType' object has no attribute 'send'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 606, in _run_callback
    ret = callback()
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/stack_context.py", line 278, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/ioloop.py", line 628, in _discard_future_result
    future.result()
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/gen.py", line 1070, in run
    yielded = self.gen.send(value)
  File "/opt/salt/lib64/python3.9/site-packages/salt/transport/zeromq.py", line 612, in _send_recv
    future.set_exception(exc)
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 294, in set_exception
    self.set_exc_info(
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 319, in set_exc_info
    self._set_done()
  File "/opt/salt/lib64/python3.9/site-packages/salt/ext/tornado/concurrent.py", line 333, in _set_done
    for cb in self._callbacks:
TypeError: 'NoneType' object is not iterable
root@min2 ~]# salt-minion --versions-report
Salt Version:
          Salt: 3006.6
 
Python Version:
        Python: 3.9.18 (main, Jan  4 2024, 00:00:00)
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.3
       libgit2: Not Installed
  looseversion: 1.3.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.7
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 23.2
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.20.0
        pygit2: Not Installed
  python-gnupg: Not Installed
        PyYAML: 6.0.1
         PyZMQ: 25.1.2
        relenv: Not Installed
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: rocky 9.3 Blue Onyx
        locale: utf-8
       machine: x86_64
       release: 5.14.0-362.18.1.el9_3.x86_64
        system: Linux
       version: Rocky Linux 9.3 Blue Onyx
 

@dwoz
Copy link
Contributor

dwoz commented Apr 30, 2024

This was released in 3006.7

@dwoz dwoz closed this as completed Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants