Handle OSError to properly recycle SSL connection, fix infinite loop #2100
Anything else needed here, @dpkp and @jeffwidman?
OSError is a very broad exception and has different coverage in different Python versions. Can we narrow this down at all?
@dpkp we can't. I have seen OSError itself (not a subclass) raised in my application, and the Python source code also suggests that it can be raised in certain situations. I'm afraid we have to apply this because of the way Python is developed.
It looks like this is a Python bug that has been resolved in recent versions of Python: https://bugs.python.org/issue31122 I'd prefer to limit this to cases where known buggy versions of Python are running, and also to limit it to errno=0 (note the stacktrace
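A minimal sketch of the narrowing suggested in this comment. It assumes that the affected interpreters are those older than 3.8 (a cutoff taken from a later comment in this thread, not checked against individual releases) and that the spurious error always carries errno 0. The helper name is hypothetical and is not part of kafka-python.

```python
import sys

# Assumption: the fix for bpo-31122 is present from Python 3.8 onward.
FIXED_IN = (3, 8)


def is_spurious_handshake_oserror(exc, version_info=sys.version_info):
    """Return True only for a bare OSError with errno 0 on an affected Python."""
    if type(exc) is not OSError:
        # Subclasses (ConnectionResetError, ssl.SSLError, ...) are handled elsewhere.
        return False
    if exc.errno != 0:
        return False
    return tuple(version_info[:2]) < FIXED_IN
```

A caller could use this predicate inside an `except OSError` clause and re-raise whenever it returns False, so that unrelated OS errors still reach the user.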
Could we just add a case to the exception handler in conn.py (_try_handshake.py:513)? I think the self.close here is the key to handling this case properly. Otherwise, it appears that the connection error gets recycled, not the connection itself, leaving us with an endless loop of logging the error.
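The idea in this comment can be sketched roughly as follows. This is a toy stand-in, not the actual conn.py code: the class, its attributes, and the fake socket are hypothetical, and only the exception types come from the standard library. It shows a handshake handler that closes the connection on an errno-0 OSError instead of letting the error propagate.

```python
import ssl


class HandshakeConn:
    """Toy stand-in for a broker connection (hypothetical, not kafka-python's class)."""

    def __init__(self, sock):
        self._sock = sock
        self.closed_with = None

    def close(self, error=None):
        # Dropping the socket is what lets the caller build a fresh connection.
        self._sock = None
        self.closed_with = error

    def try_handshake(self):
        try:
            self._sock.do_handshake()
            return True
        except (ssl.SSLWantReadError, ssl.SSLWantWriteError):
            pass  # handshake still in progress; retry on the next poll
        except (ssl.SSLZeroReturnError, ssl.SSLEOFError, ConnectionError) as exc:
            self.close(exc)
        except OSError as exc:
            if exc.errno != 0:
                raise  # unrelated errors are still raised to the user
            self.close(exc)
        return False


class FailingSock:
    """Fake socket that mimics the errno-0 OSError described in this thread."""

    def do_handshake(self):
        raise OSError(0, "Error")


conn = HandshakeConn(FailingSock())
handshake_ok = conn.try_handshake()
```

Because `close()` runs, the next attempt starts from a clean state rather than hitting the same broken socket again.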
I tried to fix the OSError issue by moving my project from Python 3.6.x to 3.9.6, and that seems to have fixed it. I now see this trio of log entries in my log where it would previously get into the endless OSError loop. I see many of these trios consecutively, typically separated by less than 100 ms. I take this to mean the network became unhealthy for some reason.
I think this means that I'm seeing an SSLEOFError (instead of OSError 0), which is good, but I wish we could make the logs a bit more descriptive. So maybe something like this (new error case and more descriptive logging)?
https://bugs.python.org/msg375481 notes that the aforementioned bug had a patch propagated into Python 3.8+. Assuming we resume supporting the library for Python 3.8+, we could revisit this if it's still encountered in newer Python versions. Otherwise, we can close this PR.
…terations for Kafka 0.8.2 and Python 3.12 (dpkp#159)
* skip failing tests for PyPy since they work locally
* Reconfigure tests for PyPy and 3.12
* Skip partitioner tests in test_partitioner.py if 3.12 and 0.8.2
* Update test_partitioner.py
* Update test_producer.py
* Timeout tests after ten minutes
* Set 0.8.2.2 to be experimental from hereon
* Formally support PyPy 3.9
* Test Kafka 0.8.2.2 using Python 3.11 in the meantime
* Override PYTHON_LATEST conditionally in python-package.yml
* Update python-package.yml
* add python annotation to kafka version test matrix
* Update python-package.yml
* try python 3.10
* Remove support for EOL'ed versions of Python
* Update setup.py
Too many MRs to review... so little time.
After the Kafka service is stopped and restarted, kafka-python may use 100% CPU because it busy-retries while the socket is closed. This fixes the issue by unregistering the socket if the fd is negative. Co-authored-by: Orange Kao <[email protected]>
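The commit message does not show the change itself. As a rough illustration of the idea, using only the standard library `selectors` and `socket` modules, a closed socket reports a file descriptor of -1 and can be dropped from the selector so the poll loop stops spinning on it. The helper name is hypothetical, and kafka-python's actual change may differ.

```python
import selectors
import socket


def unregister_if_closed(selector, sock):
    """Unregister sock if it has been closed; return True if it was removed."""
    if sock.fileno() >= 0:
        return False  # still open, leave it registered
    try:
        selector.unregister(sock)
    except (KeyError, ValueError):
        return False  # it was never registered
    return True


sel = selectors.DefaultSelector()
left, right = socket.socketpair()
sel.register(left, selectors.EVENT_READ)

left.close()  # after close(), left.fileno() returns -1
removed = unregister_if_closed(sel, left)
remaining = len(sel.get_map())

right.close()
sel.close()
```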
Co-authored-by: Ryar Nyah <[email protected]>
Co-authored-by: Denis Otkidach <[email protected]>
The former has been deprecated since setuptools 56 Co-authored-by: micwoj92 <[email protected]>
* docs: Update syntax in README.rst
* docs: Update code block syntax in docs/index.rst
---------
Co-authored-by: HalfSweet <[email protected]>
* Fix crc32c's __main__ for Python 3
* Remove TODO from _crc32c.py
---------
Co-authored-by: Yonatan Goldschmidt <[email protected]>
Co-authored-by: Dave Voutila <[email protected]>
Here's a stack trace we had our logs flooded with.
The problem is that Python 3.6 raises OSError, which is not expected. The exception propagates to the caller, and the code that recycles the connection is never executed. Therefore, the Producer is guaranteed to get the same exception on the next call to poll(). Raising OSError doesn't seem to be documented even in the latest Python docs (see the 3.8 docs), but there are signs of it in the 3.8 source code.