Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix network stack stability issues #10080

Draft
wants to merge 8 commits into
base: support/2.14
Choose a base branch
from
Draft

Conversation

lippserd and others added 8 commits June 10, 2024 13:19
While analyzing a possible memory leak, we encountered several coroutine
exception messages, which unfortunately do not provide any information
about what exactly went wrong, as exception diagnostics were previously
only logged at the notice level.
Although there is locking involved here, it shoudln't take too long for
the thread to actually acquire it, since there aren't that many threads
dealing with endpoint clients concurrently. It's just wasting pointless
time trying to obtain a CPU slot.
Closing and re-opening that very same log file shouldn't reset the
counter, otherwise some log files may exceed the max limit per file as
their offset indicator is reset each time they are re-opened.
…ging

On incoming connection timeout we log the remote endpoint which isn't
available if it was already disconnected - an exception is thrown.  Get it
as long as we're still connected not to lose it, nor to get an exception.
When a HTTP connection dies prematurely while the response is sent,
`http::async_write()` sets the error code to something like broken pipe for
example. When calling `async_flush()` afterwards, it sometimes happens that
this never returns. This results in a resource leak as the coroutine isn't
cleaned up. This commit makes the individual functions throw exceptions instead
of silently ignoring the errors, resulting in the function terminating early
and also resulting in an error being logged as well.
@cla-bot cla-bot bot added the cla/signed label Jun 10, 2024
@icinga-probot icinga-probot bot added this to the 2.14.3 milestone Jun 10, 2024
@icinga-probot icinga-probot bot added area/api REST API area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working core/quality Improve code, libraries, algorithms, inline docs ref/IP labels Jun 10, 2024
@Al2Klimov Al2Klimov marked this pull request as draft June 10, 2024 12:38
@Al2Klimov Al2Klimov self-assigned this Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api REST API area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working cla/signed core/quality Improve code, libraries, algorithms, inline docs ref/IP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants