-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JsonRpcConnection#Send*(): discard messages ASAP once shutting down #9991
Conversation
Which - with the current state of the PR - means that the messages are discarded instead of enqueued.
However, |
3ac0ce0
to
c856ab8
Compare
c856ab8
to
351d4b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'master2'
Error: Broken pipe [system:32 at /opt/homebrew/include/boost/asio/detail/reactive_socket_send_op.hpp:137:37 in function 'do_complete']
Stacktrace:
0# __cxa_throw in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
1# void boost::throw_exception<boost::system::system_error>(boost::system::system_error const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
2# boost::asio::detail::do_throw_error(boost::system::error_code const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
---
[2024-02-09 15:26:01 +0100] information/ApiListener: Endpoint 'master2' disconnected while replaying log.
[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 15:26:19 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 15:26:19 +0100] information/ApiListener: Replayed 6694 messages.
[2024-02-09 17:55:57 +0100] information/ApiListener: Sending config updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'master2' signed by our CA.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: The certificates for CN 'master2' and its root CA are valid and uptodate. Skipping automated renewal.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending config file updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing runtime objects to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 17:55:57 +0100] information/ApiListener: Endpoint 'master2' disconnected while syncing runtime objects.
[2024-02-09 17:55:57 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'master2' in zone 'master'. |
30be38e
to
67f5d68
Compare
e0bada4
to
b6964c3
Compare
5ee4133
to
0191ec3
Compare
0191ec3
to
630f201
Compare
630f201
to
3bb5baa
Compare
Yes, depending on the number of disconnected endpoints and on how busy |
Actually, the endpoint is dropped off the list as soon as it is disconnected, so there will be no attempts via the |
The implementation should be fine. Main remaining question for me is the log level of the new messages, like should they be downgraded to notice (if that's something that can just legitimately happen during a disconnect) or maybe promoted to warning (if that's actually something problematic). @Al2Klimov: please also review the current state of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I opt for getting the anti-memory-leak ready w/o changing too much for now. It shouldn't depend on the exceptions part.
@@ -163,23 +163,39 @@ ConnectionRole JsonRpcConnection::GetRole() const | |||
|
|||
void JsonRpcConnection::SendMessage(const Dictionary::Ptr& message) | |||
{ | |||
if (m_ShuttingDown) { | |||
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected API client '" + GetIdentity() + "'!")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big friend of this tbh. All callers catch exceptions – ex. SyncSendMessage() which I didn't finish checking, yet. But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws. Not good. you patched to catch them in 0c7c7ba, to be fair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws.
This is never going happen! Nothing has changed in the previous synchronisation behaviour. The only thing that has changed is that we have added an error handling here.
icinga2/lib/remote/apilistener.cpp
Lines 1203 to 1208 in 3bb5baa
try { | |
client->SendMessage(message); | |
} catch (const std::runtime_error& ex) { | |
Log(LogInformation, "ApiListener") | |
<< "Error while sending message to endpoint '" << endpoint->GetName() << "': " << DiagnosticInformation(ex, false); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I opt for getting the anti-memory-leak ready w/o changing too much for now. It shouldn't depend on the exceptions part.
- Just like Disallow triggering a cancelled downtime #9935 has been splitted off Provide cancel_time in Icinga DB downtime history where has_been_cancelled may be 1 #9896 to keep it simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#9896 (comment) stated the reason for splitting that PR. Tracing the callers of these two functions now throwing exceptions until one reaches exception handling and verifying that it's in the right place is annoying, but feasible in a rather short amount of time. Currently, when a client disconnects, both CPU generating the messages and RAM storing them is wasted, and your suggestion would only fix one (that was actually one of the first things I noticed when looking at this PR: #9991 (comment)).
by checking the now atomic #m_ShuttingDown outside of it.
Especially ApiListener#ReplayLog() enqueued lots of messages into JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.
3bb5baa
to
b6dba27
Compare
Just rebased to fix the GH actions. |
After taking another look at this, I suggest changing the log level of these two messages to notice: icinga2/lib/remote/apilistener.cpp Lines 1030 to 1035 in b6dba27
icinga2/lib/remote/apilistener.cpp Lines 1203 to 1209 in b6dba27
Rationale: They catch the following newly introduced exception: icinga2/lib/remote/jsonrpcconnection.cpp Lines 166 to 168 in b6dba27
Throwing this exception can only be triggered after calling icinga2/lib/remote/jsonrpcconnection.cpp Lines 203 to 212 in b6dba27
The newly introduced messages provide little extra information (i.e. they basically only repeat that the client is disconnected now, they don't contain a dynamic error that could provide extra information). Additionally, as the code is written, these may be randomly be logged based on timing, like when the timer firing and the client disconnecting happen at the same time. That message then wouldn't indicate a problem but could be wrongly interpreted as such because it mostly doesn't happen when a client disconnects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment should probably have been a review.
b6dba27
to
9f84c15
Compare
Especially ApiListener#ReplayLog() enqueued lots of messages into
JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if
the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.
fixes #9985