Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JsonRpcConnection#Send*(): discard messages ASAP once shutting down #9991

Merged
merged 9 commits into from
Sep 3, 2024

Conversation

Al2Klimov
Copy link
Member

Especially ApiListener#ReplayLog() enqueued lots of messages into
JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if
the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.

fixes #9985

@cla-bot cla-bot bot added the cla/signed label Feb 7, 2024
@icinga-probot icinga-probot bot added area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working ref/IP labels Feb 7, 2024
@julianbrost
Copy link
Contributor

Now #Disconnect() takes effect ASAP.

Which - with the current state of the PR - means that the messages are discarded instead of enqueued.

Especially ApiListener#ReplayLog() enqueued lots of messages into JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if the connection was shut(ting) down.

However, ApiListener#ReplayLog() will still generate messages for a dead connection that will be discarded anyways as it has no way to tell that the connection is dead. How much effort would it take to improve this?

lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
yhabteab
yhabteab previously approved these changes Feb 9, 2024
Copy link
Member

@yhabteab yhabteab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'master2'
Error: Broken pipe [system:32 at /opt/homebrew/include/boost/asio/detail/reactive_socket_send_op.hpp:137:37 in function 'do_complete']

Stacktrace:
 0# __cxa_throw in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 1# void boost::throw_exception<boost::system::system_error>(boost::system::system_error const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 2# boost::asio::detail::do_throw_error(boost::system::error_code const&, boost::source_location const&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
---

[2024-02-09 15:26:01 +0100] information/ApiListener: Endpoint 'master2' disconnected while replaying log.
[2024-02-09 15:26:01 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 15:26:19 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 15:26:19 +0100] information/ApiListener: Replayed 6694 messages.

@yhabteab
Copy link
Member

yhabteab commented Feb 9, 2024

[2024-02-09 17:55:57 +0100] information/ApiListener: Sending config updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'master2' signed by our CA.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: The certificates for CN 'master2' and its root CA are valid and uptodate. Skipping automated renewal.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending config file updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing runtime objects to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 17:55:57 +0100] information/ApiListener: Endpoint 'master2' disconnected while syncing runtime objects.
[2024-02-09 17:55:57 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'master2' in zone 'master'.

lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
@yhabteab yhabteab force-pushed the JsonRpcConnection-9985 branch 2 times, most recently from e0bada4 to b6964c3 Compare February 13, 2024 07:59
@yhabteab yhabteab self-assigned this Feb 21, 2024
@yhabteab yhabteab requested review from julianbrost and removed request for julianbrost February 21, 2024 11:09
lib/remote/apilistener-configsync.cpp Outdated Show resolved Hide resolved
lib/remote/apilistener.cpp Outdated Show resolved Hide resolved
lib/remote/jsonrpcconnection.cpp Outdated Show resolved Hide resolved
@yhabteab
Copy link
Member

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

Yes, depending on the number of disconnected endpoints and on how busy m_RelayQueue was, it could result into a bunch of warnings.

@yhabteab
Copy link
Member

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

Yes, depending on the number of disconnected endpoints and on how busy m_RelayQueue was, it could result into a bunch of warnings.

Actually, the endpoint is dropped off the list as soon as it is disconnected, so there will be no attempts via the ApiListener::Relay*() methods after that.

@julianbrost
Copy link
Contributor

The implementation should be fine. Main remaining question for me is the log level of the new messages, like should they be downgraded to notice (if that's something that can just legitimately happen during a disconnect) or maybe promoted to warning (if that's actually something problematic).

@Al2Klimov: please also review the current state of this PR.

Copy link
Member Author

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I opt for getting the anti-memory-leak ready w/o changing too much for now. It shouldn't depend on the exceptions part.

@@ -163,23 +163,39 @@ ConnectionRole JsonRpcConnection::GetRole() const

void JsonRpcConnection::SendMessage(const Dictionary::Ptr& message)
{
if (m_ShuttingDown) {
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected API client '" + GetIdentity() + "'!"));
Copy link
Member Author

@Al2Klimov Al2Klimov Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big friend of this tbh. All callers catch exceptions – ex. SyncSendMessage() which I didn't finish checking, yet. But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws. Not good. you patched to catch them in 0c7c7ba, to be fair.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But so far I see that SyncSendMessage() is called by RelayMessageOne() which is called by SyncRelayMessage(). The latter misses PersistMessage() if RelayMessageOne() throws.

This is never going happen! Nothing has changed in the previous synchronisation behaviour. The only thing that has changed is that we have added an error handling here.

try {
client->SendMessage(message);
} catch (const std::runtime_error& ex) {
Log(LogInformation, "ApiListener")
<< "Error while sending message to endpoint '" << endpoint->GetName() << "': " << DiagnosticInformation(ex, false);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I opt for getting the anti-memory-leak ready w/o changing too much for now. It shouldn't depend on the exceptions part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#9896 (comment) stated the reason for splitting that PR. Tracing the callers of these two functions now throwing exceptions until one reaches exception handling and verifying that it's in the right place is annoying, but feasible in a rather short amount of time. Currently, when a client disconnects, both CPU generating the messages and RAM storing them is wasted, and your suggestion would only fix one (that was actually one of the first things I noticed when looking at this PR: #9991 (comment)).

@Al2Klimov Al2Klimov added this to the 2.15.0 milestone Apr 5, 2024
@yhabteab
Copy link
Member

Just rebased to fix the GH actions.

@yhabteab yhabteab assigned julianbrost and unassigned yhabteab Aug 28, 2024
@julianbrost
Copy link
Contributor

Main remaining question for me is the log level of the new messages, like should they be downgraded to notice (if that's something that can just legitimately happen during a disconnect) or maybe promoted to warning (if that's actually something problematic).

After taking another look at this, I suggest changing the log level of these two messages to notice:

try {
client->SendMessage(lmessage);
} catch (const std::runtime_error& ex) {
Log(LogInformation, "ApiListener")
<< "Error while setting log position for identity '" << endpoint->GetName() << "': " << DiagnosticInformation(ex, false);
}

try {
client->SendMessage(message);
} catch (const std::runtime_error& ex) {
Log(LogInformation, "ApiListener")
<< "Error while sending message to endpoint '" << endpoint->GetName() << "': " << DiagnosticInformation(ex, false);
}
}

Rationale:

They catch the following newly introduced exception:

if (m_ShuttingDown) {
BOOST_THROW_EXCEPTION(std::runtime_error("Cannot send message to already disconnected API client '" + GetIdentity() + "'!"));
}

Throwing this exception can only be triggered after calling Disconnect() which itself logs a corresponding message:

void JsonRpcConnection::Disconnect()
{
namespace asio = boost::asio;
if (!m_ShuttingDown.exchange(true)) {
JsonRpcConnection::Ptr keepAlive (this);
IoEngine::SpawnCoroutine(m_IoStrand, [this, keepAlive](asio::yield_context yc) {
Log(LogWarning, "JsonRpcConnection")
<< "API client disconnected for identity '" << m_Identity << "'";

The newly introduced messages provide little extra information (i.e. they basically only repeat that the client is disconnected now, they don't contain a dynamic error that could provide extra information). Additionally, as the code is written, these may be randomly be logged based on timing, like when the timer firing and the client disconnecting happen at the same time. That message then wouldn't indicate a problem but could be wrongly interpreted as such because it mostly doesn't happen when a client disconnects.

Copy link
Contributor

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous comment should probably have been a review.

@julianbrost julianbrost removed their assignment Aug 28, 2024
@Al2Klimov Al2Klimov merged commit 0951230 into master Sep 3, 2024
26 checks passed
@Al2Klimov Al2Klimov deleted the JsonRpcConnection-9985 branch September 3, 2024 13:13
@yhabteab yhabteab added the backported Fix was included in a bugfix release label Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/distributed Distributed monitoring (master, satellites, clients) backported Fix was included in a bugfix release bug Something isn't working cla/signed ref/IP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JsonRpcConnection#Send*() queues messages in memory even after #Disconnect()
3 participants