Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agentx thread #16986

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

fdumontet6WIND
Copy link
Contributor

  • Introduce a dedicated AgentX thread using frr_pthread.

  • Add mutex locks (ax_mtx, ax_io_mtx) to manage thread
    synchronization for trap transfer and I/O operations.

  • Implemented ring buffers (ibuf_ax) for handling
    "master -> AgentX" communication, improving data handling
    between threads.

  • Update the SNMP read operations to use mutex locks to
    ensure thread-safe execution.

  • Integrated a new dedicated thread to send SNMP traps,
    ensuring separation of responsibilities between the main
    and AgentX threads.

  • Enhanced trap handling to support multi-index traps, with
    excess traps being discarded if the buffer is full,
    preventing overflow.

  • Enhanced trap handling to support multi-index traps.
    When more than "RINGBUF_NB_TRAP" traps are pending for
    transmission, subsequent traps are discarded to prevent
    overflow.

This update significantly improves concurrency,
synchronization, and trap management within the AgentX
module, with added protection against socket's buffer overflow
from excessive traps. The socket's buffer overflow is leading
to process deadlock.

add "struct event *" parameter to agentx_event_update function.
the function may be called via event_add*** functions.

Signed-off-by: Francois Dumontet <[email protected]>
Copy link
Contributor

@vjardin vjardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, avoid pulling some dependencies specific for bgpd.

lib/agentx.c Show resolved Hide resolved
lib/agentx.c Outdated
@@ -23,25 +23,62 @@
#include "libfrr.h"
#include "xref.h"
#include "lib/libagentx.h"
#include "ringbuf.h" /* for ringbuf_remain, ringbuf_peek, ringbuf_.. */
#include "frr_pthread.h" /* for struct frr_pthread */
#include "bgpd/bgpd.h" /* for bgp_pth_ax */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a a specific dependency with bgpd ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken into account

@fdumontet6WIND fdumontet6WIND force-pushed the agentx_thread branch 2 times, most recently from fddcf30 to f6efd02 Compare October 2, 2024 19:53
lib/agentx.c Outdated
@@ -332,7 +332,6 @@ static void smux_trap_multi_index_thd(struct event *thread)
return;
}
ringbuf_get(ibuf_ax, &notification_vars, sizeof(notification_vars));
zlog_err("%s get %p %p", __func__, notification_vars, *notification_vars);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you amend this fix into f6efd02?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dine

@ton31337
Copy link
Member

ton31337 commented Oct 3, 2024

Also, could you fix Verify Source, and frrbot issues?

@fdumontet6WIND fdumontet6WIND force-pushed the agentx_thread branch 2 times, most recently from 7557c63 to 8535a1b Compare October 3, 2024 12:45
Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time following what problem is being solved here.

It sounds like there can be a problem if there are many traps to be sent - that the existing code is not detecting that, is not discarding events when it begins to overrun a receiver, and/or is not handling EWOULDBLOCK. Is that correct? wouldn't it be simpler to make the code detect and handle those conditions directly -rather than injecting a new pthread and then teaching it to handle the error conditions?

Is there any other problem present? I'm a bit terrified by this line from the description:

"Update the SNMP read operations to use mutex locks to ensure thread-safe execution."

so ... it is not possible to make reads of daemon data "thread-safe" - right? only the main pthread can access daemon internals. what change are you proposing to make there?

lib/agentx.c Outdated
DEFINE_HOOK(agentx_enabled, (), ());

//bool agentx_enabled = false;

static struct event_loop *agentx_tm;
static struct event_loop *main_tm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we name this something better? and at least put some documentation into why agentx code has too event_loop data structure pointers. Let's help our future selves out here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taken ento account

@donaldsharp
Copy link
Member

lib/agentx.c:438: error: log message contains tab or newline: "%s not enougth space in ibuf_ax needed : %lu free :\t %lu" (in smux_trap_multi_index())

needs to be fixed

@fdumontet6WIND
Copy link
Contributor Author

tomjstapp
The problem is that, in certain high-load situations, AgentX blocks the entire bgpd process.

We can indeed address some cases by implementing flow control on the traps. Unfortunately, other cases related to the internal mechanisms of AgentX SNMP emerge. It's to provide a global solution, rather than a series of incomplete fixes, that we propose handling the sending of traps in a specific pthread.

@fdumontet6WIND fdumontet6WIND force-pushed the agentx_thread branch 2 times, most recently from 97b75e2 to f03720f Compare October 3, 2024 15:06
@mjstapp
Copy link
Contributor

mjstapp commented Oct 3, 2024

So I'd like to understand that situation better. If the new pthread is wedged somewhere, and a daemon's main pthread is shutting down and waiting for it to stop, how will that work?

tomjstapp The problem is that, in certain high-load situations, AgentX blocks the entire bgpd process.

We can indeed address some cases by implementing flow control on the traps. Unfortunately, other cases related to the internal mechanisms of AgentX SNMP emerge. It's to provide a global solution, rather than a series of incomplete fixes, that we propose handling the sending of traps in a specific pthread.

@fdumontet6WIND
Copy link
Contributor Author

ci:rerun

lib/agentx.c Outdated
/* mutex dedicated to trap transfert between threads */
pthread_mutex_t ax_mtx;
/* mutex dedicated to send/read exclusion */
pthread_mutex_t ax_io_mtx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't they be static ? both ax_mtx and ax_io_mtx.

Copy link
Contributor

@vjardin vjardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a minor update (static) that would be welcomed.

@vjardin
Copy link
Contributor

vjardin commented Oct 7, 2024

Thanks for the static that are being added. ;)

@fdumontet6WIND
Copy link
Contributor Author

sorry for not having notified the change.

- Introduce a dedicated AgentX thread using `frr_pthread`.

- Add mutex locks (`ax_mtx`, `ax_io_mtx`) to manage thread
synchronization for trap transfer and I/O operations.

- Implemented ring buffers (`ibuf_ax`) for handling
"master -> AgentX" communication, improving data handling
between threads.

- Update the SNMP read operations to use mutex locks to
ensure thread-safe execution.

- Integrated a new dedicated thread to send SNMP traps,
ensuring separation of responsibilities between the main
and AgentX threads.

- Enhanced trap handling to support multi-index traps, with
excess traps being discarded if the buffer is full,
preventing overflow.

- Enhanced trap handling to support multi-index traps.
When more than "RINGBUF_NB_TRAP" traps are pending for
transmission, subsequent traps are discarded to prevent
overflow.

This update significantly improves concurrency,
synchronization, and trap management within the AgentX
module, with added protection against socket's buffer overflow
from excessive traps. The socket's buffer overflow is leading
to process deadlock.

Signed-off-by: Francois Dumontet <[email protected]>
@vjardin
Copy link
Contributor

vjardin commented Oct 8, 2024

Please @fdumontet6WIND , do you have some data on the improvements you are getting ?

LGTM

@fdumontet6WIND
Copy link
Contributor Author

added some comments following Donald sharp advice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants