When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates) #134

louwrentius · 2024-07-16T22:02:17Z

Context

1 x log server (VM on Apple M2)
2 x log client (VM on Apple M2)
Using systemd-journal-remote on server (http)
Using systemd-journal-upload on clients (http)

Testing Method
We first cleanup the server with:

rm /var/log/journal/remote/*

On each client we run (side by side / simultaneously:

for x in {1..10000}; do echo "EVENT $x" | logger ; done

So we expect 20,000 records on the server, which we validate:

root@log-server:/home/ansible# journalctl -D /var/log/journal/remote/ --no-pager | grep -v Boot | wc -l
20000

I've tested this many times to confirm that the systemd-journal-remote/upload mechanism is OK.

The failing code

import select
from systemd import journal

 j = journal.Reader(path="/var/log/journal/remote")
 j.seek_tail()
 j.get_previous() # prevent wrap around

 p = select.poll()
 p.register(j, j.get_events())

 print("Starting....")
 event_counter = 0
 while p.poll():
      for x in j:
          event_counter += 1
      if j.process() != journal.APPEND:
          print("...WAITING....")
      else:
          print(f"Event counter: {event_counter}")

First of all, I'm not sure if this code is right. If I've made a glaring mistake, I apologise in advance for this report.

That said, I get correct results if I run my '10K log generator command' on one client at a time.

Event counter: 9915
Event counter: 9933
Event counter: 9945
Event counter: 9950
Event counter: 9987
Event counter: 10000

If I run the command at the same time on each client at once, I would expect 20K records, but it gets nowhere near that:

Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228

In this example I'm missing almost half of all expected events. Yet I can confirm on the log server with the 'journalctl' command that there are actually 20K log records stored on the server.

Even if I test with just 1K events per client and run that simultaneously it's also clearly visible:

Event counter: 1096
Event counter: 1141
Event counter: 1215
Event counter: 1430

Happy to run any additional tests if that helps.
If I made a huge mistake, please let me know.

Update 1
I've been testing this issue on virtual machines on my mac.
I've also a few physical x86 micro pc's and the issue is the same.
Rate-limiting the logging with sleep .01 doesn't impact the result.

Question, is python-systemd built to handle remote logging from multiple clients through polling?

The text was updated successfully, but these errors were encountered:

louwrentius changed the title ~~When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time~~ When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates) Jul 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates) #134

When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates) #134

louwrentius commented Jul 16, 2024 •

edited

Loading

When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates) #134

When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates) #134

Comments

louwrentius commented Jul 16, 2024 • edited Loading

louwrentius commented Jul 16, 2024 •

edited

Loading