When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates) #134

Open
louwrentius opened this issue Jul 16, 2024 · 0 comments

Context

  • 1 x log server (VM on Apple M2)
  • 2 x log client (VM on Apple M2)
  • Using systemd-journal-remote on server (http)
  • Using systemd-journal-upload on clients (http)

Testing Method
We first clean up the server with:

rm /var/log/journal/remote/*

On each client we run (side by side / simultaneously):

for x in {1..10000}; do echo "EVENT $x" | logger ; done

So we expect 20,000 records on the server, which we validate:

root@log-server:/home/ansible# journalctl -D /var/log/journal/remote/ --no-pager | grep -v Boot | wc -l
20000
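
The same check can be split per client, which shows whether each host really contributed its 10,000 records. A minimal sketch, assuming the stored entries carry the standard `_HOSTNAME` journal field; `tally_by_host` and `tally_remote_journal` are names made up for this example, not part of python-systemd:

```python
from collections import Counter


def tally_by_host(entries):
    """Count journal entries per originating host.

    `entries` is any iterable of dict-like journal entries. Entries
    without a `_HOSTNAME` field are counted under "<unknown>".
    """
    counts = Counter()
    for entry in entries:
        counts[entry.get("_HOSTNAME", "<unknown>")] += 1
    return counts


def tally_remote_journal(path="/var/log/journal/remote"):
    """Read every stored entry under `path` and tally it per host."""
    # Imported here so tally_by_host can be used without python-systemd.
    from systemd import journal

    reader = journal.Reader(path=path)
    return tally_by_host(reader)
```

Calling `tally_remote_journal()` on the log server after a test run should then report one count per client.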

I've tested this many times to confirm that the systemd-journal-remote/upload mechanism is OK.

The failing code

import select
from systemd import journal

# Open the journal files in the remote directory.
j = journal.Reader(path="/var/log/journal/remote")
j.seek_tail()
j.get_previous()  # prevent wrap around

p = select.poll()
p.register(j, j.get_events())

print("Starting....")
event_counter = 0
while p.poll():
    # Count every entry that is readable after this wakeup.
    for x in j:
        event_counter += 1
    if j.process() != journal.APPEND:
        print("...WAITING....")
    else:
        print(f"Event counter: {event_counter}")

First of all, I'm not sure if this code is right. If I've made a glaring mistake, I apologise in advance for this report.
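
For comparison, the example in the python-systemd README orders the loop differently: it calls `process()` right after `poll()` returns and only iterates over the reader when the result is `journal.APPEND`. Below is a minimal sketch of that ordering. `drain_on_wakeup` and `follow_remote_journal` are hypothetical names for this sketch, and whether the ordering has any effect on the missing events is not established in this report.

```python
def drain_on_wakeup(reader, append_state, handle):
    """Handle one poll wakeup: call process() first, then read new entries.

    Returns the number of entries passed to `handle`, or 0 if the wakeup
    was not reported as an append.
    """
    if reader.process() != append_state:
        return 0
    handled = 0
    for entry in reader:
        handle(entry)
        handled += 1
    return handled


def follow_remote_journal(path="/var/log/journal/remote"):
    """Follow the remote journal directory and print a running count."""
    # Imported here so drain_on_wakeup can be exercised without python-systemd.
    import select
    from systemd import journal

    j = journal.Reader(path=path)
    j.seek_tail()
    j.get_previous()  # skip old entries, as in the code above

    p = select.poll()
    p.register(j, j.get_events())

    event_counter = 0
    while p.poll():
        event_counter += drain_on_wakeup(j, journal.APPEND, lambda entry: None)
        print(f"Event counter: {event_counter}")
```

Running `follow_remote_journal()` on the log server during the two-client test would show whether the counter behaves differently with this ordering.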

That said, I get correct results if I run my '10K log generator command' on one client at a time.

Event counter: 9915
Event counter: 9933
Event counter: 9945
Event counter: 9950
Event counter: 9987
Event counter: 10000

If I run the command on both clients simultaneously, I would expect 20K records, but the counter gets nowhere near that:

Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228
Event counter: 10228

In this example I'm missing almost half of all expected events. Yet I can confirm on the log server with the 'journalctl' command that there are actually 20K log records stored on the server.
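
To narrow down which records the poller never sees (one whole client, or a mix from both), the entries yielded by the reader could be collected and compared against the expected `EVENT` numbers per host. A minimal sketch, assuming each entry exposes the standard `MESSAGE` and `_HOSTNAME` fields; `missing_events` is a hypothetical helper for this test, not a python-systemd function:

```python
import re

# Matches the messages produced by the test loop, e.g. "EVENT 42".
EVENT_RE = re.compile(r"^EVENT (\d+)$")


def missing_events(entries, expected_per_host):
    """Return {hostname: sorted list of EVENT numbers that were never seen}.

    Hosts from which no test message was seen at all do not appear in
    the result, because their name is only known from the entries.
    """
    seen = {}
    for entry in entries:
        match = EVENT_RE.match(str(entry.get("MESSAGE", "")))
        if match is None:
            continue  # not one of the generated test messages
        host = entry.get("_HOSTNAME", "<unknown>")
        seen.setdefault(host, set()).add(int(match.group(1)))
    expected = set(range(1, expected_per_host + 1))
    return {host: sorted(expected - numbers) for host, numbers in seen.items()}
```

Appending each entry to a list inside the polling loop and passing that list to `missing_events(collected, 10000)` would show the gaps per client.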

The loss is also clearly visible with just 1K events per client (2,000 expected in total), run simultaneously:

Event counter: 1096
Event counter: 1141
Event counter: 1215
Event counter: 1430
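
As a quick worked check of the shortfall in both simultaneous runs, taking the last counter value shown in each listing as the final count (`shortfall` is just a helper name for this sketch):

```python
def shortfall(expected, counted):
    """Return (missing entries, missing share in percent, one decimal)."""
    missing = expected - counted
    return missing, round(100 * missing / expected, 1)


print(shortfall(20000, 10228))  # 10K-per-client run -> (9772, 48.9)
print(shortfall(2000, 1430))    # 1K-per-client run  -> (570, 28.5)
```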

Happy to run any additional tests if that helps.
If I made a huge mistake, please let me know.

Update 1
I've been testing this issue on virtual machines on my Mac.
I also have a few physical x86 micro PCs, and the issue is the same there.
Rate-limiting the logging with sleep .01 doesn't change the result.

Question: is python-systemd built to handle remote logging from multiple clients through polling?

louwrentius changed the title from "When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time" to "When monitoring /var/log/journal/remote events are missed when two remote hosts send messages at the same time (updates)" on Jul 17, 2024