Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dlt-system: move journal reading to its own thread #471

Merged
merged 1 commit into from
Aug 2, 2023

Conversation

alexmohr
Copy link
Contributor

Reading from journal uses an endless loop which only exits when no new messages are read from the journal.
This is a problem when the local buffer of the application gets filled i.e. due dlt-daemon being busy and not accepting current messages.
In this case the thread waits for 500ms and tries again to read messages from the journal.
When new messages are available they also will be written to dlt. If this cycle continues for a while it is possible that other tasks, which use the same thread and are controlled via a timer, are not executed anymore as the funtion get_journal_msg will not exit. This will lead to termination through the systemd watchdog, as sd_notify is not called anymore.
To circumvent this issue, reading logs from journald is moved into a separate thread.

Furthermore this commit fixes some invalid free when no configuration is passed to dlt-system as well as adding missing initialization which also can lead to crashes.
Although these issues are not responsible for the crash on the system, as they will only happen on application shutdown.

The program was tested solely for our own use cases, which might differ from yours.
Licensed under Mozilla Public License Version 2.0

Alexander Mohr, [email protected], Mercedes-Benz Tech Innovation GmbH, imprint

@alexmohr alexmohr force-pushed the dlt-system-performance branch 2 times, most recently from 053b6b1 to 568ecab Compare April 25, 2023 11:41
@minminlittleshrimp
Copy link
Collaborator

Hello @alexmohr
The code change looks good to me, but I could not test from my local.
In case you have some test steps, could you please kindly provide?
Thank you

@alexmohr
Copy link
Contributor Author

alexmohr commented Jul 6, 2023

Hello @alexmohr The code change looks good to me, but I could not test from my local. In case you have some test steps, could you please kindly provide? Thank you

It's not easy to test locally. you need:

  • a huge amount of logs written into system journal
  • a huge amount of logs written into dlt by other services
  • dlt-system must be started via systemd and watchdog (use a small watchdog timeout)

The influx of logs into journal must be larger than the log volume that dlt can handle. I was never able to reproduce this on a desktop / laptop computer as they are way to powerful and have too much memory to really run into that case.
We observed that only on our target platform.

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Jul 11, 2023

Hello @alexmohr
Since there are some fixes that applied for our test suites, could you kindly rebase your branch to activate the CI again? It is just to make sure everything is fine.

Hello @michael-methner
I checked through the code change and I think it is OK to let journal reading on own thread.
I also try to run some manual tests in my local and found no issue.
I have no objection in merging this PR.
Could you please kindly have a look and give your opinion on this topic?
Thank you

@minminlittleshrimp minminlittleshrimp self-assigned this Jul 11, 2023
Reading from journal uses an endless loop which only exits
when no new messages are read from the journal.
This is a problem when the local buffer of the application
gets filled i.e. due dlt-daemon being busy and not accepting
current messages.
In this case the thread waits for 500ms and tries again to
read messages from the journal.
When new messages are available they also will be written to dlt.
If this cycle continues for a while it is possible that other tasks,
which use the same thread and are controlled via a timer, are not
executed anymore as the funtion `get_journal_msg` will not exit.
This will lead to termination through the systemd watchdog,
as sd_notify is not called anymore.
To circumvent this issue, reading logs from journald is moved
into a separate thread.

Furthermore this commit fixes some invalid free when no configuration
is passed to dlt-system as well as adding missing initialization which
also can lead to crashes.
Although these issues are not responsible for the crash on the system,
as they will only happen on application shutdown.

Signed-off-by: Alexander Mohr <[email protected]>
@alexmohr
Copy link
Contributor Author

@minminlittleshrimp pipeline ran again and passed :)

@michael-methner michael-methner merged commit 9352f0d into COVESA:master Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants