fluent-plugin-systemd fails with SIGABORT on Ubuntu 21.04 #369

Open
scrwr opened this issue Feb 9, 2022 · 12 comments
Labels
bug Something isn't working

scrwr commented Feb 9, 2022

When using fluent-plugin-systemd the worker crashes hard with a SIGABRT. Initially we assumed it to be a problem with the plugin, but it turned out to be related to libjemalloc. After removing

Environment=LD_PRELOAD=/opt/td-agent/lib/libjemalloc.so

from the service, crashes are gone.

See ledbettj/systemd-journal#93 for more details.

We were using td-agent 3 in the above example, but the issue is the same with td-agent 4.

Some more info:

Related config part:

<source>
  @type systemd
  tag systemd
  path /var/log/journal
  <storage>
    @type local
    persistent true
    path /var/tmp/fluentd_systemd
  </storage>
  <entry>
    fields_strip_underscores true
    fields_lowercase true
  </entry>
</source>

Let me know if I can help with further details.

github-actions bot commented

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

github-actions bot added the stale label May 11, 2022

github-actions bot commented

This issue was automatically closed because it had been stale for 30 days

fujimotos reopened this Oct 12, 2022

daipom commented Oct 12, 2022

Seems like this problem still exists.

Environment

  • Ubuntu 22.04
  • td-agent 4.4.1 fluentd 1.15.2 (c32842297ed2c306f1b841a8f6e55bdd0f1cb27f)
    • Installed by $ curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-jammy-td-agent4.sh | sh

How to Reproduce

  • Install fluent-plugin-systemd plugin: $ td-agent-gem install fluent-plugin-systemd
  • Add the following setting
<source>
  @type systemd
  tag debug
  path /var/log/journal
  read_from_head true
</source>
  • $ (sudo) adduser td-agent systemd-journal
  • $ (sudo) systemctl restart td-agent

Result

  • After reading one record, the worker dies with SIGABRT.
2022-10-12 05:58:38 +0000 [info]: #0 fluentd worker is now running worker=0
2022-10-12 04:50:35.083947000 +0000 debug: {"SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"systemd-journald","_TRANSPORT":"driver","PRIORITY":"6","MESSAGE_ID":"f77379a8490b408bbe5f6940505a777b","MESSAGE":"Journal started","_PID":"57","_UID":"0","_GID":"0","_COMM":"systemd-journal","_EXE":"/usr/lib/systemd/systemd-journald","_CMDLINE":"/lib/systemd/systemd-journald","_CAP_EFFECTIVE":"25402800cf","_SELINUX_CONTEXT":"unconfined\n","_SYSTEMD_CGROUP":"/system.slice/systemd-journald.service","_SYSTEMD_UNIT":"systemd-journald.service","_SYSTEMD_SLICE":"system.slice","_SYSTEMD_INVOCATION_ID":"ad56283776054be3859ad9b4e1f962d5","_BOOT_ID":"eb783cbf1e3c47c0a680a80b99e356d9","_MACHINE_ID":"6981994dead5402094f9195aec951d36","_HOSTNAME":"jammy-td-agetn"}
2022-10-12 05:58:39 +0000 [error]: Worker 0 finished unexpectedly with signal SIGABRT
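To check whether a host is affected, one option is to scan the td-agent log for the crash line shown above. The following is only a minimal sketch: the function name `find_worker_crashes` and the regular expression are illustrative assumptions, written to match the two wordings that appear in this thread ("finished unexpectedly" and "exited unexpectedly"). Other fluentd versions may word the message differently.

```python
import re

# Matches both wordings seen in this thread:
# "finished unexpectedly" and "exited unexpectedly".
CRASH_RE = re.compile(
    r"\[error\]: Worker (\d+) (?:finished|exited) unexpectedly "
    r"with signal (SIG[A-Z0-9]+)"
)


def find_worker_crashes(log_text: str) -> list[tuple[int, str]]:
    """Return (worker_id, signal_name) for every crash line in the log text."""
    return [(int(m.group(1)), m.group(2)) for m in CRASH_RE.finditer(log_text)]


# Sample lines copied from the log excerpt above.
sample = (
    "2022-10-12 05:58:38 +0000 [info]: #0 fluentd worker is now running worker=0\n"
    "2022-10-12 05:58:39 +0000 [error]: Worker 0 finished unexpectedly with signal SIGABRT\n"
)
print(find_worker_crashes(sample))
```

In practice the function would be fed the contents of the td-agent log file; its location depends on the installation.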

ETC

This doesn't reproduce on Ubuntu 20.04.


daipom commented Oct 12, 2022

As @scrwr says, we can avoid this issue by commenting out the following line in /lib/systemd/system/td-agent.service

Environment=LD_PRELOAD=/opt/td-agent/lib/libjemalloc.so

Then apply the change:

$ (sudo) systemctl daemon-reload
$ (sudo) systemctl restart td-agent

However, is it correct to comment this out?


daipom commented Oct 12, 2022

However, is it correct to comment this out?

Editing /lib/systemd/system/td-agent.service directly is not recommended.
The correct way to change environment variables would be as follows for Ubuntu:

  • Edit /etc/default/td-agent and add the following line:
LD_PRELOAD=
  • Then restart td-agent: $ (sudo) systemctl restart td-agent

My concern here was the effect of omitting this environment variable, but it seems that if memory usage is not a problem, this environment variable can be omitted.

Thus, for now, this seems to be a workaround.
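To confirm that the workaround took effect after the restart, one possible check is to see whether libjemalloc is still mapped into the running worker process. The sketch below assumes a Linux host and reads `/proc/<pid>/maps`; the helper names `jemalloc_paths` and `jemalloc_loaded` are hypothetical, and the sample mapping lines are made-up data that reuse the library path from this thread.

```python
from pathlib import Path


def jemalloc_paths(maps_text: str) -> list[str]:
    """Return the distinct libjemalloc files named in /proc/<pid>/maps content."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if "libjemalloc" in path.rsplit("/", 1)[-1]:
                found.add(path)
    return sorted(found)


def jemalloc_loaded(pid: int) -> bool:
    """True if the process with this PID has libjemalloc mapped (Linux only)."""
    return bool(jemalloc_paths(Path(f"/proc/{pid}/maps").read_text()))


# Made-up sample data in the /proc/<pid>/maps layout.
sample_maps = """\
7f10a0000000-7f10a0021000 r-xp 00000000 08:01 131 /opt/td-agent/lib/libjemalloc.so
7f10a0221000-7f10a0223000 rw-p 00021000 08:01 131 /opt/td-agent/lib/libjemalloc.so
7f10a1000000-7f10a11c0000 r-xp 00000000 08:01 77 /usr/lib/x86_64-linux-gnu/libc.so.6
7f10a2000000-7f10a2021000 rw-p 00000000 00:00 0
"""
print(jemalloc_paths(sample_maps))
```

Calling `jemalloc_loaded(pid)` with the worker's PID should return `False` once `LD_PRELOAD=` is in effect. Reading another user's `/proc/<pid>/maps` usually requires root or the same user as the process.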

github-actions bot removed the stale label Oct 12, 2022
fujimotos self-assigned this Oct 17, 2022
fujimotos added the bug (Something isn't working) label Oct 17, 2022
fujimotos changed the title from "SIGABRT due to jemalloc" to "fluent-plugin-systemd fails with SIGABORT on Ubuntu 21.04" Oct 17, 2022

mszabo commented Mar 30, 2023

Having the same issue. The workaround helped, but is there any progress on a permanent fix?


daipom commented Mar 31, 2023

I think there is no progress. We still need this workaround for fluent-plugin-systemd in some environments.

@mszabo Could you share your environment information? Are you using Ubuntu?


mszabo commented Apr 11, 2023

@daipom Yes, the issue surfaced when we started migrating to the latest Ubuntu LTS (22.04).


daipom commented Apr 12, 2023

Thanks!


fesia commented Oct 5, 2023

Hello,

I migrated my app from RHEL 8.8 to RHEL 9.2 and started experiencing the same issue:

2023-10-05 11:06:29 +0200 [error]: Worker 0 exited unexpectedly with signal SIGABRT

The workaround of unsetting the LD_PRELOAD variable helped. Posting my environment info in case it helps with a permanent fix.

  • OS: RHEL 9.2
  • kernel version 5.14.0-284.30.1
  • td-agent package version 4.5.1-1
  • ruby version 3.1.4p223 (bundled with td-agent)
  • fluent-plugin-systemd gem version 1.0.5
  • systemd-journal gem version 1.4.2

Thanks,
Andrii

kenhys added a commit to kenhys/fluentd-kubernetes-daemonset that referenced this issue Jul 17, 2024
There is a long-standing known bug: the combination of jemalloc and
fluent-plugin-systemd causes a free(): invalid crash. The problematic
code has been identified, but the root cause is not fixed yet.

There is a workaround for this: disable jemalloc explicitly.
Setting LD_PRELOAD= stops jemalloc from being used.

If you want to use jemalloc, set it via env like this:

 containers:
   - name: fluentd
     image: fluent/fluentd-kubernetes-daemonset:v1-debian-forward
     env:
       - name: K8S_NODE_NAME
         valueFrom:
           fieldRef:
             fieldPath: spec.nodeName
       - name:  FLUENT_FORWARD_HOST
         value: "REMOTE_ENDPOINT"
       - name:  FLUENT_FORWARD_PORT
         value: "18080"
       - name:  LD_PRELOAD
         value: "/usr/lib/libjemalloc.so.2"

Related issues:

fluent/fluentd-docker-image#378
fluent/fluent-package-builder#369
fluent-plugins-nursery/fluent-plugin-systemd#110
ledbettj/systemd-journal#93
fluent#1478

Signed-off-by: Kentaro Hayashi <[email protected]>
daipom pushed a commit to fluent/fluentd-kubernetes-daemonset that referenced this issue Jul 18, 2024

ashie commented Aug 30, 2024

ledbettj/systemd-journal#96 will fix this issue.


kenhys commented Sep 27, 2024

NOTE:

fluent-package v5.0.4/v5.1.0 still bundles fluent-plugin-systemd 1.0.5;
we should bundle fluent-plugin-systemd 1.1.0 or later.
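To see whether an installation already has a new enough plugin, the output of the gem list command (for example `td-agent-gem list`, as used earlier in this thread; the command name may differ for fluent-package) can be compared against the 1.1.0 threshold mentioned above. This is a rough sketch: the helper names are hypothetical, it assumes the usual `name (version, version)` layout of `gem list` output, and it only handles plain numeric versions.

```python
import re


def installed_versions(gem_list_output: str, gem_name: str) -> list[str]:
    """Extract versions from a line like 'fluent-plugin-systemd (1.1.0, 1.0.5)'."""
    pattern = re.compile(
        rf"^{re.escape(gem_name)} \(([^)]*)\)\s*$", re.MULTILINE
    )
    match = pattern.search(gem_list_output)
    if not match:
        return []
    # Drop the "default: " prefix that gem list prints for default gems.
    return [v.strip().removeprefix("default: ") for v in match.group(1).split(",")]


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.0.5' into (1, 0, 5); non-numeric parts are ignored."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def has_fixed_plugin(versions: list[str], minimum: str = "1.1.0") -> bool:
    """True if any installed version is at least the given minimum."""
    return any(version_tuple(v) >= version_tuple(minimum) for v in versions)


# Made-up sample output in the gem list layout.
sample_output = "fluent-plugin-systemd (1.0.5)\nfluentd (1.15.2)\n"
versions = installed_versions(sample_output, "fluent-plugin-systemd")
print(versions, has_fixed_plugin(versions))
```

The default threshold of 1.1.0 comes from the note above, not from an independent check of the plugin's changelog. `str.removeprefix` requires Python 3.9 or later.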

kenhys added this to the 5.0.5 (T.B.D.) milestone Sep 27, 2024