Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault on Ubuntu 22.04 #110

Closed
kthhrv opened this issue Jan 15, 2024 · 7 comments
Closed

Segmentation fault on Ubuntu 22.04 #110

kthhrv opened this issue Jan 15, 2024 · 7 comments

Comments

@kthhrv
Copy link

kthhrv commented Jan 15, 2024

Works on 20.04 but on 22.04 fluentd silently fails.

I'm using latest fluentd

# apt show fluent-package
Package: fluent-package
Version: 5.0.2-1

running fluentd manually with

# FLUENT_SOCKET=/var/run/fluent/fluentd.sock FLUENT_PLUGIN=/etc/fluent/plugin LD_PRELOAD=/opt/fluent/lib/libjemalloc.so GEM_PATH=/opt/fluent/lib/ruby/gems/3.2.0/ GEM_HOME=/opt/fluent/lib/ruby/gems/3.2.0/ FLUENT_CONF=/etc/fluent/fluentd.conf /opt/fluent/bin/fluentd

I get

/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325: [BUG] Segmentation fault at 0x0000000000000008
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0013 p:---- s:0064 e:000063 CFUNC  :free
c:0012 p:0012 s:0059 e:000058 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325
c:0011 p:0042 s:0053 e:000052 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13
c:0010 p:0018 s:0047 e:000044 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151
c:0009 p:0013 s:0040 e:000039 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:144
c:0008 p:0032 s:0034 e:000033 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:121 [FINISH]
c:0007 p:---- s:0030 e:000029 IFUNC 
c:0006 p:0012 s:0027 e:000026 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/timer.rb:80 [FINISH]
c:0005 p:---- s:0022 e:000021 CFUNC  :run_once
c:0004 p:0034 s:0017 e:000016 METHOD /opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88
c:0003 p:0026 s:0012 e:000011 BLOCK  /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/event_loop.rb:93
c:0002 p:0050 s:0008 e:000007 BLOCK  /opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78 [FINISH]
c:0001 p:---- s:0003 e:000002 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run'
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88:in `run_once'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:121:in `run'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:144:in `watch'
/opt/fluent/lib/ruby/gems/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151:in `yield_current_entry'
/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13:in `cursor'
/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325:in `read_and_free_outstr'
/opt/fluent/lib/ruby/gems/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325:in `free'

-- Machine register context ------------------------------------------------
 RIP: 0x00007f88714a2e50 RBP: 0x00007f8866c267f0 RSP: 0x00007f8864cf8b90
 RAX: 0x0000000000000000 RBX: 0x00007f886a574b40 RCX: 0x000007f886a574b2
 RDX: 0x0000000000000001 RDI: 0x00007f887161ac80 RSI: 0x00007f885c0008e0
  R8: 0x00000000ffffffff  R9: 0x00007f885c0008e0 R10: 0x0000000000004022
 R11: 0x00007f885c000090 R12: 0x00007f887161ac80 R13: 0x0000000000000000
 R14: 0x0000ff10d119b330 R15: 0x00007f886b32a110 EFL: 0x0000000000010246

-- C level backtrace information -------------------------------------------
/opt/fluent/lib/libruby.so.3.2(0x7f8871ac1539) [0x7f8871ac1539]
/opt/fluent/lib/libruby.so.3.2(0x7f8871ac5869) [0x7f8871ac5869]
/opt/fluent/lib/libruby.so.3.2(0x7f887191e384) [0x7f887191e384]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a3e65d) [0x7f8871a3e65d]
/lib/x86_64-linux-gnu/libc.so.6(0x7f8871442520) [0x7f8871442520]
/lib/x86_64-linux-gnu/libc.so.6(0x7f88714a2e50) [0x7f88714a2e50]
/lib/x86_64-linux-gnu/libc.so.6(free+0x73) [0x7f88714a5453]
/lib/x86_64-linux-gnu/libffi.so.8(0x7f886b328e2e) [0x7f886b328e2e]
/lib/x86_64-linux-gnu/libffi.so.8(0x7f886b325493) [0x7f886b325493]
/opt/fluent/lib/ruby/gems/3.2.0/gems/ffi-1.15.5/lib/ffi_c.so(rbffi_CallFunction+0x1af) [0x7f886976229f]
/opt/fluent/lib/ruby/gems/3.2.0/gems/ffi-1.15.5/lib/ffi_c.so(0x7f8869758937) [0x7f8869758937]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa1fdb) [0x7f8871aa1fdb]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa5991) [0x7f8871aa5991]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa72ca) [0x7f8871aa72ca]
/opt/fluent/lib/libruby.so.3.2(rb_vm_exec+0xef) [0x7f8871abcf1f]
/opt/fluent/lib/libruby.so.3.2(rb_vm_call0+0xcd) [0x7f8871ab250d]
/opt/fluent/lib/libruby.so.3.2(rb_method_call_with_block_kw+0x84) [0x7f88719eb254]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa27f4) [0x7f8871aa27f4]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa289f) [0x7f8871aa289f]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa5991) [0x7f8871aa5991]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa72ca) [0x7f8871aa72ca]
/opt/fluent/lib/libruby.so.3.2(rb_vm_exec+0xef) [0x7f8871abcf1f]
/opt/fluent/lib/libruby.so.3.2(rb_funcallv+0x19c) [0x7f8871ab3c8c]
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io_ext.so(0x7f886c17362d) [0x7f886c17362d]
/opt/fluent/lib/ruby/gems/3.2.0/gems/cool.io-1.8.0/lib/cool.io_ext.so(0x7f886c1787c2) [0x7f886c1787c2]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa1fdb) [0x7f8871aa1fdb]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa5991) [0x7f8871aa5991]
/opt/fluent/lib/libruby.so.3.2(0x7f8871aa72ca) [0x7f8871aa72ca]
/opt/fluent/lib/libruby.so.3.2(rb_vm_exec+0xef) [0x7f8871abcf1f]
/opt/fluent/lib/libruby.so.3.2(rb_vm_invoke_proc+0x5f) [0x7f8871abf97f]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a7450a) [0x7f8871a7450a]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a74de6) [0x7f8871a74de6]
/opt/fluent/lib/libruby.so.3.2(0x7f8871a75a32) [0x7f8871a75a32]
/lib/x86_64-linux-gnu/libc.so.6(0x7f8871494ac3) [0x7f8871494ac3]
/lib/x86_64-linux-gnu/libc.so.6(0x7f8871526850) [0x7f8871526850]

config

# cat /etc/fluent/conf.d/etl.conf 
<source>
    @type systemd
    tag raw.etl
    path /var/log/journal
    matches [{ "_SYSTEMD_UNIT": "etl.service" }, { "_SYSTEMD_UNIT": "heartbeat.service" }]
    read_from_head true
    
    <storage>
       @type local
       path /var/log/fluentd-journald-kubelet-cursor.json
    </storage>

    <entry>
       fields_strip_underscores true
       fields_lowercase true
    </entry>
</source>

<match etl>
    @type cloudwatch_logs
    log_group_name "/timeout/etl/#{ENV['TREEHUGGER_STAGE'] || 'prod'}/app"
    log_stream_name "#{ENV['INSTANCE_ID']}"
    auto_create_stream true
    flush_interval 1s
</match>

<system>
  root_dir /var/log/fluent
</system>
root@ip-10-200-4-48:~# 
@redliu312
Copy link

met the same issue on a k8s cluster and trying to use
fluentd image: fluent/fluentd-kubernetes-daemonset:v1.16.3-debian-elasticsearch8-2.1
with systemd plugin

the error :

fluentd-sbpzp fluentd 2024-01-28 08:20:57 +0000 [trace]: #0 [es_kubelet] writing events into buffer instance=2360 metadata_size=1
fluentd-sbpzp fluentd /fluentd/vendor/bundle/ruby/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325: [BUG] Segmentation fault at 0x0000000000000008
fluentd-sbpzp fluentd ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
fluentd-sbpzp fluentd
fluentd-sbpzp fluentd -- Control frame information -----------------------------------------------
fluentd-sbpzp fluentd c:0013 p:---- s:0064 e:000063 CFUNC  :free
fluentd-sbpzp fluentd c:0012 p:0012 s:0059 e:000058 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal.rb:325
fluentd-sbpzp fluentd c:0011 p:0042 s:0053 e:000052 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/systemd-journal-1.4.2/lib/systemd/journal/navigable.rb:13
fluentd-sbpzp fluentd c:0010 p:0018 s:0047 e:000044 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:151
fluentd-sbpzp fluentd c:0009 p:0013 s:0040 e:000039 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:144
fluentd-sbpzp fluentd c:0008 p:0032 s:0034 e:000033 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-systemd-1.0.5/lib/fluent/plugin/in_systemd.rb:121 [FINISH]
fluentd-sbpzp fluentd c:0007 p:---- s:0030 e:000029 IFUNC
fluentd-sbpzp fluentd c:0006 p:0012 s:0027 e:000026 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/timer.rb:80 [FINISH]
fluentd-sbpzp fluentd c:0005 p:---- s:0022 e:000021 CFUNC  :run_once
fluentd-sbpzp fluentd c:0004 p:0034 s:0017 e:000016 METHOD /fluentd/vendor/bundle/ruby/3.2.0/gems/cool.io-1.8.0/lib/cool.io/loop.rb:88
fluentd-sbpzp fluentd c:0003 p:0026 s:0012 e:000011 BLOCK  /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/event_loop.rb:93
fluentd-sbpzp fluentd c:0002 p:0050 s:0008 e:000007 BLOCK  /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/plugin_helper/thread.rb:78 [FINISH]
fluentd-sbpzp fluentd c:0001 p:---- s:0003 e:000002 DUMMY  [FINISH]
fluentd-sbpzp fluentd
fluentd-sbpzp fluentd -- Ruby level backtrace information ----------------------------------------

@dkulchinsky
Copy link

looks like related to ledbettj/systemd-journal#93

and also reported here: fluent/fluent-package-builder#369

@fpiriz-inditex
Copy link

fpiriz-inditex commented Mar 1, 2024

Workaround suggested in fluent/fluent-package-builder#369 does remove the crashes. However, in my case (same as redliu312 in the previous message) systemd logs are not read with the workaround. Leaving the env variable untouched, fluentd crashes and upon restart, all the systemd logs are flushed and indexed.

@daipom
Copy link
Contributor

daipom commented Mar 25, 2024

Currently, we need to avoid this problem by unsetting LD_PRELOAD in some environments.

It is probably not a problem with this plugin, but it seems to be a problem with libjemalloc and systemd-journal.

It's a problem to have a segmentation fault in the default state of some Fluentd distributions, so we need to consider possible solutions.

kenhys added a commit to kenhys/fluentd-kubernetes-daemonset that referenced this issue Jul 17, 2024
There is a known bug that combination with jemalloc and
fluent-plugin-systemd causes free(): invalid crash for
a long time. The problematic code is identified but the
root cause is not fixed yet.

There is a workaround for this - disable jemalloc explicitly.
LD_PRELOAD= stop to use jemalloc.

If you want to use jemalloc, set it via env like this:

 containers:
   - name: fluentd
     image: fluent/fluentd-kubernetes-daemonset:v1-debian-forward
     env:
       - name: K8S_NODE_NAME
         valueFrom:
           fieldRef:
             fieldPath: spec.nodeName
       - name:  FLUENT_FORWARD_HOST
         value: "REMOTE_ENDPOINT"
       - name:  FLUENT_FORWARD_PORT
         value: "18080"
       - name:  LD_PRELOAD
         value: "/usr/lib/libjemalloc.so.2"

Related issues:

fluent/fluentd-docker-image#378
fluent/fluent-package-builder#369
fluent-plugins-nursery/fluent-plugin-systemd#110
ledbettj/systemd-journal#93
fluent#1478

Signed-off-by: Kentaro Hayashi <[email protected]>
kenhys added a commit to kenhys/fluentd-kubernetes-daemonset that referenced this issue Jul 17, 2024
There is a known bug that combination with jemalloc and
fluent-plugin-systemd causes free(): invalid crash for
a long time. The problematic code is identified but the
root cause is not fixed yet.

There is a workaround for this - disable jemalloc explicitly.
LD_PRELOAD= stop to use jemalloc.

If you want to use jemalloc, set it via env like this:

 containers:
   - name: fluentd
     image: fluent/fluentd-kubernetes-daemonset:v1-debian-forward
     env:
       - name: K8S_NODE_NAME
         valueFrom:
           fieldRef:
             fieldPath: spec.nodeName
       - name:  FLUENT_FORWARD_HOST
         value: "REMOTE_ENDPOINT"
       - name:  FLUENT_FORWARD_PORT
         value: "18080"
       - name:  LD_PRELOAD
         value: "/usr/lib/libjemalloc.so.2"

Related issues:

fluent/fluentd-docker-image#378
fluent/fluent-package-builder#369
fluent-plugins-nursery/fluent-plugin-systemd#110
ledbettj/systemd-journal#93
fluent#1478

Signed-off-by: Kentaro Hayashi <[email protected]>
daipom pushed a commit to fluent/fluentd-kubernetes-daemonset that referenced this issue Jul 18, 2024
There is a known bug that combination with jemalloc and
fluent-plugin-systemd causes free(): invalid crash for
a long time. The problematic code is identified but the
root cause is not fixed yet.

There is a workaround for this - disable jemalloc explicitly.
LD_PRELOAD= stop to use jemalloc.

If you want to use jemalloc, set it via env like this:

 containers:
   - name: fluentd
     image: fluent/fluentd-kubernetes-daemonset:v1-debian-forward
     env:
       - name: K8S_NODE_NAME
         valueFrom:
           fieldRef:
             fieldPath: spec.nodeName
       - name:  FLUENT_FORWARD_HOST
         value: "REMOTE_ENDPOINT"
       - name:  FLUENT_FORWARD_PORT
         value: "18080"
       - name:  LD_PRELOAD
         value: "/usr/lib/libjemalloc.so.2"

Related issues:

fluent/fluentd-docker-image#378
fluent/fluent-package-builder#369
fluent-plugins-nursery/fluent-plugin-systemd#110
ledbettj/systemd-journal#93
#1478

Signed-off-by: Kentaro Hayashi <[email protected]>
@ledbettj
Copy link

Hi, this issue is discussed over at ledbettj/systemd-journal#96

The problem is that the native library, libsystemd allocates memory using the system allocator, and then expects the calling code to free it. In most libraries, when they allocate memory they have a corresponding _free() method which can be called to release it, but libsystemd does not in this case.

The caller must call the corresponding free() implementation that was used for malloc() -- libc, tcmalloc, jemalloc, etc -- otherwise the process will likely crash.

In Ruby/FFI land, I don't know any good way to determine which allocator was used by the native code. Trying to load the jemalloc library when the libC implementation is already in use will cause issues :(

One option might be to provide a separate native library shim that just exposes a free wrapper (which would hopefully pick up on the LD_PRELOAD) but I don't know if that will be foolproof either.

@ledbettj
Copy link

Hi, this issue is discussed over at ledbettj/systemd-journal#96

The problem is that the native library, libsystemd allocates memory using the system allocator, and then expects the calling code to free it. In most libraries, when they allocate memory they have a corresponding _free() method which can be called to release it, but libsystemd does not in this case.

The caller must call the corresponding free() implementation that was used for malloc() -- libc, tcmalloc, jemalloc, etc -- otherwise the process will likely crash.

In Ruby/FFI land, I don't know any good way to determine which allocator was used by the native code. Trying to load the jemalloc library when the libC implementation is already in use will cause issues :(

One option might be to provide a separate native library shim that just exposes a free wrapper (which would hopefully pick up on the LD_PRELOAD) but I don't know if that will be foolproof either.

Hi friends,

I've published a version 2.0.0 of systemd-journal gem which includes a native shim as described above to attempt to work around this allocator mismatch.

Please give it a try and let me know if you run into any issues.

@kenhys
Copy link
Contributor

kenhys commented Sep 11, 2024

It seems that this issue was fixed in systemd-journal 2.0.0, so we shipped fluent-plugin-systemd 1.1.0 which adopts systemd-journal 2.0.0 or later.

Please use fluent-plugin-systemd 1.1.0.

@kenhys kenhys closed this as completed Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants