free(): invalid pointer with latest fluent/fluentd-kubernetes-daemonset:v1-debian-forward-arm64 image #1478

Closed
smparekh opened this issue Jan 17, 2024 · 7 comments

@smparekh

Describe the bug

Using the latest v1-debian-forward-arm64 image, the container throws free(): invalid pointer and restarts constantly, which eventually leads to a node eviction.

To Reproduce

A redacted config that reproduces the problem is provided below.

Expected behavior

The worker should come up and stay up.

Your Environment

- Image tag in use: fluentd-kubernetes-daemonset:v1-debian-forward-arm64

Your Configuration

    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"
    @include conf.d/*.

    <label @FLUENT_LOG>
      <match fluent.**>
        @type null
        @id ignore_fluent_logs
      </match>
    </label>

    <match kubelet>
      @type null
    </match>

    <filter kubernetes.**>
      @type kubernetes_metadata
      @id filter_kube_metadata
      kubernetes_url "#{ENV['FLUENT_FILTER_KUBERNETES_URL'] || 'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
      verify_ssl "#{ENV['KUBERNETES_VERIFY_SSL'] || true}"
      ca_file "#{ENV['KUBERNETES_CA_FILE']}"
      skip_labels "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_LABELS'] || 'false'}"
      skip_container_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_CONTAINER_METADATA'] || 'false'}"
      skip_master_url "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_MASTER_URL'] || 'false'}"
      skip_namespace_metadata "#{ENV['FLUENT_KUBERNETES_METADATA_SKIP_NAMESPACE_METADATA'] || 'false'}"
      watch "#{ENV['FLUENT_KUBERNETES_WATCH'] || 'true'}"
    </filter>

    <source>
      @type tail
      @id in_tail_container_logs
      path "#{ENV['FLUENT_CONTAINER_TAIL_PATH'] || '/var/log/containers/*.log'}"
      pos_file "#{File.join('/var/log/', ENV.fetch('FLUENT_POS_EXTRA_DIR', ''), 'fluentd-containers.log.pos')}"
      tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
      exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
      read_from_head true
      <parse>
        @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
        time_format "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT'] || '%Y-%m-%dT%H:%M:%S.%NZ'}"
      </parse>
    </source>

    <filter qfunctions.**>
      @type record_transformer
      enable_ruby true
      <record>
        message ${record["message"].gsub(/^.*std(out|err):\s/, '')}
      </record>
    </filter>

    <filter qfunctions.**>
      @type parser
      format json
      key_name message
      emit_invalid_record_to_error false
    </filter>

    <match qfunctions.**>
      @type rewrite_tag_filter
      <rule>
        key tenant_id
        pattern /^abc1234$/
        tag abc1234
      </rule>
      <rule>
        key tenant_id
        pattern /.+/
        tag clear
      </rule>
    </match>
    <match abc1234.**>
      @type http
      @id out_abc1234
      @log_level info
      
      endpoint "#{ENV['ENDPOINT']}"
      http_method post
      content_type application/json
      json_array true
      <format>
        @type json
      </format>
      headers {"X-P-Stream": "functions", "X-P-Meta-Org-Id": "abc1234"}
      <auth>
        method basic
        username "#{ENV['USERNAME']}"
        password "#{ENV['PASSWORD']}"
      </auth>
    </match>

    <match clear>
      @type null
    </match>


Your Error Log

```shell
2024-01-17 15:48:14 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-01-17 15:48:15 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-01-17 15:48:15 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2024-01-17 15:48:15 +0000 [info]: adding match pattern="kubelet" type="null"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="qfunctions.**" type="record_transformer"
2024-01-17 15:48:15 +0000 [info]: adding filter pattern="qfunctions.**" type="parser"
2024-01-17 15:48:15 +0000 [info]: adding match pattern="qfunctions.**" type="rewrite_tag_filter"
2024-01-17 15:48:15 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff7b7b91b8 @keys="tenant_id">, /^abc1234$/, "", "abc1234", nil]
2024-01-17 15:48:15 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff7b7b8790 @keys="tenant_id">, /.+/, "", "clear", nil]
2024-01-17 15:48:15 +0000 [info]: adding match pattern="abc1234.**" type="http"
2024-01-17 15:48:15 +0000 [warn]: #0 [out_abc1234] Status code 503 is going to be removed from default `retryable_response_codes` from fluentd v2. Please add it by yourself if you wish
2024-01-17 15:48:15 +0000 [info]: adding match pattern="clear" type="null"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:15 +0000 [info]: adding source type="prometheus"
2024-01-17 15:48:15 +0000 [info]: adding source type="prometheus_output_monitor"
2024-01-17 15:48:15 +0000 [info]: adding source type="tail"
2024-01-17 15:48:15 +0000 [info]: #0 starting fluentd worker pid=361 ppid=6 worker=0
2024-01-17 15:48:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/contact-task-runtime-5cbd49696c-fmqkz_openfaas-fn_contact-task-runtime-90840620b3e6f1d26b85a666402b31aa3a5d5f9faf8f2388c919c87c5ce082a1.log
2024-01-17 15:48:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ground-task-runtime-65446d7bcc-527dl_openfaas-fn_ground-task-runtime-b12db1d88da3a582965a7ff372367d9676e9e640f505694022c6f5da97649e46.log
2024-01-17 15:48:15 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer
2024-01-17 15:48:17 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-01-17 15:48:18 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-01-17 15:48:18 +0000 [info]: adding match in @FLUENT_LOG pattern="fluent.**" type="null"
2024-01-17 15:48:18 +0000 [info]: adding match pattern="kubelet" type="null"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="qfunctions.**" type="record_transformer"
2024-01-17 15:48:18 +0000 [info]: adding filter pattern="qfunctions.**" type="parser"
2024-01-17 15:48:18 +0000 [info]: adding match pattern="qfunctions.**" type="rewrite_tag_filter"
2024-01-17 15:48:18 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff8cd245b0 @keys="tenant_id">, /^org_2Jf4UxF6FEwCMecX$/, "", "abc1234", nil]
2024-01-17 15:48:18 +0000 [info]: #0 adding rewrite_tag_filter rule: tenant_id [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x0000ffff8cd23f98 @keys="tenant_id">, /.+/, "", "clear", nil]
2024-01-17 15:48:18 +0000 [info]: adding match pattern="abc1234.**" type="http"
2024-01-17 15:48:18 +0000 [warn]: #0 [out_abc1234] Status code 503 is going to be removed from default `retryable_response_codes` from fluentd v2. Please add it by yourself if you wish
2024-01-17 15:48:18 +0000 [info]: adding match pattern="clear" type="null"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="systemd"
2024-01-17 15:48:18 +0000 [info]: adding source type="prometheus"
2024-01-17 15:48:18 +0000 [info]: adding source type="prometheus_output_monitor"
2024-01-17 15:48:18 +0000 [info]: adding source type="tail"
2024-01-17 15:48:18 +0000 [info]: #0 starting fluentd worker pid=376 ppid=6 worker=0
2024-01-17 15:48:18 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/contact-task-runtime-5cbd49696c-fmqkz_openfaas-fn_contact-task-runtime-90840620b3e6f1d26b85a666402b31aa3a5d5f9faf8f2388c919c87c5ce082a1.log
2024-01-17 15:48:18 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/ground-task-runtime-65446d7bcc-527dl_openfaas-fn_ground-task-runtime-b12db1d88da3a582965a7ff372367d9676e9e640f505694022c6f5da97649e46.log
2024-01-17 15:48:18 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer
```

Additional context

We have a DaemonSet in a cluster that has been running for about 22 days, and it does not show the invalid pointer issue.

@smparekh
Author

The SHA-256 digest we are having the issue with: 59886dc179d52a43dfdf061c764e9856dafc67c41dd78e9d868872000d9e660a

@smparekh
Author

Reverting to this digest does not throw the invalid pointer error: f0c0d41aba562c5f4ce13f2b00ae50c381925063cfcc7ec7a9f2a4f622ee9535
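
For anyone who needs to stay on the working build for now, pinning the DaemonSet to a digest instead of the moving tag is one option. A minimal sketch, assuming the digest above belongs to the fluent/fluentd-kubernetes-daemonset repository and that the container is named fluentd (the container name is an assumption, not taken from the report):

```yaml
containers:
  - name: fluentd  # assumed container name
    # Pin by digest so the moving v1-debian-forward-arm64 tag
    # cannot pull in a newer build on the next restart.
    image: fluent/fluentd-kubernetes-daemonset@sha256:f0c0d41aba562c5f4ce13f2b00ae50c381925063cfcc7ec7a9f2a4f622ee9535
```

A digest reference is immutable, so this is a holding measure rather than a fix.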

@StevenChangNoodoe

I have the same issue with fluent/fluentd-kubernetes-daemonset:v1-debian-cloudwatch.
Reverting to this digest also avoids the error: b7185b3483d2ca5c3e923e33641dd3814865321b34da05c46eda96576da905a0
v1-debian-cloudwatch.log

@CAR6807

CAR6807 commented Apr 4, 2024

Also seeing this with the
fluent/fluentd-kubernetes-daemonset:v1.16.5-debian-forward-1.0 image.

Logging fails:

2024-04-03 20:27:34 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/node-problem-detector-kwwk8_kube-system_node-problem-detector-4e2796e4c3ca14953fda355aca52c0200a0f53b7b0596d7e94ec89169c782f8a.log
2024-04-03 20:27:34 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/unbound-exporter-llm48_unbound_unbound-exporter-bd636614623be73dc03069f9a0fefffb779c47d2c034e796d3364fb49fb2e6fe.log
2024-04-03 20:27:34 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/unbound-exporter-llm48_unbound_unbound-exporter-init-1b88c92fa871c07c66d558a84a656879a1b13dfa12c6b533b37ec9ae74fc555f.log
2024-04-03 20:27:34 +0000 [info]: #0 fluentd worker is now running worker=0
free(): invalid pointer
2024-04-03 20:27:37 +0000 [error]: Worker 0 exited unexpectedly with signal SIGABRT
2024-04-03 20:27:37 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil

github-actions bot commented Jul 4, 2024

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

@github-actions github-actions bot added the stale label Jul 4, 2024
@daipom daipom removed the stale label Jul 4, 2024
@kenhys
Contributor

kenhys commented Jul 17, 2024

NOTE:

kenhys added a commit to kenhys/fluentd-kubernetes-daemonset that referenced this issue Jul 17, 2024
There is a long-standing known bug where the combination of jemalloc
and fluent-plugin-systemd causes a free(): invalid pointer crash.
The problematic code has been identified, but the root cause is
not fixed yet.

There is a workaround for this: disable jemalloc explicitly.
Setting LD_PRELOAD= (empty) stops jemalloc from being used.

If you want to use jemalloc, set it via env like this:

 containers:
   - name: fluentd
     image: fluent/fluentd-kubernetes-daemonset:v1-debian-forward
     env:
       - name: K8S_NODE_NAME
         valueFrom:
           fieldRef:
             fieldPath: spec.nodeName
       - name:  FLUENT_FORWARD_HOST
         value: "REMOTE_ENDPOINT"
       - name:  FLUENT_FORWARD_PORT
         value: "18080"
       - name:  LD_PRELOAD
         value: "/usr/lib/libjemalloc.so.2"

Related issues:

fluent/fluentd-docker-image#378
fluent/fluent-package-builder#369
fluent-plugins-nursery/fluent-plugin-systemd#110
ledbettj/systemd-journal#93
fluent#1478

Signed-off-by: Kentaro Hayashi <[email protected]>
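
For images built before this change, the same workaround can probably be applied from the pod spec by overriding LD_PRELOAD with an empty value. This is a sketch only, assuming the older image sets LD_PRELOAD through a Dockerfile ENV (which a pod-level env entry overrides) and reusing the container name and image tag from the example above:

```yaml
containers:
  - name: fluentd
    image: fluent/fluentd-kubernetes-daemonset:v1-debian-forward
    env:
      # An empty LD_PRELOAD means the dynamic loader preloads nothing,
      # so jemalloc is not loaded and the system allocator is used.
      - name: LD_PRELOAD
        value: ""
```

This trades jemalloc's memory behavior for stability, so memory usage is worth watching after the change.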
daipom pushed a commit that referenced this issue Jul 18, 2024
@kenhys
Contributor

kenhys commented Jul 18, 2024

v1.17-debian-forward-1.3 or v1.16.5-debian-forward-1.3 will fix this issue.

@kenhys kenhys closed this as completed Jul 18, 2024