We recently discovered that the last log lines of a short-lived container were not sent on shutdown.
Fluentbit with the SQS plugin collects the logs from a side-car container in an ECS task. When the main container exits, the SIGTERM signal is sent to Fluentbit, which stops after the Grace period.
Looking at the plugin code, when FLBPluginExit() is invoked, the pending records in SqsRecords are not sent to the SQS queue. A simple status is returned and nothing else is done.
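To make the failure mode concrete, here is a minimal sketch of the buffering described above and of an exit hook that drains it. It is a simplified model, not the plugin's code: `buffer`, `sendFunc`, `Drain` and `onExit` are hypothetical names, and the `send` callback stands in for the real SQS call.

```go
package main

import "fmt"

// sendFunc stands in for the call that ships one batch to SQS (hypothetical).
type sendFunc func(batch []string) error

// buffer models plugin-side buffering: records wait until batchSize is reached.
type buffer struct {
	batchSize int
	pending   []string
	send      sendFunc
}

// Add queues a record and sends a batch only once batchSize records are pending.
func (b *buffer) Add(record string) error {
	b.pending = append(b.pending, record)
	if len(b.pending) < b.batchSize {
		return nil
	}
	return b.Drain()
}

// Drain sends whatever is pending, even a partial batch.
// Records are dropped from the buffer only after a successful send.
func (b *buffer) Drain() error {
	if len(b.pending) == 0 {
		return nil
	}
	if err := b.send(b.pending); err != nil {
		return err
	}
	b.pending = nil
	return nil
}

// onExit shows what an exit hook could do: drain first, then report a status.
// The 0/1 return values are placeholders, not Fluent Bit status codes.
func onExit(b *buffer) int {
	if err := b.Drain(); err != nil {
		return 1
	}
	return 0
}

func main() {
	var sent [][]string
	b := &buffer{batchSize: 10, send: func(batch []string) error {
		sent = append(sent, batch)
		return nil
	}}
	for i := 0; i < 13; i++ {
		if err := b.Add(fmt.Sprintf("log line %d", i)); err != nil {
			fmt.Println("send failed:", err)
		}
	}
	fmt.Println("before exit: batches =", len(sent), "pending =", len(b.pending))
	status := onExit(b)
	fmt.Println("after exit: batches =", len(sent), "pending =", len(b.pending), "status =", status)
}
```

Without the `Drain` call in `onExit`, the three records still pending at shutdown would never be sent, which matches what we observe.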
To describe the issue a bit differently: this output plugin uses its own buffering mechanism, which may delay sending some log records (by a lot) while it waits for BatchSize records to accumulate.
At the moment, this plugin sends an SQS message batch when, and only when, there are BatchSize records to batch together. This means that if fluentbit flushes its memory buffer to send the corresponding in-flight records and the record count isn't an exact multiple of BatchSize, there will be pending records left in the plugin memory. When fluentbit next flushes new records, those pending records will be part of the first batch sent (if any).
There are several concerns here:
- if fluentbit stops or crashes while there are pending records in the plugin memory, they'll be lost
- if the source is emitting logs very slowly, some logs may be sent very late; the plugin could even wait indefinitely before sending them if BatchSize is never reached
- fluentbit has many parameters to tweak buffering and backpressure, and the plugin logic should follow the same principles without introducing new ones (this is obviously largely opinionated 😄)
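One way to address these concerns, sketched below under assumptions, is to send every record received in a flush call, splitting it into batches of at most BatchSize and letting the last batch be partial, so that nothing stays in plugin memory between flushes. `splitBatches` and `flush` are hypothetical helpers for illustration, and the `send` callback again stands in for the real SQS call.

```go
package main

import (
	"errors"
	"fmt"
)

// splitBatches cuts records into groups of at most batchSize.
// The last group may be smaller, so no record is left behind.
func splitBatches(records []string, batchSize int) [][]string {
	if batchSize < 1 {
		batchSize = 1
	}
	var batches [][]string
	for start := 0; start < len(records); start += batchSize {
		end := start + batchSize
		if end > len(records) {
			end = len(records)
		}
		batches = append(batches, records[start:end])
	}
	return batches
}

// flush sends all records handed over in one flush call and keeps no state.
// It returns how many records were sent before any error occurred.
func flush(records []string, batchSize int, send func([]string) error) (int, error) {
	sent := 0
	for _, batch := range splitBatches(records, batchSize) {
		if err := send(batch); err != nil {
			return sent, err
		}
		sent += len(batch)
	}
	return sent, nil
}

func main() {
	records := make([]string, 23)
	for i := range records {
		records[i] = fmt.Sprintf("log line %d", i)
	}

	// Successful flush: every record goes out, including the partial batch.
	n, err := flush(records, 10, func(batch []string) error {
		fmt.Println("sending batch of", len(batch))
		return nil
	})
	fmt.Println("sent:", n, "err:", err)

	// Failing flush: the caller learns about it and can decide what to do.
	n, err = flush(records, 10, func([]string) error {
		return errors.New("queue unavailable")
	})
	fmt.Println("sent:", n, "err:", err)
}
```

Because `flush` holds no records after it returns, buffering and retries stay under fluentbit's own settings. In a real plugin the error path would presumably map to a retry status so that fluentbit can resubmit the chunk, at the cost of possibly resending batches that had already gone out.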
Doing some experiments around this issue, I came up with some code updates. PR is coming.
Hi there!
The code in question: fluentBit-sqs-plugin/out_sqs.go, line 243 in 704de47.
Padding the logs with 10 dummy lines at the end is our current workaround. Obviously, some of those padding logs are never sent.