Allow queue configuration to be specified under the output type #35615

Closed
2 tasks
Tracked by #16
faec opened this issue May 30, 2023 · 24 comments · Fixed by #36788
Labels: enhancement, Team:Elastic-Agent (Label for the Agent team)

Comments

@faec
Contributor

faec commented May 30, 2023

Beats queue configurations are specified at the top level of the config, grouped by queue type, e.g. queue.mem or queue.disk. Recent work to support the shipper involved exposing queue configuration hooks to the output itself during initialization. We should follow up on this by moving the queue configuration entirely into the output block, for example:

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  queue.mem:
    events: 5000

or

output.logstash:
  hosts: ["127.0.0.1:5044"]
  queue.disk:
    max_size: 10GB

This settings block, when present, should use the same structure and behavior as the existing Beats queue settings, and should replace the root-level configuration (though we will still support root-level as a fallback).
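A minimal sketch of the intended precedence, assuming the output-level block simply overrides any root-level queue section when both are present (the merge behavior shown is an illustration, not the final implementation):

# Root-level queue settings, kept as a fallback for outputs
# that do not define their own queue block.
queue.mem:
  events: 4096

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  # When present, this block replaces the root-level queue.mem above.
  queue.mem:
    events: 5000
    flush.min_events: 512
    flush.timeout: 5s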

Outputs that support this will automatically gain the ability to specify queue configurations through Agent, both with and without the shipper enabled. (However without the shipper it will apply these settings separately in each input process, duplicating the queue for each input type, so we should be careful how we communicate this option.)

In addition, please add the IdleConnectionTimeout setting (the Elasticsearch idle connection timeout) to the ES output settings:

httpcommon.WithKeepaliveSettings{IdleConnTimeout: s.IdleConnTimeout},

This setting, together with the queue settings, will allow users to tune both for high scale.
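A hedged sketch of how this could look once exposed, using the idle_connection_timeout key name from the commits referenced later in this thread (the value and exact placement are illustrative only):

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  # Assumed key name; closes idle keep-alive connections to
  # Elasticsearch after this duration.
  idle_connection_timeout: 30s
  queue.mem:
    events: 5000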

  • Implement Queue Settings for Outputs
  • Allow configuring IdleConnectionTimeout setting for Elasticsearch output
@faec added the enhancement and Team:Elastic-Agent labels on May 30, 2023
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@andrewkroh
Member

Relates: elastic/elastic-agent#284

@cmacknz
Member

cmacknz commented Jun 8, 2023

Inability to configure the underlying Beat queue parameters is currently one of the largest blockers for scaling the agent. High-volume use cases need a larger queue to achieve the required throughput, and we currently prevent that.

Implementing this will unblock those use cases, although it will result in some potentially unintuitive behavior until we have the shipper: the total queue size will not be what is configured in the output, but will instead be proportional to the number of running Beat inputs.
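For a rough sense of the impact (illustrative numbers only): with the configuration below and an agent policy that results in three separate Beat processes, each process gets its own queue, so the effective in-memory total is about three times the configured size.

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  queue.mem:
    events: 5000   # applied per Beat process, not per agent
# With 3 Beat processes running under the agent, up to
# 3 x 5000 = 15000 events may be buffered in memory in total.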

@cmacknz
Member

cmacknz commented Jun 13, 2023

Some relevant comments on elastic/kibana#158699 (comment)

When we do this, it will allow the agent to configure both the memory and disk queue. I think we want to explicitly prevent configuring the agent with a disk queue until we support it properly, for example by correctly sharing the queue contents across upgrades by default.

@faec
Contributor Author

faec commented Jun 13, 2023

I think we want to explicitly prevent configuring the agent with a disk queue until we support it properly, for example by correctly sharing the queue contents across upgrades by default.

The easiest way to do this is to just not allow the disk queue to be configured in output settings yet; otherwise we'll need the outputs to special-case their config parsing based on whether they're running under Agent, which ideally they shouldn't know or care about. This sounds like a reasonable limitation to me -- there's no reason for us to advertise this new way of configuring the queue (to Beats users, at least); it just lets Beats work with Agent's unit-based configs, so a lack of disk queue support is unlikely to bite anyone. Any concerns about this approach?

@cmacknz
Member

cmacknz commented Jun 13, 2023

Any concerns about this approach?

No, the simple approach is best here.

@zez3

zez3 commented Aug 28, 2023

So the Fleet YAML syntax would be something like?

worker:24
queue.disk.max_size:150GB

or?

@cmacknz
Member

cmacknz commented Aug 28, 2023

Likely yes, the syntax will be confirmed once we have finished the implementation.

@zez3

zez3 commented Sep 1, 2023

Hmm, something seems to be off

  d79fc350-8546-11ed-b830-93d76b562dd8:
    "0": w
    "1": o
    "2": r
    "3": k
    "4": e
    "5": r
    "6": ':'
    "7": "2"
    "8": "4"
    "9": ' '
    "10": q
    "11": u
    "12": e
    "13": u
    "14": e
    "15": .
    "16": d
    "17": i
    "18": s
    "19": k
    "20": .
    "21": m
    "22": a
    "23": x
    "24": _
    "25": s
    "26": i
    "27": z
    "28": e
    "29": ':'
    "30": "3"
    "31": "0"
    "32": "0"
    "33": G
    "34": B
    api_key: mykey
    hosts:
    - https://mydom.mytld:9243
    type: elasticsearch
  default:
    api_key: somekey
    hosts:
    - https://mydom.mytld:9243
    type: elasticsearch

@zez3

zez3 commented Sep 4, 2023

Got it working with the correct YAML syntax.
What is strange is that Fleet accepts the incorrect values even though it performs some validation checks.
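For reference, a sketch of what well-formed YAML for the output's advanced settings could look like (the memory queue keys match the settings confirmed later in this thread; worker and all values are illustrative, and note the space required after each colon):

worker: 24
queue.mem:
  events: 5000
  flush.min_events: 512
  flush.timeout: 5s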

@pierrehilbert assigned leehinman and unassigned faec on Sep 7, 2023
@pierrehilbert
Collaborator

@leehinman as discussed, I assigned you this issue for the next sprint.

@strawgate
Contributor

Can we ensure that flush.timeout is configurable for the queue with this work?

@cmacknz
Member

cmacknz commented Sep 8, 2023

This would allow configuring every parameter of the Beats memory queue through an agent policy: https://www.elastic.co/guide/en/beats/filebeat/current/configuring-internal-queue.html#configuration-internal-queue-memory

@cmacknz
Member

cmacknz commented Sep 8, 2023

That includes both flush.timeout and flush.min_events.

This would technically also allow configuring the Beats disk queue, but we will likely disallow it initially: with the current architecture, the agent policy would create one disk queue per unique input type, which is probably not what most people would expect.

leehinman added a commit to leehinman/beats that referenced this issue Sep 27, 2023
- add support for `idle_connection_timeout` for ES output
- add support for queue settings under output

Closes elastic#35615
@cmacknz
Member

cmacknz commented Sep 28, 2023

When we merge this, let's add an agent changelog entry as well explaining where this can be set.

@cmacknz
Member

cmacknz commented Sep 28, 2023

This change should probably have a docs issue associated with it to make sure that this functionality is explained properly in the agent documentation.

I don't know that we need to document this for standalone Beats because it doesn't enable anything that wasn't already possible.

@nimarezainia
Contributor

This change should probably have a docs issue associated with it to make sure that this functionality is explained properly in the agent documentation.

I don't know that we need to document this for standalone Beats because it doesn't enable anything that wasn't already possible.

@kilfoyle FYI: the configuration items are already described in this section: https://www.elastic.co/guide/en/beats/filebeat/current/configuring-internal-queue.html. We are just exposing them in the Agent output (via the advanced YAML box).

The possible settings are now (shown for output.elasticsearch, but this applies to any output):

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  queue.mem:
    events: 5000
    flush.min_events: 512
    flush.timeout: 5s

@nimarezainia
Contributor

This would technically also allow configuring the Beats disk queue, but we will likely disallow it initially: with the current architecture, the agent policy would create one disk queue per unique input type, which is probably not what most people would expect.

@cmacknz if we do this, we can claim disk queue support as well. Fair enough that this queue would be in every Beat, which is not what the user expects, but then again so is the internal queue (until this change). The main use for the spooling is resiliency when we are disconnected; is there a reason why we can't do spooling on the input rather than once on the output?

My vote is to enable this while we are at it.

@zez3

zez3 commented Sep 29, 2023

@cmacknz

What do you mean by:

by correctly sharing the queue contents across upgrades by default.

?

The Agent is quite often upgraded, most of the time together with the whole stack. Can you please describe the issue with the queue contents after an upgrade in more detail?

leehinman added a commit to leehinman/beats that referenced this issue Sep 29, 2023
- add support for `idle_connection_timeout` for ES output
- add support for queue settings under output

Closes elastic#35615
@kilfoyle
Contributor

This change should probably have a docs issue associated with it to make sure that this functionality is explained properly in the agent documentation.

Thanks Craig and Nima for the heads up! I'll look after this docs issue in the upcoming sprint.

@cmacknz
Member

cmacknz commented Sep 29, 2023

The Agent is quite often upgraded, most of the time together with the whole stack. Can you please describe the issue with the queue contents after an upgrade in more detail?

See elastic/elastic-agent#3490, which explains what needs to happen for the Elastic Agent to support the Beats disk queue. This requires some understanding of the internal architecture of the agent; see https://github.com/elastic/elastic-agent/blob/main/docs/architecture.md.

The disk queue will be preserved between upgrades, but we need to do it correctly and without having to copy it since it can be quite large. We need to special case this in the agent to make sure it happens properly.

leehinman added a commit to leehinman/beats that referenced this issue Sep 29, 2023
- add support for `idle_connection_timeout` for ES output
- add support for queue settings under output

Closes elastic#35615
@blakerouse
Contributor

The Agent is quite often upgraded, most of the time together with the whole stack. Can you please describe the issue with the queue contents after an upgrade in more detail?

See elastic/elastic-agent#3490, which explains what needs to happen for the Elastic Agent to support the Beats disk queue. This requires some understanding of the internal architecture of the agent; see https://github.com/elastic/elastic-agent/blob/main/docs/architecture.md.

The disk queue will be preserved between upgrades, but we need to do it correctly and without having to copy it since it can be quite large. We need to special case this in the agent to make sure it happens properly.

Do we really not want to copy it? What happens when it's corrupted on upgrade, and rollback now fails because we didn't copy it and the new version corrupted it?

@cmacknz
Member

cmacknz commented Oct 3, 2023

Do we really not want to copy it? What happens when it's corrupted on upgrade, and rollback now fails because we didn't copy it and the new version corrupted it?

The primary reason is that it can be GBs in size; the default size is 10 GB.

I don't want us to block the completion of upgrades on copying a possibly 10 GB file. Copying a file that large is also likely to run into disk space constraints, since the system needs to be able to store it twice.

We need to handle the corrupted disk queue case, but I don't think we can solve it by copying on upgrade; even though that is conceptually the simplest option, I don't think it is practical.

@zez3

zez3 commented Oct 3, 2023

copying a possibly 10 GB file

FYI: when I was using Graylog, we had 300 GB journal files for queue buffering. This was more of a safety net for those moments when we had really high (>100k EPS) load or when we had to perform administration tasks that brought our Elasticsearch cluster offline. When the 300 GB was approaching full (>90%) capacity, we would declare that node dead to our LB.

My goal is to get the queue status from the Agent status (once the shipper is finalized) or directly from the underlying Filebeat via its HTTP endpoint.
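As an aside, a minimal sketch of enabling the Beats HTTP monitoring endpoint on the underlying Filebeat (a standalone Beats feature; whether and how the agent exposes these settings per process is not covered here), which serves pipeline and queue counters under /stats:

http.enabled: true
http.host: localhost
http.port: 5066
# GET http://localhost:5066/stats returns JSON that includes libbeat.pipeline
# event counters, which can be polled to observe how full the queue is.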

leehinman added a commit to leehinman/beats that referenced this issue Oct 6, 2023
- add support for `idle_connection_timeout` for ES output
- add support for queue settings under output

Closes elastic#35615
leehinman added a commit to leehinman/beats that referenced this issue Oct 9, 2023
- add support for `idle_connection_timeout` for ES output
- add support for queue settings under output

Closes elastic#35615
leehinman added a commit to leehinman/beats that referenced this issue Oct 19, 2023
- add support for `idle_connection_timeout` for ES output
- add support for queue settings under output

Closes elastic#35615