Update allow_older_versions when running under Elastic Agent #34964

Merged on Mar 30, 2023 (8 commits)

rdner (Member) commented Mar 29, 2023

What does this PR do?

When Beats are running under Elastic Agent their initial output configuration is empty. Only a few moments later the output configuration arrives as an update via the control protocol.

On startup Beats register a global Elasticsearch connection callback to validate the Elasticsearch version. Unfortunately, this callback didn't account for the later allow_older_versions update via the control protocol and the updated value was not used.

This PR fixes that behaviour by updating the entire in-memory output configuration on each control protocol update.

The flag is extracted into a separate struct field so it can be read quickly without parsing the configuration again.
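
The idea can be illustrated with a minimal Go sketch. This is not the code from this PR: the names `outputSettings`, `update`, `onConnect` and `version` are hypothetical, and the version comparison is simplified to major.minor. It only shows why a callback registered once at startup has to read the flag at call time, from a field that each configuration update overwrites.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// version is a simplified major.minor pair (hypothetical, for illustration only).
type version struct{ major, minor int }

// olderThan reports whether v is strictly older than other.
func (v version) olderThan(other version) bool {
	if v.major != other.major {
		return v.major < other.major
	}
	return v.minor < other.minor
}

var errTooOld = errors.New("elasticsearch is too old")

// outputSettings is a hypothetical holder for the flag. The zero value
// models the empty initial output configuration under Elastic Agent.
type outputSettings struct {
	allowOlderVersions atomic.Bool
}

// update simulates an output configuration update arriving later.
func (s *outputSettings) update(allowOlderVersions bool) {
	s.allowOlderVersions.Store(allowOlderVersions)
}

// onConnect plays the role of the connection callback registered once at
// startup. It reads the flag on every call, so later updates take effect.
func (s *outputSettings) onConnect(es, beat version) error {
	if s.allowOlderVersions.Load() || !es.olderThan(beat) {
		return nil
	}
	return fmt.Errorf("%w: ES=%d.%d, Beat=%d.%d",
		errTooOld, es.major, es.minor, beat.major, beat.minor)
}

func main() {
	settings := &outputSettings{} // empty configuration at startup
	es, beat := version{8, 6}, version{8, 8}

	fmt.Println("before update:", settings.onConnect(es, beat))
	settings.update(true) // the policy sets allow_older_versions: true
	fmt.Println("after update:", settings.onConnect(es, beat))
}
```

An atomic field is used in the sketch because the update and the connection callback may run on different goroutines; whether the real code needs that is an assumption here.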

Why is it important?

Customers cannot use Elastic Agent with an earlier minor version of Elasticsearch, even though the two may be fully compatible.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Run elastic-package stack up (this started Elasticsearch 8.6.1 for me)
  2. Build the most recent Elastic Agent (its minor version should be higher than that of Elasticsearch):
DEV=true SNAPSHOT=true PLATFORMS="<your platform>" mage -v package
  3. Prepare this policy file policy.yml:
outputs:
  default:
    type: elasticsearch
    log_level: debug
    enabled: true
    hosts: ["https://127.0.0.1:9200"]
    username: "elastic"
    password: "changeme"
    allow_older_versions: true
    ssl:
      verification_mode: none

inputs:
  - type: system/metrics
    id: unique-system-metrics-input
    data_stream.namespace: default
    use_output: default
    streams:
      - metricset: cpu
        data_stream.dataset: system.cpu
      - metricset: memory
        data_stream.dataset: system.memory
      - metricset: network
        data_stream.dataset: system.network
      - metricset: filesystem
        data_stream.dataset: system.filesystem
  4. Run Elastic Agent in standalone mode using this policy file:
sudo -- ./elastic-agent -e run -c ./policy.yml 2> output-agent.ndjson
  5. Before the fix you would see these errors in output-agent.ndjson despite allow_older_versions: true in the policy:
{
  "log.level": "error",
  "@timestamp": "2023-03-29T18:41:07.937+0200",
  "message": "Failed to connect to backoff(elasticsearch(https://127.0.0.1:9200)): Connection marked as failed because the onConnect callback failed: Elasticsearch is too old. Please upgrade the instance. If you would like to connect to older instances set output.elasticsearch.allow_older_versions to true. ES=8.6.1, Beat=8.8.0",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "filestream-monitoring",
    "type": "filestream"
  },
  "log": {
    "source": "filestream-monitoring"
  },
  "log.logger": "publisher_pipeline_output",
  "log.origin": {
    "file.line": 150,
    "file.name": "pipeline/client_worker.go"
  },
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}

After the fix, there are no more errors and you can see events arriving in Kibana Discover.

  6. I also removed allow_older_versions: true from the policy and ran the agent to verify that the expected version validation errors still appear when the flag is not set.

  7. I ran Filebeat in standalone mode with this configuration:

filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    enabled: true
    paths:
      - "/logs/log*.log"
path.data: "/data"
logging:
  level: debug
output:
  elasticsearch:
    type: elasticsearch
    allow_older_versions: false
    hosts:
    - https://127.0.0.1:9200
    username: elastic
    password: changeme
    ssl:
      verification_mode: none

(don't forget to change the paths and write some lines to the matching log files).

And I can still see the expected error message when allow_older_versions: false:

{
  "log.level": "error",
  "@timestamp": "2023-03-29T21:21:13.492+0200",
  "log.logger": "publisher_pipeline_output",
  "log.origin": {
    "file.name": "pipeline/client_worker.go",
    "file.line": 150
  },
  "message": "Failed to connect to backoff(elasticsearch(https://127.0.0.1:9200)): Connection marked as failed because the onConnect callback failed: Elasticsearch is too old. Please upgrade the instance. If you would like to connect to older instances set output.elasticsearch.allow_older_versions to true. ES=8.6.1, Beat=8.8.0",
  "service.name": "filebeat",
  "ecs.version": "1.6.0"
}

With allow_older_versions: true there is no error message, and data arrives in Kibana Discover under the filebeat-* index pattern.

Related issues

rdner added the bug, Filebeat, and backport-v8.7.0 labels Mar 29, 2023
rdner self-assigned this Mar 29, 2023
botelastic bot added the needs_team label Mar 29, 2023
elasticmachine (Collaborator) commented Mar 29, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-03-30T16:39:45.891+0000

  • Duration: 66 min 23 sec

Test stats 🧪

Test Results
  • Failed: 0
  • Passed: 26094
  • Skipped: 1969
  • Total: 28063

💚 Flaky test report

Tests succeeded.

rdner (Member, Author) commented Mar 29, 2023

The linter failure is a false positive.

rdner marked this pull request as ready for review March 29, 2023 19:30
rdner requested a review from a team as a code owner March 29, 2023 19:30
rdner requested review from cmacknz and leehinman and removed request for a team March 29, 2023 19:30
rdner added the Team:Elastic-Agent-Data-Plane label Mar 29, 2023
elasticmachine (Collaborator) commented

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

botelastic bot removed the needs_team label Mar 29, 2023
jlind23 requested review from faec and belimawr March 30, 2023 07:02

ycombinator (Contributor) commented Mar 30, 2023

@rdner As I'm trying to test this PR I'm seeing the following errors in output-agent.ndjson:

{"log.level":"error","@timestamp":"2023-03-30T08:37:07.850-0700","message":"Exiting: failed to get host information: 1 error: could not get FQDN, all methods failed: failed looking up CNAME: lookup cowchicken: no such host: failed looking up IP: lookup cowchicken: no such host","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"http/metrics-monitoring","type":"http/metrics"},"log":{"source":"http/metrics-monitoring"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-03-30T08:37:07.850-0700","message":"Exiting: failed to get host information: 1 error: could not get FQDN, all methods failed: failed looking up CNAME: lookup cowchicken: no such host: failed looking up IP: lookup cowchicken: no such host","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-03-30T08:37:07.850-0700","message":"Exiting: failed to get host information: 1 error: could not get FQDN, all methods failed: failed looking up CNAME: lookup cowchicken: no such host: failed looking up IP: lookup cowchicken: no such host","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log":{"source":"beat/metrics-monitoring"},"ecs.version":"1.6.0"}

As such, none of the component Beats are starting up.

I am very familiar with these errors 😅; they were fixed very recently via #34946. If you rebase this PR on main, that should make these errors go away.

I'm working around this error locally but just mentioning the fix here for anyone else who might run into it while testing this PR.

rdner (Member, Author) commented Mar 30, 2023

@ycombinator hmm, I've tested with Beats on this branch, your agent must be newer, so the behaviour is different.

I don't feel like merging main and waiting for the CI all over again :)
Can you work around the error and still test it?

ycombinator (Contributor) commented

> hmm, I've tested with Beats on this branch, your agent must be newer, so the behaviour is different.

The error will only happen if your hostname is set to something that cannot be resolved via DNS. Mine is deliberately set that way 🙂. So it's likely no one else is going to run into this issue. Just wanted to mention it in case someone else does.

> I don't feel like merging main and waiting for the CI all over again :) Can you work around the error and still test it?

Yup, understandable! I'm working around it locally and testing.

ycombinator (Contributor) commented

Code LGTM (and thanks for the tests). I'm having some issues testing this locally but that might just be due to my local setup. Working through it...

ycombinator (Contributor) left a comment

LGTM.

Tested with instructions in PR and I can get the Beats under Agent (8.8.0-SNAPSHOT) to connect to an older ES (8.6.1):

{"log.level":"info","@timestamp":"2023-03-30T10:25:58.007-0700","message":"Attempting to connect to Elasticsearch version 8.6.1","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"http/metrics-monitoring","type":"http/metrics"},"log":{"source":"http/metrics-monitoring"},"log.logger":"esclientleg","log.origin":{"file.line":291,"file.name":"eslegclient/connection.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-03-30T10:25:58.080-0700","message":"Attempting to connect to Elasticsearch version 8.6.1","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log":{"source":"beat/metrics-monitoring"},"log.origin":{"file.line":291,"file.name":"eslegclient/connection.go"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"esclientleg","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-03-30T10:25:58.196-0700","message":"Connection to backoff(elasticsearch(https://test-older-versions.es.us-central1.gcp.qa.cld.elstc.co:9243)) established","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"http/metrics-monitoring","type":"http/metrics"},"log":{"source":"http/metrics-monitoring"},"service.name":"metricbeat","ecs.version":"1.6.0","log.logger":"publisher_pipeline_output","log.origin":{"file.line":147,"file.name":"pipeline/client_worker.go"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-03-30T10:25:58.267-0700","message":"Connection to backoff(elasticsearch(https://test-older-versions.es.us-central1.gcp.qa.cld.elstc.co:9243)) established","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log":{"source":"beat/metrics-monitoring"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":147,"file.name":"pipeline/client_worker.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}

Also tested with allow_older_versions line removed from config and the connection to ES fails, as expected:

{"log.level":"error","@timestamp":"2023-03-30T10:30:19.289-0700","message":"Failed to connect to backoff(elasticsearch(https://test-older-versions.es.us-central1.gcp.qa.cld.elstc.co:9243)): Connection marked as failed because the onConnect callback failed: Elasticsearch is too old. Please upgrade the instance. If you would like to connect to older instances set output.elasticsearch.allow_older_versions to true. ES=8.6.1, Beat=8.8.0","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"system/metrics-default","type":"system/metrics"},"log":{"source":"system/metrics-default"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":150,"file.name":"pipeline/client_worker.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}

rdner merged commit 1a9d627 into elastic:main Mar 30, 2023
rdner deleted the fix-allow-older-versions branch March 30, 2023 18:44
mergify bot pushed a commit that referenced this pull request Mar 30, 2023
(cherry picked from commit 1a9d627)
rdner added a commit that referenced this pull request Mar 30, 2023
…er Elastic Agent (#34979)

* Update `allow_older_versions` when running under Elastic Agent (#34964)

(cherry picked from commit 1a9d627)

---------

Co-authored-by: Denis <[email protected]>
cmacknz added the backport-v8.6.0 label May 25, 2023
mergify bot pushed a commit that referenced this pull request May 25, 2023
(cherry picked from commit 1a9d627)

# Conflicts:
#	libbeat/cmd/instance/beat_test.go
cmacknz added a commit that referenced this pull request May 25, 2023
…er Elastic Agent (#35574)

* Update `allow_older_versions` when running under Elastic Agent (#34964)

(cherry picked from commit 1a9d627)

# Conflicts:
#	libbeat/cmd/instance/beat_test.go

* Remove duplicate changelog entry

* Fix conflict in beat_test.go.

---------

Co-authored-by: Denis <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Labels
backport-v8.6.0, backport-v8.7.0, bug, Filebeat, Team:Elastic-Agent-Data-Plane
Development

Successfully merging this pull request may close these issues.

Beats started by agent do not respect the allow_older_versions: true configuration flag