Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup data streams and send events to them #28450

Merged
merged 56 commits into from
Dec 6, 2021

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Oct 14, 2021

What does this PR do?

We are introducing data streams to Beats. It means that all Beats are going to send events to data streams instead of indices regardless of ES version. Do not confuse it with the data stream naming convention we use in integrations. Naming does not change in Beats, only the underlying data storage method in Elasticsearch.

The name of the data stream is going to be {beatname}-{version} and the index pattern is {beatname}-{version}.

With this change, the option setup.template.type no longer makes sense. Hence, it is removed completely from 8.x. If you are loading JSON index templates by specifying a file in setup.template.json.path, make sure you move from the legacy format to composable index templates.

Beats no longer load an alias to Elasticsearch, instead all data can be reached through the data stream.

One of the limitations is that only create operations are supported in data streams. Thus, there is no way to use e.g. "index" or "delete" operation types when sending events to ES.

Why is it important?

Simplify loading ILM, templates, and use the specialized data streams for output events.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Closes #25018
Requires #28671

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 14, 2021
@mergify
Copy link
Contributor

mergify bot commented Oct 14, 2021

This pull request does not have a backport label. Could you fix it @kvch? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Oct 14, 2021
@kvch kvch added backport-v7.16.0 Automated backport with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Oct 14, 2021
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 14, 2021
@mergify mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Oct 14, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 14, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-12-06T09:03:18.547+0000

  • Duration: 134 min 19 sec

  • Commit: 977312f

Test stats 🧪

Test Results
Failed 0
Passed 48589
Skipped 4276
Total 52865

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@kvch kvch force-pushed the forward-compatibility-libeat-index-template branch from d1d3adc to 39b0ad1 Compare October 15, 2021 11:38
@kvch kvch marked this pull request as ready for review October 15, 2021 16:02
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@kvch kvch changed the title Create datastreams when setting up templates Create data streams when setting up templates Oct 18, 2021
@kvch kvch requested review from a team as code owners October 18, 2021 16:56
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments but I think we should likely quickly sync on this before moving forward. The question from my side is, when do we switch to use data streams by default? Any Beat in 8.0 should use data streams by default and the option to use legacy should not exist anymore. In 7.16, it could be opt in. But in this case, all 7.16 Beats used have to be configured to write to data streams. We should not have part of the 7.16 data in data streams and some in indices as it complicates things. Any easy way out here is to keep in 7.16 the indices as the default and 8.0 goes to data streams automatically.

It seems we already had an option in place for component, index and legacy templates. What is the difference between index and component templates in these config options? And as we already had the option available, why are so many changes to the code needed?

libbeat/idxmgmt/std.go Outdated Show resolved Hide resolved
libbeat/template/config.go Outdated Show resolved Hide resolved
libbeat/template/load.go Outdated Show resolved Hide resolved
libbeat/template/load.go Outdated Show resolved Hide resolved
libbeat/template/load.go Outdated Show resolved Hide resolved
libbeat/template/load.go Outdated Show resolved Hide resolved
libbeat/template/load.go Outdated Show resolved Hide resolved
@kvch
Copy link
Contributor Author

kvch commented Oct 19, 2021

Left a few comments but I think we should likely quickly sync on this before moving forward. The question from my side is, when do we switch to use data streams by default?

I probably misunderstood you last time we discussed it. I thought we agreed that from 7.16, Beats should send events into data streams, but let users opt-out if they want to. Also, that is how I interpreted the following line in the original issue: "Moving to data streams already in 7.x will help improve robustness in resource setup and management."

Any Beat in 8.0 should use data streams by default and the option to use legacy should not exist anymore. In 7.16, it could be opt in. But in this case, all 7.16 Beats used have to be configured to write to data streams.

From testing point of view, it makes our life easier. But I would still move to data streams earlier because it bring more value to our users.

We should not have part of the 7.16 data in data streams and some in indices as it complicates things.

Agreed, that's what I had implemented. Or did I miss something?

It seems we already had an option in place for component, index and legacy templates. What is the difference between index and component templates in these config options?

Index template are regular templates as you expect. Component templates can be used in other index templates. I added the option to load Beats mapping/templates as component templates, so users can add more fields and create their own index templates by including beats component templates and their custom component templates. This option is for advanced users, who know how to manage mappings, what fields they need, etc.

And as we already had the option available, why are so many changes to the code needed?

The option data_stream has not been available in Beats. Only using the v2 indices.

I added a new datastreamManager that checks the options the user configured and reports errors. I also added a separate Setup to because, in the case of data streams, we have to setup things differently:

  • beats template is loaded as a component template, to let users add their own field if they want to
  • there is no need for creating an index alias in the case of data streams
  • a data stream has to be created

Furthermore, there are also some refactoring in loading to avoid copy-pasting the same code in template loading.

@kvch kvch removed the backport-v7.16.0 Automated backport with mergify label Oct 19, 2021
@mergify
Copy link
Contributor

mergify bot commented Oct 19, 2021

This pull request does not have a backport label. Could you fix it @kvch? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Oct 19, 2021
@kvch kvch marked this pull request as draft October 19, 2021 15:48
libbeat/idxmgmt/std.go Outdated Show resolved Hide resolved
@kvch kvch force-pushed the forward-compatibility-libeat-index-template branch from e2cbecb to 2d3d788 Compare October 26, 2021 08:10
kvch and others added 8 commits October 27, 2021 15:24
…lastic#28538)

## What does this PR do?

From now on Beats is going to load the new index templates to Elasticsearch 7.x and 8.x.

The PR consists of a few test fixes, and I had found an issue in templates, that I will port to master.

## Why is it important?

This way we can be forward compatible with Elasticsearch 8.x where legacy templates are no longer available.
@kvch
Copy link
Contributor Author

kvch commented Nov 29, 2021

jenkins run tests

@mergify
Copy link
Contributor

mergify bot commented Nov 30, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b forward-compatibility-libeat-index-template upstream/forward-compatibility-libeat-index-template
git merge upstream/master
git push upstream forward-compatibility-libeat-index-template

@kvch
Copy link
Contributor Author

kvch commented Dec 2, 2021

/test filebeat

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Lets get it in. We can still do a few follow up PRs with minor clenaup.

Awesome to see how much code and with it complexity this removed!

}

if templateComponent.load && !ilmComponent.load && ilmComponent.enabled {
return false, "Loading template with ILM settings whithout loading ILM " +
"policy and alias can lead to issues and is not recommended. " +
"policy can lead to issues and is not recommended. " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up: This log message and the above, we likely could be more specific on which config should be checked exactly to make it easier for users to find it.

MapStr: common.MapStr(tmpl),
}
}
templates, _ := response.GetValue("index_templates")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the errors be checked just to be save?

@kvch kvch merged commit 405c342 into elastic:master Dec 6, 2021
mergify bot pushed a commit that referenced this pull request Dec 6, 2021
## What does this PR do?

  We are introducing data streams to Beats. It means that all Beats are going to send events to data streams instead of indices regardless of ES version. Do not confuse it with the data stream naming convention we use in integrations. Naming does not change in Beats, only the underlying data storage method in Elasticsearch.

  The name of the data stream is going to be `{beatname}-{version}` and the index pattern is `{beatname}-{version}`.

  With this change, the option `setup.template.type` no longer makes sense. Hence, it is removed completely from 8.x. If you are loading JSON index templates by specifying a file in `setup.template.json.path`, make sure you move from the legacy format to composable index templates.

  Beats no longer load an alias to Elasticsearch, instead all data can be reached through the data stream.

  One of the limitations is that only create operations are supported in data streams. Thus, there is no way to use e.g. "index" or "delete" operation types when sending events to ES.

  ## Why is it important?

  Simplify loading ILM, templates, and use the specialized data streams for output events.

(cherry picked from commit 405c342)
kvch added a commit that referenced this pull request Dec 6, 2021
## What does this PR do?

  We are introducing data streams to Beats. It means that all Beats are going to send events to data streams instead of indices regardless of ES version. Do not confuse it with the data stream naming convention we use in integrations. Naming does not change in Beats, only the underlying data storage method in Elasticsearch.

  The name of the data stream is going to be `{beatname}-{version}` and the index pattern is `{beatname}-{version}`.

  With this change, the option `setup.template.type` no longer makes sense. Hence, it is removed completely from 8.x. If you are loading JSON index templates by specifying a file in `setup.template.json.path`, make sure you move from the legacy format to composable index templates.

  Beats no longer load an alias to Elasticsearch, instead all data can be reached through the data stream.

  One of the limitations is that only create operations are supported in data streams. Thus, there is no way to use e.g. "index" or "delete" operation types when sending events to ES.

  ## Why is it important?

  Simplify loading ILM, templates, and use the specialized data streams for output events.

(cherry picked from commit 405c342)

Co-authored-by: Noémi Ványi <[email protected]>
@dikshachauhan-qasource
Copy link

Hi @kvch

We have attempted to perform filebeats installation on latest 8.0 snapshot build that is available after the merges on above ticket.

Build details:

BUILD 48594
COMMIT ad3660f3acbfe6eb809d869b908221edf2846313
artifact: https://snapshots.elastic.co/8.0.0-4535d287/downloads/beats/filebeat/filebeat-8.0.0-SNAPSHOT-linux-x86_64.tar.gz

Observations:
Scenario 1: Attempted to install filebeat on Linux OS with same steps as mentioned on Elastic guide with Module system enabled.

Result: Filebeat indices showed up on discover tab and NO data was available under data streams tab.

Scenario 2: Attempted to install filebeat on Linux OS with 'setup.ilm.enabled' property as true [ uncommented at filebeat.reference.yml file] with Module system enabled.

Result: Again Filebeat indices showed up on discover tab and NO data was available under data streams tab.

Scenario 3: When we ran up Filebeat set command at scenario1 and scenario2 we observed below messages.
image

So, we again updated 'setup.ilm.overwrite' property as true and re-ran the setup and filebeat -e commands.

However, again observation remained same.

Further, we wanted to know more about the messages that are shown up after running setup command.

It says Index setup finished. So even if we makes changes as mentioned in Scenario 2, and as per our understanding ILM policy should be effective and data should be displayed under data streams tab. However, expected results were not achievable.

So, Could you please let us know if we are missing anything.

Screenshots:
image

image

Thanks
QAS

@kvch
Copy link
Contributor Author

kvch commented Dec 7, 2021

Adoption of data streams in Beats has nothing to with data streams in fleet. The data streams here are just special indices for time-series data managed by Elasticsearch. Thus, you should not see data streams in Fleet, but events on the Discover tab. Based on your tests and screenshots, it seems to me that everything is working well.

@dikshachauhan-qasource
Copy link

Hi @kvch

Thanks for providing the feedback.

However, we are still confused about the implemented changes.

In Ticket summary, it is mentioned that:

The name of the data stream is going to be {beatname}-{version} and the index pattern is {beatname}-{version}.

Where we can validate this information. Please make us more clear on new changes that are effective with respect to proposed changes.

Thanks
QAS

@kvch
Copy link
Contributor Author

kvch commented Dec 8, 2021

I am sorry if I was not clear. The ideal way to check is not necessarily if the expected names show up. I am rather interested in if dashboards still work. For example you could enable system module, send a few logs and check if the system dashboards can display the data. Could you please test this for me?

@dikshachauhan-qasource
Copy link

Hi @kvch

We have retested filebeat installation on 8.0 snapshot and below are our observations:

Screenshots:
Discover tab:
image

Dashboards:
image

image

image

Please let us know if we are still missing anything.

Thanks
QAS

@kvch
Copy link
Contributor Author

kvch commented Dec 9, 2021

Awesome! This is what I wanted to see. Thanks.

Hythloday-zero added a commit to elastic/observability-docs that referenced this pull request Jun 29, 2022
As of Beats 8.0, data streams are the default output to Elasticsearch.  Line 245 needs change to {y} and the phrasing in Line 248 should reflect that all three options can take advantage of data streams, easier ILM, and the data stream naming scheme.

Reference: elastic/beats#28450 linked in https://www.elastic.co/guide/en/beats/libbeat/8.0/release-notes-8.0.0.html
dedemorton pushed a commit to elastic/observability-docs that referenced this pull request Jun 29, 2022
As of Beats 8.0, data streams are the default output to Elasticsearch.  Line 245 needs change to {y} and the phrasing in Line 248 should reflect that all three options can take advantage of data streams, easier ILM, and the data stream naming scheme.

Reference: elastic/beats#28450 linked in https://www.elastic.co/guide/en/beats/libbeat/8.0/release-notes-8.0.0.html
dedemorton pushed a commit to dedemorton/observability-docs that referenced this pull request Jun 29, 2022
As of Beats 8.0, data streams are the default output to Elasticsearch.  Line 245 needs change to {y} and the phrasing in Line 248 should reflect that all three options can take advantage of data streams, easier ILM, and the data stream naming scheme.

Reference: elastic/beats#28450 linked in https://www.elastic.co/guide/en/beats/libbeat/8.0/release-notes-8.0.0.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.0.0 Automated backport with mergify Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Beats Index Template and Index Lifecycle forwards compatibility
5 participants