Set the fileProspector's ignoreInactiveSince value #34770

Merged: 9 commits merged into elastic:main on Mar 30, 2023


@Epakesko (Contributor) commented Mar 8, 2023

The filestream input's fileProspector does not take the configured ignore_inactive option into account. This option is especially important when the registry is not persisted across Filebeat restarts. In that case, setting ignore_inactive to since_last_start should make Filebeat read only the changes made since the restart and ignore older files. However, because the value is never set on the fileProspector, every file is read again in full.
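To make the expected behavior concrete, here is a minimal Go sketch of how an ignore_inactive cutoff could decide whether a file is skipped. Based on the option names, it assumes that since_first_start compares a file's modification time against the first start of Filebeat, and since_last_start compares it against the most recent start. The type, constants, and functions (`ignoreInactiveType`, `ignoreSince`, `isIgnored`, `marchDay`) are hypothetical and are not Filebeat's actual code. If the setting never reaches the prospector, the mode stays at its zero value and no file is skipped, which matches the bug described above.

```go
package main

import (
	"fmt"
	"time"
)

// ignoreInactiveType is a HYPOTHETICAL stand-in for how an
// ignore_inactive setting could be represented; not Filebeat's real type.
type ignoreInactiveType uint8

const (
	ignoreInactiveUnset           ignoreInactiveType = iota // zero value: option not configured
	ignoreInactiveSinceLastStart                            // cutoff = most recent start (assumed)
	ignoreInactiveSinceFirstStart                           // cutoff = first start (assumed)
)

// ignoreSince returns the cutoff time for the configured mode.
// The zero time.Time means "no cutoff", so nothing is ignored.
func ignoreSince(mode ignoreInactiveType, firstStart, lastStart time.Time) time.Time {
	switch mode {
	case ignoreInactiveSinceFirstStart:
		return firstStart
	case ignoreInactiveSinceLastStart:
		return lastStart
	default:
		return time.Time{}
	}
}

// isIgnored reports whether a file last modified at modTime should be
// skipped because it has not changed after the cutoff.
func isIgnored(modTime, cutoff time.Time) bool {
	return !cutoff.IsZero() && !modTime.After(cutoff)
}

// marchDay is a small helper that builds example timestamps.
func marchDay(d int) time.Time {
	return time.Date(2023, time.March, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	firstStart, lastStart := marchDay(1), marchDay(8)
	oldFile := marchDay(5) // modified after the first start, before the restart

	modes := []ignoreInactiveType{ignoreInactiveUnset, ignoreInactiveSinceFirstStart, ignoreInactiveSinceLastStart}
	for _, mode := range modes {
		cutoff := ignoreSince(mode, firstStart, lastStart)
		fmt.Printf("mode %d: ignored=%v\n", mode, isIgnored(oldFile, cutoff))
	}
}
```

With since_last_start, the file modified before the restart is skipped. With the option unset, which is effectively what the prospector saw before this fix, it is read again.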

What does this PR do?

Sets the ignoreInactiveSince option on the fileProspector, based on the ignore_inactive configuration option.
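A simplified Go sketch of the idea behind the change. It assumes the string setting is parsed into an enum and then copied onto the prospector at construction time. The names `fileProspector`, `newProspector`, and `ignoreInactiveSince` come from this PR, but the types, `parseIgnoreInactive`, and the `config` struct shown here are hypothetical stand-ins, not the real Filebeat code.

```go
package main

import "fmt"

// HYPOTHETICAL enum for the ignore_inactive setting; the zero value
// means the option was not configured.
type ignoreInactiveType uint8

const (
	invalidIgnoreInactive ignoreInactiveType = iota
	ignoreInactiveSinceLastStart
	ignoreInactiveSinceFirstStart
)

// parseIgnoreInactive maps the YAML string to the enum (assumed mapping).
func parseIgnoreInactive(s string) (ignoreInactiveType, error) {
	switch s {
	case "":
		return invalidIgnoreInactive, nil
	case "since_last_start":
		return ignoreInactiveSinceLastStart, nil
	case "since_first_start":
		return ignoreInactiveSinceFirstStart, nil
	default:
		return invalidIgnoreInactive, fmt.Errorf("unknown ignore_inactive value %q", s)
	}
}

// config is a simplified, hypothetical input configuration.
type config struct {
	IgnoreInactive ignoreInactiveType
}

// fileProspector keeps only the field relevant to this PR.
type fileProspector struct {
	ignoreInactiveSince ignoreInactiveType
}

// newProspector copies the configured value onto the prospector.
// Omitting this assignment reproduces the bug: the field keeps its zero value.
func newProspector(c config) *fileProspector {
	return &fileProspector{
		ignoreInactiveSince: c.IgnoreInactive,
	}
}

func main() {
	mode, err := parseIgnoreInactive("since_last_start")
	if err != nil {
		panic(err)
	}
	p := newProspector(config{IgnoreInactive: mode})
	fmt.Println("ignoreInactiveSince set correctly:", p.ignoreInactiveSince == ignoreInactiveSinceLastStart)
}
```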

Why is it important?

Without this fix, the ignore_inactive option has no effect on the filestream input.

Checklist

  • [x] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Start Filebeat with the ignore_inactive option set to since_last_start. Example:

```yaml
filebeat.inputs:
- type: filestream
  id: json-log-files
  ignore_inactive: since_last_start
  paths:
    - ./data/logs/*.json
  parsers:
    - ndjson:
        message_key: message

output.file:
  path: ./data/filebeat_output
```

1. Add your log files. Their contents are written to the output.
2. Stop Filebeat and remove the registry.
3. Start Filebeat again and append new content to the log files.

Expected: only the newly added content is written to the output.
Result without the fix: the full content of the log files is written to the output again.
Result with the fix: only the newly added content is written to the output.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 8, 2023
@mergify bot (Contributor) commented Mar 8, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR, @Epakesko? 🙏
To do so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine (Collaborator) commented Mar 8, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-03-30T07:55:19.095+0000

  • Duration: 68 min 27 sec

Test stats 🧪

Test Results

| Failed | Passed | Skipped | Total |
|--------|--------|---------|-------|
| 0      | 7601   | 747     | 8348  |

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@Epakesko Epakesko marked this pull request as ready for review March 8, 2023 14:50
@Epakesko Epakesko requested a review from a team as a code owner March 8, 2023 14:50
@Epakesko Epakesko requested review from ycombinator and faec and removed request for a team March 8, 2023 14:50
@cmacknz cmacknz requested review from rdner and belimawr March 8, 2023 14:59
@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Mar 8, 2023
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 8, 2023
@belimawr (Contributor)

@Epakesko thanks for catching this and submitting the fix!

Could you add at least a simple test showing that your fix works? It could be as simple as calling newProspector and checking that the returned prospector has the value set correctly.

@belimawr belimawr self-assigned this Mar 22, 2023
@Epakesko (Contributor, Author)

Hi belimawr

Sure, I've added a new test file, prospector_creator_test.go, containing a test with three cases: the option not set, set to since_first_start, and set to since_last_start. With the fix applied, the test passes; without it, the test fails, because the value of ignoreInactiveSince on the fileProspector is always 0.
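The following runnable Go sketch illustrates the three cases described above. It is written as a plain program rather than a `go test` file so it can run on its own; in the repository it would be a `TestXxx(t *testing.T)` function. All types and both constructors are hypothetical simplifications. `newProspectorBefore` drops the configured value, mimicking the bug, and `newProspectorAfter` copies it, mimicking the fix.

```go
package main

import "fmt"

// HYPOTHETICAL, simplified stand-ins; not Filebeat's real types.
type ignoreInactiveType uint8

const (
	ignoreInactiveUnset ignoreInactiveType = iota // zero value
	ignoreInactiveSinceLastStart
	ignoreInactiveSinceFirstStart
)

type config struct{ IgnoreInactive ignoreInactiveType }

type fileProspector struct{ ignoreInactiveSince ignoreInactiveType }

// newProspectorBefore mimics the bug: the configured value is dropped.
func newProspectorBefore(_ config) *fileProspector {
	return &fileProspector{}
}

// newProspectorAfter mimics the fix: the configured value is copied over.
func newProspectorAfter(c config) *fileProspector {
	return &fileProspector{ignoreInactiveSince: c.IgnoreInactive}
}

// testCase pairs a configuration with the value the prospector should hold.
type testCase struct {
	name string
	cfg  config
	want ignoreInactiveType
}

var cases = []testCase{
	{"option not set", config{}, ignoreInactiveUnset},
	{"since_first_start", config{IgnoreInactive: ignoreInactiveSinceFirstStart}, ignoreInactiveSinceFirstStart},
	{"since_last_start", config{IgnoreInactive: ignoreInactiveSinceLastStart}, ignoreInactiveSinceLastStart},
}

// failures runs every case against ctor and returns the names of failing cases.
func failures(ctor func(config) *fileProspector) []string {
	var failed []string
	for _, tc := range cases {
		if got := ctor(tc.cfg).ignoreInactiveSince; got != tc.want {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("before fix, failing cases:", failures(newProspectorBefore))
	fmt.Println("after fix, failing cases:", failures(newProspectorAfter))
}
```

Against the pre-fix constructor, the since_first_start and since_last_start cases fail. The unset case passes either way, because its expected value equals the field's zero value, so that case alone could not detect the bug.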

@rdner rdner merged commit 837c796 into elastic:main Mar 30, 2023
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Set the fileProspector's ignoreInactiveSince value based on the ignore_inactive config option

---------

Co-authored-by: Denis <[email protected]>
Labels: Team:Elastic-Agent (Label for the Agent team)
5 participants