x-pack/filebeat/input/cel: new input #31233
Thanks. I'll take a look.
This pull request is now in conflicts. Could you fix it? 🙏
Prior to this change, running an instance of filebeat with the following configuration would result in an unstoppable instance.

    filebeat.inputs:
    - type: cel
      interval: 1m
      resource.url: https://api.ipify.org/?format=json
      program: |
        bytes(get(state.url).Body).as(body, {
          "events": [body.decode_json()]
        })
    output.console.pretty: true

This happens because the cel program evaluation method does not return the context cancellation error when a context is cancelled. We also don't check for cancellation except in the case that we have events or we have a limit policy in place, so add a check immediately after the return of the evaluation.
I retested with want_more and things worked as expected. 👍
x-pack/filebeat/input/cel/input.go
Outdated
    }

    // Process a set of event requests.
    log.Debugw("request state", "state", state)
Consider namespacing the logger's structured data with "cel" (e.g. iirc .Debugw("message", logp.Namespace("cel"), "state", state)). This way all of the structured data will get logged like {"cel": {"state": ...}}. That could help minimize conflicts if someone was ingesting this data.
Not log := env.Logger.Named("cel").With("input_url", cfg.Resource.URL) at line 104?
I've looked into this and this is already decorated by the input cursor caller here:

beats/filebeat/input/v2/input-cursor/input.go, lines 118 to 124 in 4e1a251

    inpCtx := ctx
    inpCtx.ID = ctx.ID + "::" + source.Name()
    inpCtx.Logger = ctx.Logger.With("input_source", source.Name())
    if err = inp.runSource(inpCtx, inp.manager.store, source, pipeline); err != nil {
        cancel()
    }
Is this still something that you would like done?
The namespace has a different purpose than the name. What I'm suggesting is not strictly necessary, nor is it a general solution to the problem of using a consistent schema across log messages, but it might help make someone's job easier.
Basically, adding logp.Namespace("cel") will create cel.state.* fields in the resulting JSON from the logger, as opposed to state.*. If something else is already logging state, say as a string, then the namespaced fields will not collide with it and the log data will be easier to consume.
OK. Is that something that should be added to httpjson as well? No, no it isn't, since state is never logged there.
Done
x-pack/filebeat/input/cel/input.go
Outdated
    _, ok = state["url"]
    if !ok && goodURL != "" {
        state["url"] = goodURL
        log.Infow("adding missing url", "state", mapstr.M(state), "url", goodURL)
I think it would be helpful to clarify how it's correcting the issue and explain why it's happening. I'm thinking something like "adding the current URL to the returned state because it did not include a 'url'".
I expect some use cases that only hit a single static URL will omit adding the url from their CEL expression. Should we make this debug level?
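For readers following the thread, the behaviour under discussion can be sketched on its own. This is an assumption-laden illustration: `ensureURL` is a hypothetical helper, and the message text is the wording proposed in the comment above, not necessarily what was merged.

```go
package main

import "fmt"

// ensureURL restores the last known good URL when the evaluated state
// omits "url". It reports whether the state was modified, so the caller
// can decide at which level to log. (Hypothetical helper.)
func ensureURL(state map[string]interface{}, goodURL string) bool {
	if _, ok := state["url"]; ok || goodURL == "" {
		return false
	}
	state["url"] = goodURL
	return true
}

func main() {
	// A program that hits a single static URL may return no "url" field.
	state := map[string]interface{}{"events": []interface{}{}}

	if ensureURL(state, "https://example.invalid/") {
		fmt.Println("adding the current URL to the returned state because it did not include a 'url'")
	}
	fmt.Println(state["url"])
}
```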
Semantically, it feels like INFO, but in the case that people don't return a URL, it would spam the logs with this, though that would be easily fixed by following the documentation.
x-pack/filebeat/input/cel/input.go
Outdated
    state, err = evalWith(ctx, prg, map[string]interface{}{
        root: state,
    })
    log.Debugw("response state", "state", state)
Would state include anything sensitive that should not be logged?
I'll add a note to the documentation that when logging at debug level the complete state after evaluation is logged, so debug logging should not be used in production. The situation is only slightly worse than logging at debug level and seeing all published events, but it is worth noting.
Another option would be to have either a config.mask or state.mask field, an array of strings, that is used to remove/clobber fields in a copy of the state before logging it. I think this would make a good enhancement, but not for now, because I'd like to spend some more time thinking about the best way to do it.
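A rough sketch of what such masking might look like follows. Neither config.mask nor state.mask exists at this point in the PR; `maskState`, the "REDACTED" placeholder, and the restriction to top-level keys are all assumptions made for illustration.

```go
package main

import "fmt"

// maskState returns a shallow copy of state in which every top-level
// field named in mask is replaced by a placeholder. The original state
// is left untouched so evaluation can continue with the real values.
// (Hypothetical helper; nested fields are not handled.)
func maskState(state map[string]interface{}, mask []string) map[string]interface{} {
	redacted := make(map[string]interface{}, len(state))
	for k, v := range state {
		redacted[k] = v
	}
	for _, k := range mask {
		if _, ok := redacted[k]; ok {
			redacted[k] = "REDACTED"
		}
	}
	return redacted
}

func main() {
	state := map[string]interface{}{
		"url":   "https://example.invalid/",
		"token": "s3cr3t",
	}

	// Log the masked copy; keep using the original for the next request.
	fmt.Println(maskState(state, []string{"token"}))
	fmt.Println(state["token"])
}
```

Because the copy is shallow, a real implementation would need to decide how to address nested fields, which is part of the design question left open in the comment.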
- note security concern with logging at debug level and make all state logging at this level.
- add note in debug log explaining missing url.
The linter errors may be related to #33649.
Stumbled on this through the linked linter PR: Cool! I am excited to see this implemented. I have used CEL in the past for configurable processing and have been wondering if we could make use of it. I am particularly interested to see if CEL stays contained to this one input after we get some experience with it, or if we'll want to use it in more places.
/test
@cmacknz I can see value in having CEL processing available elsewhere and this is initially an experiment to see how well it will work as an analogue of the httpjson input. You may want to take a look at github.com/elastic/mito which provides the CEL extensions used here.
E2E failure is unrelated.
This adds a new filebeat input that enables processing a resource that is either filesystem-local or an HTTP API endpoint. The input uses the Common Expression Language to convert arbitrarily formatted data into a set of objects that are then published by filebeat. Documentation for CEL and the CEL extensions that are made available through the input is available from the CEL project's pages and an Elastic repository that extends the standard language's features: github.com/elastic/mito.
What does this PR do?
This adds a new input to filebeat that allows processing of datastreams using the Common Expression Language.
Why is it important?
It provides a consistent framework for generalised input processing.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs