-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(new sink): Initial syslog
sink
#7106
Conversation
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
f2ede66
to
e11d186
Compare
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
syslog
sourcesyslog
sink
warnings: [] | ||
type: bool: default: false | ||
} | ||
appname_key: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what others think about these options. There are a few concerns I have:
- We plan to support schemas in the future. If we know the schema, we can do this mapping automatically, and if we don't, we'll provide a manual fallback. In that case, these options would likely live under
encoding.schema.appname_key
, etc. I'm wondering if we want to set that precedence now or later? - There is an argument to be made that users should front this sink with a
remap
transform and map to static keys as defined by thesyslog
sink. In other words, dropping all of these options.
cc @jszwedko
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we definitely want to improve this sort of "schema mapping" in Vector. Until we have defined a path forward there, I'm inclined to keep the status quo, which is foo_key
fields like we have here, rather than introduce a change here that might not reflect the actual desired end state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have another pattern in sinks of using templatable fields for this too. This would look something like:
appname = "{{ some_field }}" # default to `{{ app_name }}`
@prognant any thoughts on supporting the structured data field? Happy to defer that as a follow-up improvement. |
@binarylogic I did a rought support for structured data (disabled by default), if enabled, as fields are removed from the event while they are being added to the message struct, it will walk through all remaining fields and add those as structured data. I did that to attempt to get a symmetric behaviour w.r.t the syslog source, I'm not sure that's the best option, we could probably expect that users wanting to use structured data will be eager to get those from a single root field with the slight side effect of increasing the overall depth of the event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This generally looks good to me. One suggestion might be to add a test or two that assert that the syslog
source and sink can interoperate. This might help suss out any inconsistencies as well as ensuring that syslog can actually be used as a "transport" between two vector instances.
"type": "type": string: enum: syslog: "The type of this component." | ||
rfc5424: { | ||
common: true | ||
description: "If this is set to `true` message will be formatted according to the RFC5424 else the RFC3164 will be used." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is better to default to RFC3164? I think RFC5424 would be preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for rfc5424
is true
, but now I realize that having a rfc3164
setting with a default value set to false
is more explicit for people wanting to use the older formating. I updated the option accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you think of making this an actual enum in the config?
We could default to rfc5424
. I think this would leave the door open to supporting more encodings in the future rather than a boolean flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed that makes sense, if the rfc5848 (same as rfc5424 + extra structured data to sign the message) has to be supported it will be a convenient way of keeping the same kind of config.
warnings: [] | ||
type: bool: default: false | ||
} | ||
appname_key: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have another pattern in sinks of using templatable fields for this too. This would look something like:
appname = "{{ some_field }}" # default to `{{ app_name }}`
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
Templatable fields seems convenient for plain text field as it allow an easy override and may provide at some point a way to set a default value #1692. However in this syslog case I think we need some default value (at least for severity/facility). This also raise the question of structured data, I don't think that templated fields can be used for that, I'm thinking we should keep a Also regarding the "closing the loop test" where a syslog sink would be sending to a syslog source, I was thinking to put those in |
cdeddce
to
5e107b4
Compare
Signed-off-by: prognant <[email protected]>
5e107b4
to
bff3e8b
Compare
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @prognant ! This is looking good to me.
Also regarding the "closing the loop test" where a syslog sink would be sending to a syslog source, I was thinking to put those in
./tests
and schedule that as a backlog item WDYT @jszwedko ?
This strikes me as the sort of backlog item that we would never get to 😅 If you aren't opposed to it, I think it'd be preferable to add those tests now while we are introducing the matching sink for the source. You can see an example of us doing that for the splunk_hec
source: https://github.com/timberio/vector/blob/1102e5e8432fb93ebb93a7eab840b19ab1ff46fd/src/sources/splunk_hec.rs#L739-L1200 I do think that putting them in tests/
would be a reasonable spot though.
Signed-off-by: prognant <[email protected]>
I totally agree that this kind of backlog item ends up forgotten or postponed forever, thanks for the link, I will try to come up with some test cases.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, though I am a bit puzzled with what's going on with the timestamp conversion.
|
||
let ts = log.remove(log_schema().timestamp_key()).and_then(|v| { | ||
v.as_timestamp() | ||
.map(|v| FixedOffset::west(0).timestamp_nanos(v.timestamp_nanos())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like this could be handled easier. What is this accomplishing? Is FixedOffset::west(0)
different than Utc
?
.map(|procid| match procid { | ||
Value::Integer(pid) => ProcId::PID(pid as i32), | ||
Value::Bytes(_) => ProcId::Name(procid.to_string_lossy()), | ||
_ => ProcId::Name("vector".to_string()), | ||
}) | ||
.or_else(|| Some(ProcId::Name("vector".to_string()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about merging the common defaults here?
.map(|procid| match procid { | |
Value::Integer(pid) => ProcId::PID(pid as i32), | |
Value::Bytes(_) => ProcId::Name(procid.to_string_lossy()), | |
_ => ProcId::Name("vector".to_string()), | |
}) | |
.or_else(|| Some(ProcId::Name("vector".to_string()))); | |
.and_then(|procid| match procid { | |
Value::Integer(pid) => Some(ProcId::PID(pid as i32)), | |
Value::Bytes(_) => Some(ProcId::Name(procid.to_string_lossy())), | |
_ => None, | |
}) | |
.or_else(|| Some(ProcId::Name("vector".to_string()))); |
let mut d = vec![]; | ||
for (k, v) in log.as_map().iter() { | ||
let mut e = StructuredElement { | ||
id: k.clone(), | ||
params: vec![], | ||
}; | ||
if let Value::Map(m) = v { | ||
for (k, v) in m.iter() { | ||
e.params.push((k.clone(), v.to_string_lossy())); | ||
} | ||
} else { | ||
e.params.push(("value".to_string(), v.to_string_lossy())); | ||
} | ||
d.push(e); | ||
} | ||
d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it could be better structured using maps:
let mut d = vec![]; | |
for (k, v) in log.as_map().iter() { | |
let mut e = StructuredElement { | |
id: k.clone(), | |
params: vec![], | |
}; | |
if let Value::Map(m) = v { | |
for (k, v) in m.iter() { | |
e.params.push((k.clone(), v.to_string_lossy())); | |
} | |
} else { | |
e.params.push(("value".to_string(), v.to_string_lossy())); | |
} | |
d.push(e); | |
} | |
d | |
log.as_map().iter().map(|(k, v)| { | |
StructuredElement { | |
id: k.clone(), | |
params: match v { | |
Value::Map(m) => { | |
m.iter().map(|(k, v)| (k.clone(), v.to_string_lossy())).collect() | |
} | |
v => vec![("value".to_string(), v.to_string_lossy())], | |
}, | |
} | |
}) | |
.collect() |
@prognant , given the sink changes I think we should close this and revisit. |
closes #6863
Default field name are aligned to the ones populated by the
syslog
source.Basic support for structured data (attempt to f-1 the syslog source), there is room for improvement here for sure.
Subsequent tasks : end-to-end syslog sink to syslog source tests, better structured data support (likely to need slight adjustment in the syslog source for consistency), leverage schemas support when available.