Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(new sink): Initial syslog sink #7106

Closed
wants to merge 15 commits into from
Closed

Conversation

prognant
Copy link
Contributor

@prognant prognant commented Apr 13, 2021

closes #6863

Default field name are aligned to the ones populated by the syslog source.

Basic support for structured data (attempt to f-1 the syslog source), there is room for improvement here for sure.

Subsequent tasks : end-to-end syslog sink to syslog source tests, better structured data support (likely to need slight adjustment in the syslog source for consistency), leverage schemas support when available.

Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
@binarylogic binarylogic changed the title feat(new sink): Initial syslog source feat(new sink): Initial syslog sink Apr 13, 2021
warnings: []
type: bool: default: false
}
appname_key: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what others think about these options. There are a few concerns I have:

  1. We plan to support schemas in the future. If we know the schema, we can do this mapping automatically, and if we don't, we'll provide a manual fallback. In that case, these options would likely live under encoding.schema.appname_key, etc. I'm wondering if we want to set that precedence now or later?
  2. There is an argument to be made that users should front this sink with a remap transform and map to static keys as defined by the syslog sink. In other words, dropping all of these options.

cc @jszwedko

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we definitely want to improve this sort of "schema mapping" in Vector. Until we have defined a path forward there, I'm inclined to keep the status quo, which is foo_key fields like we have here, rather than introduce a change here that might not reflect the actual desired end state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have another pattern in sinks of using templatable fields for this too. This would look something like:

appname = "{{ some_field }}" # default to `{{ app_name }}`

@binarylogic
Copy link
Contributor

@prognant any thoughts on supporting the structured data field? Happy to defer that as a follow-up improvement.

@prognant
Copy link
Contributor Author

prognant commented Apr 13, 2021

@binarylogic I did a rought support for structured data (disabled by default), if enabled, as fields are removed from the event while they are being added to the message struct, it will walk through all remaining fields and add those as structured data. I did that to attempt to get a symmetric behaviour w.r.t the syslog source, I'm not sure that's the best option, we could probably expect that users wanting to use structured data will be eager to get those from a single root field with the slight side effect of increasing the overall depth of the event.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This generally looks good to me. One suggestion might be to add a test or two that assert that the syslog source and sink can interoperate. This might help suss out any inconsistencies as well as ensuring that syslog can actually be used as a "transport" between two vector instances.

"type": "type": string: enum: syslog: "The type of this component."
rfc5424: {
common: true
description: "If this is set to `true` message will be formatted according to the RFC5424 else the RFC3164 will be used."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is better to default to RFC3164? I think RFC5424 would be preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for rfc5424 is true, but now I realize that having a rfc3164 setting with a default value set to false is more explicit for people wanting to use the older formating. I updated the option accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think of making this an actual enum in the config?

For example: https://github.com/timberio/vector/blob/a21687c82a619ecafb24a80fcb939ae9015e7952/docs/reference/components/sources/aws_s3.cue#L62-L76

We could default to rfc5424. I think this would leave the door open to supporting more encodings in the future rather than a boolean flag.

Copy link
Contributor Author

@prognant prognant Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that makes sense, if the rfc5848 (same as rfc5424 + extra structured data to sign the message) has to be supported it will be a convenient way of keeping the same kind of config.

warnings: []
type: bool: default: false
}
appname_key: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have another pattern in sinks of using templatable fields for this too. This would look something like:

appname = "{{ some_field }}" # default to `{{ app_name }}`

@prognant
Copy link
Contributor Author

prognant commented Apr 26, 2021

Templatable fields seems convenient for plain text field as it allow an easy override and may provide at some point a way to set a default value #1692. However in this syslog case I think we need some default value (at least for severity/facility).

This also raise the question of structured data, I don't think that templated fields can be used for that, I'm thinking we should keep a structured_data_key option that would be the event key pointed to a properly structured fields (an alternative would be to expose a list of fields to be included in the structured data), BTW to make it symmetrical W.R.T the syslog source this would mean adding an option like flatten_structured_data (that would default to true to keep the current behaviour).

Also regarding the "closing the loop test" where a syslog sink would be sending to a syslog source, I was thinking to put those in ./tests and schedule that as a backlog item WDYT @jszwedko ?

Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
Signed-off-by: prognant <[email protected]>
@prognant prognant marked this pull request as ready for review April 27, 2021 14:23
@prognant prognant requested review from a team, lukesteensen and bruceg and removed request for a team April 27, 2021 14:23
@prognant prognant requested review from vladimir-dd and removed request for a team April 27, 2021 14:23
@prognant prognant requested a review from jszwedko April 28, 2021 08:13
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @prognant ! This is looking good to me.

Also regarding the "closing the loop test" where a syslog sink would be sending to a syslog source, I was thinking to put those in ./tests and schedule that as a backlog item WDYT @jszwedko ?

This strikes me as the sort of backlog item that we would never get to 😅 If you aren't opposed to it, I think it'd be preferable to add those tests now while we are introducing the matching sink for the source. You can see an example of us doing that for the splunk_hec source: https://github.com/timberio/vector/blob/1102e5e8432fb93ebb93a7eab840b19ab1ff46fd/src/sources/splunk_hec.rs#L739-L1200 I do think that putting them in tests/ would be a reasonable spot though.

docs/reference/components/sinks/syslog.cue Outdated Show resolved Hide resolved
docs/reference/components/sinks/syslog.cue Outdated Show resolved Hide resolved
src/sinks/syslog.rs Outdated Show resolved Hide resolved
Signed-off-by: prognant <[email protected]>
@prognant
Copy link
Contributor Author

prognant commented Apr 28, 2021

I totally agree that this kind of backlog item ends up forgotten or postponed forever, thanks for the link, I will try to come up with some test cases.

This strikes me as the sort of backlog item that we would never get to 😅 If you aren't opposed to it, I think it'd be preferable to add those tests now while we are introducing the matching sink for the source. You can see an example of us doing that for the splunk_hec source:

Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though I am a bit puzzled with what's going on with the timestamp conversion.


let ts = log.remove(log_schema().timestamp_key()).and_then(|v| {
v.as_timestamp()
.map(|v| FixedOffset::west(0).timestamp_nanos(v.timestamp_nanos()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like this could be handled easier. What is this accomplishing? Is FixedOffset::west(0) different than Utc?

Comment on lines +190 to +195
.map(|procid| match procid {
Value::Integer(pid) => ProcId::PID(pid as i32),
Value::Bytes(_) => ProcId::Name(procid.to_string_lossy()),
_ => ProcId::Name("vector".to_string()),
})
.or_else(|| Some(ProcId::Name("vector".to_string())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about merging the common defaults here?

Suggested change
.map(|procid| match procid {
Value::Integer(pid) => ProcId::PID(pid as i32),
Value::Bytes(_) => ProcId::Name(procid.to_string_lossy()),
_ => ProcId::Name("vector".to_string()),
})
.or_else(|| Some(ProcId::Name("vector".to_string())));
.and_then(|procid| match procid {
Value::Integer(pid) => Some(ProcId::PID(pid as i32)),
Value::Bytes(_) => Some(ProcId::Name(procid.to_string_lossy())),
_ => None,
})
.or_else(|| Some(ProcId::Name("vector".to_string())));

Comment on lines +232 to +247
let mut d = vec![];
for (k, v) in log.as_map().iter() {
let mut e = StructuredElement {
id: k.clone(),
params: vec![],
};
if let Value::Map(m) = v {
for (k, v) in m.iter() {
e.params.push((k.clone(), v.to_string_lossy()));
}
} else {
e.params.push(("value".to_string(), v.to_string_lossy()));
}
d.push(e);
}
d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could be better structured using maps:

Suggested change
let mut d = vec![];
for (k, v) in log.as_map().iter() {
let mut e = StructuredElement {
id: k.clone(),
params: vec![],
};
if let Value::Map(m) = v {
for (k, v) in m.iter() {
e.params.push((k.clone(), v.to_string_lossy()));
}
} else {
e.params.push(("value".to_string(), v.to_string_lossy()));
}
d.push(e);
}
d
log.as_map().iter().map(|(k, v)| {
StructuredElement {
id: k.clone(),
params: match v {
Value::Map(m) => {
m.iter().map(|(k, v)| (k.clone(), v.to_string_lossy())).collect()
}
v => vec![("value".to_string(), v.to_string_lossy())],
},
}
})
.collect()

@bits-bot
Copy link

bits-bot commented Nov 3, 2021

CLA assistant check
All committers have signed the CLA.

@binarylogic
Copy link
Contributor

@prognant , given the sink changes I think we should close this and revisit.

@binarylogic binarylogic closed this Jan 5, 2022
@prognant prognant deleted the prognant/syslog-sink branch March 4, 2022 14:01
@prognant prognant restored the prognant/syslog-sink branch March 4, 2022 14:02
@prognant prognant deleted the prognant/syslog-sink branch July 20, 2022 13:51
@syedriko syedriko mentioned this pull request Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New syslog sink
6 participants