Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An RFC for structured logging #296

Merged
merged 21 commits into from
Mar 10, 2019
Merged

Conversation

KodrAus
Copy link
Contributor

@KodrAus KodrAus commented Oct 14, 2018

Rendered

Reference implementation

Sample API docs


Hi! So we've been working through some designs in #149 on how structured logging could be supported in log. I've since gone away and tried to capture the results of that discussion and my experiments over the year into a concrete proposal that we can refine further. It looks pretty long, but a fair chunk of that is blocks of source.

I don't necessarily expect us to merge this PR even once we're happy with it, it's really just an effort to get more directed feedback on a significant API using a process we're familiar with in the Rust community.

cc @sfackler @dpc (who I know has limited capacity) @Thomasdezeeuw


Open questions:

  • macro syntax

@diwic
Copy link

diwic commented Oct 14, 2018

As for serde, I want to raise an issue of serde not being capable of serializing empty arrays correctly, i e, the type information gets lost. As this issue was closed as "wontfix", this made serde impossible to use for some formats (such as mine).

Maybe that limitation is not severe enough for this case - that's up to the rest of you to decide - but I think it's something to be aware of when choosing a serialization crate.

@dpc
Copy link

dpc commented Oct 14, 2018

The only thing that worries me is the inability to express data that can be sent to another thread without serialization. One idea that I have is putting additional bound on every value: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html

This way values are always taken as references, but a async-logger implementation can just call to_owned() on them and send a copy (possibly Boxed, orArced) to another thread.

@dpc
Copy link

dpc commented Oct 14, 2018

@diwic while I typically love using serde, there were multiple times where I was not happy with it trying to implement the backend (some serialization logic etc). It was typically related to handling sequences. Each time I dug, there were good reasons for how things were the certain way, and there is no way to make a serialization library that is perfect in 100% of use cases. So I can relate, but it is a de facto standard, at least right now.

It seems to me, having serde dependency behind a feature flag, and thus the ability to support other solutions (or serde versions) is a good thing to have.

The pain point with libraries like logging is that they are interoperability point of the whole ecosystem, and having to introduce any breaking changes (like public dependency) can be a painful experience for everyone.

@sfackler
Copy link
Member

I'm not convinced that a serde dependency is the right choice for this use case. The log crate needs to focus on being lightweight in terms of code generation in particular. I think a virtual-dispatch based approach may work out better there.

@bitemyapp
Copy link

For some prior art, over in Haskell-land I've been using katip. Not perfect (perhaps particularly on the serialization side?), but it's worked better than previous libraries I'd used that attempted to provide single input / multi output structured logging w/ support for ordinary text logs.

A faculty for flipping between nginx-style / apache compatible / etc. text logging would be neat too.

@TimNN
Copy link

TimNN commented Oct 14, 2018

I haven't wrapped my head around all the details yet, but requiring serde as a depending feels kind of heavy handed to me (especially since, as far as I can tell, there is no way for the top-level crate to disable the serde integration).

I would personally feel better about this if there was another feature, only for use in the top-level crate, that would act as a "kill switch", removing the dependency on serde. (I'm not sure if that is feasible. For the log-recording part it should be, I think; for the sinks I'm not so sure).

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 14, 2018

It sounds like we're not sold on depending on serde here. There certainly are some big complications. The rationale was that if we don't use serde::Serialize and serde::Serializer then we'll need to define traits that are conceptually very similar to them. We'll either have to comprimise in what can be represented by punting on the advanced APIs that make Serializer complex, like sequences and maps, or we'll have to introduce another API with extra cognitive load for libraries like uuid to implement, and for formats like json to support.

So maybe we can find an API that doesn't bake in serde, and can be naturally object-safe, but also doesn't require libraries to opt-in to supporting structured logging as well as generic serialization. What I'd really like to avoid is accumulating a random bag of arbitrary traits in the ecosystem over time that need to be implemented before new libraries can be considered useful.

@gnzlbg
Copy link

gnzlbg commented Oct 14, 2018

I think that adding a dependency to include a trait is not a big issue. The problem is that in the case of serde, this dependency is huge, which adds a non-negligible cost to everybody using log even if they are not using structured logging.

If instead of requiring full serde, we would just require a thin serde-traits crate instead, then the cost of pulling the trait might be insignificant, and just adding it by default a non issue.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 14, 2018

The problem is that in the case of serde, this dependency is huge, which adds a non-negligible cost to everybody using log even if they are not using structured logging.

This might just be my opinion after being immersed in it for so long, but I can't think of any case where non-structured logging is preferable to structured logging. If it's available we should be using it.

serde effectively is already just a collection of traits and some default implementations, but is definitely a fairly heavy crate as far as compile times go. That cost is amortized if the library is already part of the dependency graph though.

@diwic
Copy link

diwic commented Oct 14, 2018

So, the easiest way is just to say that keys are &str and values are Display. (Maybe there could be something that outputs the Debug string rather than the Display string.)
Maybe we could just start there?

Because the moment we cross that line and say that we want something else than strings in the log backend, may it be integers, arrays, maps, booleans or something else, the question becomes much more complex because a) we then have to decide what we want to support and what we want to not support and b) all log backends need to implement support for this as well.
Surely serde can help with both, but then we're stuck with serde's design choices. Are we happy with that?

@christophebiocca
Copy link

One question that isn't clearly answered from reading the RFC:
Can metadata be interpolated into the log message itself? We used a home-built structured logging framework for python and have generally captured all arguments as metadata, even if they showed up in the message. That way we don't have to write arguments twice (once as metadata, once as logging arguments) Given that all metadata must support Display in some form this seems trivial to support?

@matthieu-m
Copy link

Is it possible/desirable to have a context argument?

The examples show key-value pairs being passed explicitly, however I could imagine wanting to prepare a "map" of key-value pairs ahead of time (a context) and passing that to multiple logs.

@dpc
Copy link

dpc commented Oct 14, 2018

@matthieu-m

Is it possible/desirable to have a context argument?

Your logging backend could implement it and be initialized with desired metadata, that would add to key-value pairs of each Record when serializing it.

If you want explicit context, that you could manipulate and pass at runtime, like in slog, then it seems it's not part of this RFC, and the general opinion is that "contextual logging" is a bit intrusive (because you have to keep passing the context, or use some form of scopes) and it shouldn't be part of log itself. It could however still be provided by libraries like slog, and by having structured logging in log the interoperability between them would be better.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 14, 2018

So, the easiest way is just to say that keys are &str and values are Display. (Maybe there could be something that outputs the Debug string rather than the Display string.)
Maybe we could just start there?

I like the sentiment here, but I don't think we can really start with Display and build up, because Display is simply not sufficient for structured logging. All you get are strings, which is really limiting for log servers that use that structure for querying later.

The examples show key-value pairs being passed explicitly, however I could imagine wanting to prepare a "map" of key-value pairs ahead of time (a context) and passing that to multiple logs.

Sure can! While preparing this RFC I put together an an experimental repo for this kind of contextual logging.

Can metadata be interpolated into the log message itself?

This is something we do in .NET using message templates. I think an alternative implementation of the log! macro, implemented using procedural macros, could do this. The syntax for the log! macro proposed here is pretty sparten to remain compatible and to encourage crates to experiment with their own log! macros. Interpolating from a single bag of data is one thing to try. Another is being able to log an impl Fail or impl Error as an argument directly in error!.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 14, 2018

My sticking points on the serialization question are:

  • Don't require libraries depend on log and implement a trait before their types can be logged
  • Don't limit the kinds of data structures that can be logged to anything less than serde supports

We possibly can find a way to achieve that without baking in serde directly, but creating a pervasive alternative serialization framework to support just for logging, or arbitrarily limiting what can be logged both seem like non-starters to me.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 14, 2018

The log crate needs to focus on being lightweight in terms of code generation in particular. I think a virtual-dispatch based approach may work out better there.

@sfackler this is using virtual dispatch by way of erased-serde, but that crate has to work fairly hard to make that possible. There aren't any generics involved outside of trait implementations, which would probably exist in any implementation.

The lightweight code generation requirement is an interesting one I haven't really thought about at all. Could you elaborate on it a bit more? Which usergroup is it important to?

@softprops
Copy link

I believe the author of serde may also have some ideas for a lighter weight serde alternative
https://github.com/dtolnay/miniserde/blob/master/README.md

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 15, 2018

Looking at the complexity concern with serde (just ignoring the compile-time / artifact-size concerns for now), we could make life easier for formats that aren't already in the context of serialization (like env_logger's human-centric format) by tweaking the required trait bounds for logging. Instead of picking either Display or Serialize for structured values and facading that decision behind another implementation of Serialize, we could always require both Debug and Serialize, letting implementations of Log pick the one they want through their Visitor. So if you're already writing logs using write! where structure is lost already you can keep doing that, and if you want to serialize the whole lot as a structure-retaining format like json you can do that too.

@diwic
Copy link

diwic commented Oct 15, 2018

I like the sentiment here, but I don't think we can really start with Display and build up, because Display is simply not sufficient for structured logging. All you get are strings, which is really limiting for log servers that use that structure for querying later.

Strings are indeed all we get with such a proposal, which makes writing a backend to "log" a lot easier, than if it has to support all data types that someone somewhere has decided.

If there are technical difficulties with starting with "anything Display" as values, how about starting with only &str? That shouldn't cause any concerns with backwards compatibility if we add more types later.

@Dowwie
Copy link

Dowwie commented Oct 15, 2018

This seems like a proposal for a white horse with black stripes that we are told ought not be considered a zebra. I hope that I'm wrong but signs are showing that the stdlibbers are hungry to grow stdlib beyond its original mandate.

I'm not an advocate of stdlib crates competing with those of the rest of the ecoystem. stdlib Rust should foster cooperation, not competition. So, what would I need slog for if its main features were made available in a stdlib log crate? This is a question I and others will have if core functional overlaps exist.

I'm very interested in seeing a proposal for structured logging that is novel and does not overlap. Why would the authors of slog want to refactor working code just to make it line with stdlib team designs? Why should they?

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 15, 2018

Strings are indeed all we get with such a proposal, which makes writing a backend to "log" a lot easier, than if it has to support all data types that someone somewhere has decided.

I totally agree with you there. Having to write a Serializer just to write values for a human-centric format is a formidable barrier-to-entry. The Using Visitor to print or serialize key value pairs section has some examples of how human-centric and machine-centric formats could support structured logging. The human-centric one is actually a bit intellectually dishonest though because it omits the few hundred lines of Serializer code for brevity. However, if we make all values Debug + Serialize instead of just Serialize, then a human-centric log format using std::fmt could be as simple as:

fn write_pretty(w: impl Write, r: &Record) -> io::Result<()> {
    // Write the first line of the log record
    ...

    // Write each key-value pair using the `WriteKeyValues` visitor
    record
        .key_values()
        .try_for_each(|k, v| writeln!(&mut w, "{}: {:?}", k, v))
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e.into_error()))
}

For properly structured formats, where a Serializer implementation already exists, like serde_json, the structured logging support can still be standard serde code:

fn write_json(w: impl Write, r: &Record) -> io::Result<()> {
    let r = SerializeRecord {
        lvl: r.level(),
        ts: epoch_millis(),
        msg: r.args().to_string(),
        props: r.key_values().serialize_as_map(),
    };

    serde_json::to_writer(w, &r)
}

#[derive(Serialize)]
struct SerializeRecord<KVS> {
    lvl: Level,
    ts: u64,
    msg: String,
    #[serde(flatten)]
    props: KVS,
}

So I think the Debug + Serialize approach could be a way to let implementors of Log use either the std::fmt machinery, or the serde::Serializer machinery as appropriate, without sacrificing structure altogether like we would with &str.

I have also been hacking around with serde itself following @gnzlbg's suggestion of a thin serde-traits library, and without all the impl blocks and other machinery on my box I can get a fresh serde to compile in ~2 seconds instead of ~9 (log currently takes ~1 second), so maybe there's an avenue to explore there too.

@dpc
Copy link

dpc commented Oct 15, 2018

@Dowwie

Logging is an interoperability problem. Almost every crate does some logging, and if parts of ecosystem use different logging, then in practice it will degrade to lowest common denominator.

Also, I don't know about others but I don't want to be maintaining libraries if I could get the same functionality from (semi-)stdlib.

@KodrAus

I totally agree with you there. Having to write a Serializer just to write values for a human-centric format is a formidable barrier-to-entry.

Most people just take a one that was written and doesn't care. I have used json, yaml, cbor and probably many other serialization formats in Rust, and I never had to implement own serde serializer.

Even less so with logging, where people typically just want to dump a bunch of json logs to ElasticSearch. For slog rarely (never?) do I hear about people wanting human-centric input other than what slog-term implements. I mean ... "human-centric output" is just a bunch of lines. Anyone can have preferences, but it's just not worth implementing again for every project.

@Ralith
Copy link

Ralith commented Oct 15, 2018

@KodrAus

Can metadata be interpolated into the log message itself?

This is something we do in .NET using message templates. I think an alternative implementation of the log! macro, implemented using procedural macros, could do this.

slog already does something like this, I think without proc macros; named format arguments are interpreted as structured metadata in addition to being formatted, while anonymous ones aren't. It's nice to use.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 15, 2018

Most people just take a one that was written and doesn't care. I have used json, yaml, cbor and probably many other serialization formats in Rust, and I never had to implement own serde serializer.

Well, slog-term does implement its own Serializer for its format, it's just not a serde one. I ended up going down a similar path in env_logger too while playing with the API this RFC proposes and writing a Serializer. Since there are a number of terminal loggers based on log already it does seem worth exploring how they could be better supported. I don't think the required Serialize + Debug bounds rather than Serialize would be a problem when converting from log into slog, but it might be a problem when converting from slog into log. If that's the case then it's not really suitable.

Even less so with logging, where people typically just want to dump a bunch of json logs to ElasticSearch. For slog rarely (never?) do I hear about people wanting human-centric input other than what slog-term implements.

This is true, there's a lot more value to get out of structured logging when you use a log store that supports working with structured data, like Elasticsearch. That's why I don't think we should punt on the actual structured serialization support for this RFC.

slog already does something like this, I think without proc macros; named format arguments are interpreted as structured metadata in addition to being formatted, while anonymous ones aren't. It's nice to use.

Ah gotcha. We could do the same thing in log!, but couldn't retain the structure of those format arguments without breaking existing users because the trait bounds would be different. It could be done without the same breakage using a proc macro though that's able to interpret the format string and know which values need to be interpolated from the one set of structured key-value pairs:

info!("the format string with a {number}"; number = 2);

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 15, 2018

I've spent some time going around in circles on serde, so maybe we could look at some alternatives and their respective trade-offs properly? Like turning the Visitor into something like Serializer, like slog does.

The trade-off there that I see is that we'd then need another trait like Serialize that's implemented throughout the ecosystem for loggable types, probably also alongside serde anyways.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 16, 2018

Ok, I spent some time exploring the serialization API more to remove the required serde dependency from this RFC and came up with a proof-of-concept.

This API doesn't require serde by default, but can support it using a Cargo feature. Without the serde feature enabled it's super lightweight, the idea being that out-of-the-box we just support a limited set of primitive types as structured key-value pairs, but that set can be expanded to any type that implements serde::Serialize when the feature is opted into. The Serialize type from this crate doesn't become a public dependency because you can't implement it manually (that also lets us change its default bounds using Cargo features without fear of breakage), and a reasonable set of types is still supported even without the serde feature.

I think this approach can mitigate some of the concerns about serde being too heavyweight, but not miss the opportunity to take advantage of its already extensive ecosystem. Any thoughts?

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 18, 2018

Ok, what do we think of this API, based on the serialization proof-of-concept I posted before? I've stuck with the visit terminology over the serialize terminology because actually serializing a key-value pair is just one thing you might want to do with them. Other things include checking whether a specific pair is present and doing something with its primitive type, like comparing the timestamp on a record to one that was previously set in some ambient context to calculate the elapsed time.

I changed KeyValueSource to Source, Visitor to SourceVisitor, and Serialize to Visit. The Error type is basically the same as what's currently in the RFC.

// --- Set of key-value pairs --- //

pub trait Source {
    fn visit<'kvs>(&'kvs self, visitor: &mut dyn SourceVisitor<'kvs>) -> Result<(), Error>;


    // --- Provided adapter methods --- //
    // Using these you probably don't to build a `SourceVisitor` yourself

    fn erase(&self) -> ErasedSource
    where
        Self: Sized,
    {}

    fn try_for_each<F, E>(self, f: F) -> Result<(), Error>
    where
        Self: Sized,
        F: FnMut(Key, Value) -> Result<(), E>,
        E: Into<Error>,
    {}

    fn get<'kvs, Q>(&'kvs self, key: Q) -> Option<Value<'kvs>>
    where
        Q: Borrow<str>,
    {}

    #[cfg(feature = "serde")]
    fn serialize_as_map(self) -> SerializeAsMap<Self>
    where
        Self: Sized,
    {}

    ...
}

pub trait SourceVisitor<'kvs> {
    fn visit_pair(&self, key: Key<'kvs>, value: Value<'kvs>) -> Result<(), Error>;
}


// --- Individual key or value --- //

// Can't be implemented manually
// It's either a fixed set of `std` types or `T: Serialize`
pub trait Visit: private::Sealed {
    fn visit(&self, visitor: &mut dyn Visitor) -> Result<(), Error>;
}

#[cfg(not(feature = "serde"))]
impl Visit for $std_ty {} // where $std_ty is `u8`, `bool`, `&str`, `&Path` etc

#[cfg(feature = "serde")]
impl<T: Serialize + ?Sized> Visit for T {} // covers all $std_ty impls + a bunch more

pub trait Visitor {
    fn visit(&mut self, v: &dyn Visit) -> Result<(), Error> {
        v.visit(self)
    }

    fn visit_i64(&mut self, v: i64) -> Result<(), Error>;
    fn visit_u64(&mut self, v: u64) -> Result<(), Error>;
    fn visit_f64(&mut self, v: f64) -> Result<(), Error>;
    fn visit_bool(&mut self, v: bool) -> Result<(), Error>;
    fn visit_char(&mut self, v: char) -> Result<(), Error>;
    fn visit_str(&mut self, v: &str) -> Result<(), Error>;
    fn visit_bytes(&mut self, v: &[u8]) -> Result<(), Error>;

    // If a format is given a complex `serde` datastructure, like a map or sequence
    // A format might want to write a placeholder value like `null` instead
    fn unsupported(&mut self) -> Result<(), Error> {
        Err(Error::msg("unsupported value"))
    }
}


// --- Concrete types for keys and values --- //

// `ToOwned::Owned: Send + Sync + 'static`?
// use `Borrow<str>` as the generic trait bound for keys
pub struct Key<'k> {}

impl<'k> Key<'k> {
    fn as_str(&self) -> &str {}
}

// `Key` is always `Visit`
// If `serde` is available, it's also `Serialize`
#[cfg(not(feature = "serde"))]
impl<'k> Visit for Key<'k> {}
#[cfg(feature = "serde")]
impl<'k> Serialize for Key<'k> {}

// `ToOwned::Owned: Send + Sync + 'static`?
// use `Visit` as the generic trait bound for values
pub struct Value<'v> {}

// `Value` is always `Visit`
// If `serde` is available, it's also `Serialize`
#[cfg(not(feature = "serde"))]
impl<'v> Visit for Value<'v> {}
#[cfg(feature = "serde")]
impl<'v> Serialize for Value<'v> {}

the log! macros would be able to accept idents as keys, and any expression that implements Visit as values.

Why Visitor and SourceVisitor?

The API could be smaller if there was just one Visitor type:

trait Visitor {
    fn visit_key(&mut self, v: Key);
    fn visit_value(&mut self, v: Value);
    fn visit_pair(&mut self, k: Key, v: Value) {}

    fn visit_i64(&mut self, v: i64);
}

This might seem like a simplification, but what it's really doing is just making the sequence of operations in serializing a key-value pair implicit. In our format that implements Visitor we'd need to think about values being serialized before keys, keys or values being serialized multiple times, or being given a primitive like visit_str instead of a key or value and having to guess which one it is.

So having one trait, SourceVisitor, that surfaces key-value pairs, and another trait, Visitor, that looks at individual values lets the format decide exactly how it's going to work with its pairs. It also means we can lean on SourceVisitor to work with pairs consistently so that ergonomic methods like try_for_each and serialize_as_map can be provided on Source. With the combination of these two methods I actually haven't found myself even needing to write a SourceVisitor manually in my experiments with this API.

Why structs for Key and Value instead of trait objects?

Using structures for Key and Value is for backwards compatibility. With a dyn Trait or impl Trait we've got all our cards on the table, and it's much harder to introduce changes without breakage. Using structures we don't have that problem. We could also abstract over owned and borrowed keys and values, and provide some useful methods using static rather than dynamic dispatch.

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 4, 2019

A possible future extension we could make to Value would be letting you treat an error type explicitly:

impl<'v> Value<'v> {
    fn from_error(err: &'v impl std::error::Error) -> Self { .. }

    fn as_error(&self) -> Option<&dyn std::error::Error> { .. }
}

with whatever bounds it is we have to satisfy to make the returned error from as_error downcastable.

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 7, 2019

So we touched on this a bit at the all hands, and while I don't think any major red flags were raised, it is a vast piece of API to consider. At least we can land it incrementally.

One thing we didn't get a chance to look at is the macro syntax. Unlike the traits and structures we probably can't land a subset of the macro. Right now, the RFC macro syntax is:

log!(format, format_arg1, format_arg2, .. ; structured_arg1, structured_arg2);

It's a shame to have to repeat format args in the structured list if you don't just want them interpolated into the message, but I couldn't think of a backwards-compatible way to do it. But, maybe we could use a token in the macro to opt-in to treating format args as also structured:

log!(s; format, structured_arg1, structured_arg2, .. ; additional_arg1, additional_arg2);

So the leading s tells us format arguments should implement ToValue and be in the set of key-value pairs in addition to being interpolated into the format string, avoiding having to use two sets (which has been raised here a few times as not ideal), or having to break existing log! callers.

Does anybody have any other thoughts on their preferred macro syntax?

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 7, 2019

cc @sfackler

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 7, 2019

Using some kind of switch within the macro is a strategy that would also work for treating the output of #317 as an optionally structured value too.

@dekellum
Copy link
Contributor

Macro syntax: See alternative considered for #317, using '=' as marker token. At least in that case it seemed to me, like more to teach with DSL, than is the case with dedicated function-like macros. That said, the -v macros are more unique in that they are restricted to a single value.

Have or might you consider that as an alternative? Like a parallel set of -s or other suffix macros where all named parameters are consider structured and available for the message format as well?

Also, for many cases where this would actually be used, would it be adequate and/or potentially more convenient to just configure something with the logging implementation to format specific structured values in the output, either before or after the passed message?

(I've held off on commenting here, trying to catch up on its evolution and also understand the tokio stuff, but...)

Have or might you also consider adding the rust-log-mdc crate (and what I presume is its java log4j/slf4j inspiration) as prior art? I think that is relevant because for use cases like including a user ID as structured attribute, it may be nice to "register" that value as a thread local or separately passed around "context" associate-array-like type. An exclusively thread-local approach doesn't work cleanly with async (which wasn't a Java thing, back in the day), at least without library support, but isn't tokio-trace's Span concept similar?

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 17, 2019

See alternative considered for #317, using '=' as marker token

The marker I was thinking of was something like kv, but = is a reasonable alternative too!

a parallel set of -s or other suffix macros

I think we could end up with a macro-explosion going the separate suite of macros route here, so something that doesn't require new macros for interpreting structured data differently would be better I think.

potentially more convenient to just configure something with the logging implementation to format specific structured values in the output

That's an interesting idea, and you could support it in a logging framework built on top of log's structured logging, but it would require the data we're interpolating directly into the message is also available as key-value pairs to process independently, which we need a marker token for in the existing macros.

Have or might you also consider adding the rust-log-mdc crate (and what I presume is its java log4j/slf4j inspiration) as prior art?

Thanks! I haven't looked at this crate before so will check it out.

@Ralith
Copy link

Ralith commented Feb 17, 2019

I for one would much rather have separate macros that have good syntax than a single unified macro with awkward syntax. Couldn't they even have the same names, just residing in a different module and requiring manual import?

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 17, 2019

I for one would much rather have separate macros that have good syntax than a single unified macro with awkward syntax.

@Ralith do you find the info!(kv; "format", arg; arg) style awkward? If so, what is it you don't like?

There's a whole new world of improvements we could make to the log! macros if we built them as proc macros instead of using macro_rules v1 like they are today. That's something I want to explore in a separate crate before trying to circle back into log. Without proc macros I think the structured logging support we can ship is bound to feel grafted on in some respects.

I think the problem with shipping separate macros that require manual imports is a lack of discoverability, and still working with consumers that want to #[macro_use] extern crate log; to bring everything into scope. On the other hand, we could punt entirely on macro syntax for this RFC, and just not surface structured logging in the current macros for now. Then we could experiment with new macros externally.

@bitemyapp
Copy link

bitemyapp commented Feb 17, 2019

@KodrAus I tend to agree with @Ralith and I do think info!(kv; "format", arg; arg) is awkward.

If that is what results in the final design there are decent odds I'd end up making a wrapper crate with different syntax and I'd rather not proliferate a logging API idiolect if I can help it.

Thank you very much for your efforts towards structured logging! It's a faculty near and dear to me and you've made tremendous progress in a short period of time.

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 18, 2019

Ok, I think it's probably better not to ship macro support that users don't like, since we can work on it externally, and frameworks like tokio-trace and slog that are already structured can integrate with log without log's macros surfacing its structured logging support.

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 18, 2019

Alrighty, I've updated the RFC text to punt on the macro syntax.

I'd really like to be able to spend some time exploring different macro designs that are structured off-the-bat. There's still tangible benefits to shipping a structured logging API in log even without macro support out-of-the-box by enabling better integration with existing structured frameworks, and opening up the design space for the community to bootstrap new log! macros that suit their needs. So I'm not suggesting we never add structured support to the log! macros in log, I'm suggesting we just ship what we need now to unblock experimenting with new macros externally.

What do you all think?

@Ralith
Copy link

Ralith commented Feb 18, 2019

@Ralith do you find the info!(kv; "format", arg; arg) style awkward? If so, what is it you don't like?

I meant to contrast it to taking structure from named format arguments directly, as in the info!("the format string with a {number}"; number = 2); example. Being forced to duplicate arguments for them to appear in both the message and the structured data would be unfortunate.

I'm suggesting we just ship what we need now to unblock experimenting with new macros externally.

That seems like a reasonable idea.

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 24, 2019

Alrighty, this has been running for a quite a while now, and thanks to everyone that's been involved so far it seems like we've worked through the major parts of the design space.

cc @rust-lang-nursery/libs and the community: shall we kick off a final comment period for this one? If you'd like to register a concern with the RFC please leave a comment here! If we haven't received any by Friday, 8th of March then I'll merge this one and work towards PRing the initial minimal API proposed in the text.

I appreciate that this is an enormous document, and the implications of various design decisions are pretty well buried at this point. So any PRs that introduce structured logging support will be small and incremental, so we have an opportunity to discuss them just-in-time too. I personally consider this RFC more of a guide than a commitment.

@neoeinstein
Copy link

I wanted to add in a little plug for something that the serilog package in .NET does that has been really useful when sifting through log messages, and that is to allow logging of the message template without substitution. As a structured log, it becomes immensely easy to do a search for the template, particularly if values haven't already been substituted in. Is there a way that we could do this and also give a logging backend access to the original message template? (Not sure if this would be something in this RFC, or maybe a different one, but structured logs and templates work really well together.)

@KodrAus
Copy link
Contributor Author

KodrAus commented Feb 25, 2019

@neoeinstein Sure! That would be a feature of the log! macro to capture the format as a template and/or as a fully interpolated message. Serilog's templates conform to a standard that an alternative log macro implementation built on top of log could also use instead of Rust's standard formatting syntax if it wanted to. That template could then be added as an ordinary key-value pair with a special name (like @mt).

EDIT: Just to add, one of the motivations for the Source::get method is for poking at well-defined key-value pairs like templates and timestamps.

@KodrAus KodrAus merged commit 74584a5 into rust-lang:master Mar 10, 2019
@KodrAus
Copy link
Contributor Author

KodrAus commented Mar 10, 2019

Merged!

Thanks everybody that's given their time and input so far ❤️ Now we'll start building up the implementation incrementally in log itself.

We'll tag these PRs with a GitHub label so you can find them, but log is fairly low traffic so watching this repository shouldn't generate too much noise.

@KodrAus KodrAus deleted the rfc/structured-logging branch March 10, 2019 23:29
@atroche
Copy link

atroche commented Sep 18, 2019

@KodrAus I realise I'm very late to the party, but would the API proposed by this RFC supported logging nested values? The use case I'm thinking of is when I have a HashMap representing an HTTP request's headers, and want to log those under a "headers" key. I couldn't see any mention of nested values in the RFC, and also couldn't infer whether or not they'd be possible to support.

@KodrAus
Copy link
Contributor Author

KodrAus commented Sep 23, 2019

Hi @atroche 👋

Ah yes that scenario is actually supported through external serialization APIs. So, for example, the current unstable implementation supports a framework called sval. So you can capture a HashMap like:

use log::kv::Value;

// Assuming `Header` implements `sval::Value`
let map: HashMap<String, Header> = request.headers();

// `Value` implements `sval::Value`, so we can visit complex structure
let value = Value::from_sval(&map);
// If `Header` implements `serde::Serialize` instead then we can still use it
let map = sval::serde::to_serialize(map);
let value = Value::from_sval(&map);

We haven't spent much time on how we might surface that through the macros just yet though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.