Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scoped context #2419

Closed
wants to merge 4 commits into from
Closed

Scoped context #2419

wants to merge 4 commits into from

Conversation

rgoers
Copy link
Member

@rgoers rgoers commented Mar 27, 2024

This adds support for a ScopedContext and a ResourceLogger as requested in apache/logging-log4j-kotlin#71 and similar to #2214

Git has a funny way of mangling old PR branches so I deleted the old one and recreated it. Unfortunately, that causes comments on the previous PR to be lost here. This has several changes to ScopedContext from the previous PR.

  1. ScopedContext.INITIAL_CONTEXT is no longer present.
  2. ScopedContext.current() is no longer necessary.
  3. Support for running child threads and propagating the ThreadContext and ScopedContext was added.

Example usage for wrapping a simple code block.

ScopedContext.where("id", UUID.randomUUID())
    .where("ipAddress", request.getRemoteAddr())
    .where("loginId", session.getAttribute("loginId"))
    .where("hostName", request.getServerName())
    .run(new Worker());

private class Worker implements Runnable {
    private static final Logger LOGGER = LogManager.getLogger(Worker.class);

    public void run() {
        LOGGER.debug("Performing work");
        String loginId = ScopedContext.get("loginId");
    }
}

Creating Threads:

BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<>(5);
ExecutorService executorService = new ThreadPoolExecutor(1, 2, 30, TimeUnit.SECONDS, workQueue);
Future<?> future = ScopedContext.where("id", UUID.randomUUID())
    .where("ipAddress", request.getRemoteAddr())
    .where("loginId", session.getAttribute("loginId"))
    .where("hostName", request.getServerName())
    .run(executorService, new Worker());
try {
    future.get();
} catch (ExecutionException ex) {
    logger.warn("Exception in worker thread: {}", ex.getMessage());
}

Note that this syntax supports virtual threads.

ResourceLogger can be used to include resource data in all log events coming from a specific Logger.

     private class User {

        private final String loginId;
        private final String role;
        private int loginAttempts;
        private final ResourceLogger logger;

        public User(final String loginId, final String role) {
            this.loginId = loginId;
            this.role = role;
            logger = ResourceLogger.newBuilder()
                .withClass(this.getClass())
                .withSupplier(new UserSupplier())
                .build();
        }

        public void login() throws Exception {
            ++loginAttempts;
            try {
                authenticator.authenticate(loginId);
                logger.info("Login succeeded");
            } catch (Exception ex) {
                logger.warn("Failed login");
                throw ex;
            }
        }


        private class UserSupplier implements Supplier<Map<String, String>> {

            public Map<String, String> get() {
                Map<String, String> map = new HashMap<>();
                map.put("LoginId", loginId);
                map.put("Role", role);
                map.put("Count", Integer.toString(loginAttempts));
                return map;
            }
        }
    }

All events in a ResourceLogger use ParameterizedMapMessage. When logging it behaves like a ParameterizedMessage in that formatMessage only returns the message directly being logged. However, it is a MapMessage so all the attributes can be accessed via the normal Layout, Lookup, and Filter capabilities we have added.

@rgoers rgoers requested review from ppkarwasz, vy and jvz March 27, 2024 21:50
@rocketraman
Copy link
Member

rocketraman commented Mar 28, 2024

All events in a ResourceLogger use ParameterizedMapMessage. When logging it behaves like a ParameterizedMessage in that formatMessage only returns the message directly being logged. However, it is a MapMessage so all the attributes can be accessed via the normal Layout, Lookup, and Filter capabilities we have added.

In structured logging use cases (e.g. to Google Cloud Logging), the resource attributes should go to the "context" field. With this approach, the attributes will go to the "message" field. Therefore, it won't be possible to easily find all the log messages when the same attributes are used with ScopedContext in some places (because the resource is relevant to a Thread) and with ResourceLogger in others (because the resource logs internally).

I think we need an approach that will allow sending resource attributes to "context" so the two approaches are consistent.

@rgoers
Copy link
Member Author

rgoers commented Mar 28, 2024

@rocketraman I did look into that and what you are asking for just can't be implemented in a reasonable way without significant mods to the API and will likely impact performance. The problem is that Loggers are singletons and the code where the LogEvent is constructed and the ContextMap is populated is pretty far down the stack. We defer creating LogEvents to avoid the performance overhead when they aren't going to be used. Messages, on the other hand, are constructed very early as they are fairly light-weight. So doing that in a per-instance Logger that wraps the singleton was fairly easy to do.

You will need to point me to some docs on Google Cloud Logging, however at my employer we are running a cloud infrastructure and log to an ELK stack and making this work the way you would like would be fairly easy, although it might be nice to enhance JsonTemplateLayout if it makes sense. To give you an idea, in my infrastructure we have JsonTemplateLayout configured iwth

{
  "@timestamp": {
    "$resolver": "timestamp",
    "pattern": {
      "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
      "timeZone": "UTC"
    }
  },
  "ecs.version": "1.11.0",
  "log.level": {
    "$resolver": "level",
    "field": "name"
  },
  "application": "${lower:${spring:spring.application.name}}",
  "short_message": {
    "$resolver": "message",
    "stringified": true
  },
  "message": {
    "$resolver": "pattern",
    "pattern": "[%t] %X{requestId, sessionId, loginId, ipAddress} %C{1.}.%M:%L - %m%n"
  },
  "process.thread.name": {
    "$resolver": "thread",
    "field": "name"
  },
  "log.logger": {
    "$resolver": "logger",
    "field": "name"
  },
  "event.action": {
    "$resolver": "marker",
    "field": "name"
  },
  "event.data": {
    "$resolver": "map",
    "stringified": true
  },
  "labels": {
    "$resolver": "mdc",
    "flatten": true,
    "stringified": true,
    "pattern": "(?!(?i:token)).+"
  },
  "tags": {
    "$resolver": "ndc"
  },
  "error.type": {
    "$resolver": "exception",
    "field": "className"
  },
  "error.message": {
    "$resolver": "exception",
    "field": "message"
  },
  "error.stack_trace": {
    "$resolver": "exception",
    "field": "stackTrace",
    "stackTrace": {
      "stringified": true
    }
  }
}

Note that with this configuration the "message" key will look exactly like it would in a log file. However all the context fields, except for the user's token, are included as distinct fields while all the message map items are under the event.data tag. However, these could have also been flattened into distinct fields as the context data is and you wouldn't know the difference when searching. If you have other requirements for how the data should be formatted I am sure JsonTemplateLayout could be enhanced to support it, if it can't do it already.

One other thing. I explicitly implemented ParameterizedMapMessage because I have known for a long time that MapMessage is insufficient as it doesn't treat the message being logged separately from the Map. ParameterizedMapMessage does. When getFormattedMessage is called ONLY the data the user logged is included as it it was a ParameterizedMessage. However, the map is still available to all the Lookups, Filters, and Layouts that support MapMessages, so that works correctly as well.

@ppkarwasz
Copy link
Contributor

@rgoers,

@rocketraman I did look into that and what you are asking for just can't be implemented in a reasonable way without significant mods to the API and will likely impact performance. The problem is that Loggers are singletons …

The fact that there is only one logger per logger name is not set in stone (unlike in Logback). LoggerRegistry is a multi-key map with a (String, MessageFactory) key.

Due to the architectural choices you did at the beginning, Loggers don't even need to form a hierarchy, only LoggerConfigs are hierarchical. Let's profit from that and remove the "singleton" restriction from LogManager#getLogger. It is not a breaking change for the users to allow them to use different MessageFactory instances for the same logger name.

… and the code where the LogEvent is constructed and the ContextMap is populated is pretty far down the stack.

IMHO, the ParameterizedMapMessage class that you introduce is already half way to meet @rocketraman's expectations:

  • by using a special LogEventFactory the map data can be added to the context map and the baseMessage can be used as message. Something similar already happens with TimestampMessages that add their own timestamp to the LogEvent,
  • certainly the specification of ContextDataInjector#injectContextData must change: currently it already receives the current context data of the event as reusable parameter, but most implementation ignore it.

Another thing that I like about your approach is that is solves #1813: in SLF4J 2.x there is a list of key/value pairs that is distinct from the context map. Currently I am mixing the two sets in log4j-slf4j2-impl, which is not what users expect.

We defer creating LogEvents to avoid the performance overhead when they aren't going to be used. Messages, on the other hand, are constructed very early as they are fairly light-weight. So doing that in a per-instance Logger that wraps the singleton was fairly easy to do.

Is there really such a difference between Message and LogEvent? I am very happy that there is a net separation (unlike in Logback) between:

  • the data that the user explicitly specified in his code,
  • the data that the logging backend added,

but an implementation could very well choose to use something like MutableLogEvent all the way from the Logger to the Appender.

@rocketraman
Copy link
Member

rocketraman commented Mar 28, 2024

@rgoers

If you have other requirements for how the data should be formatted I am sure JsonTemplateLayout could be enhanced to support it, if it can't do it already.

I understand that if one has end-to-end control over layout, we can use the approach you've created here. However, we have a meaningful and clear separation between logging and layout. This means it is absolutely normal for developers to configure application logging, while some completely different of people (enterprise dev tooling group or perhaps upstream library such as Spring Boot) determines layout, and in many cases developers have no ability to affect layout at all — all applications across the enterprise use the same layout in order to enforce logging consistency. My references to Google Cloud Logging are for just such a case at an enterprise client of mine in which the logs in their Google Cloud have a context key for all contextual data, and I have absolutely no control over that. With your solution, my resource context will sometimes be in context and sometimes be in message. Furthermore, my message field won't even be consistent -- sometimes it will be a String, and sometimes it will be a Map of fields (unless I control the layout). This is just completely surprising behavior for a user.

If we want to blur the separation between context and message, then lets go all the way. Put everything into message, and then come up with an appropriate solution in order to distinguish different types of fields for layout so we can maintain a good separation between logging and layout. Make it a breaking change for Log4j3.

The problem is that Loggers are singletons and the code where the LogEvent is constructed and the ContextMap is populated is pretty far down the stack.

I'm not as familiar with Log4j internals as you and @ppkarwasz . However, from the perspective of a naiive user, I'm not sure what is so difficult? As a user I could wrap my logger in my own class that, for every method, did (in pseudo-code):

class ContextLogger {
  private final Map<String, String> context;
  private final Logger delegate;

  void log(...) {
    setContext(context)
    try {
      delegate.log(...)
    } finally {
      resetContext()
    }
  }
}

and (I believe) problem solved. However, this should be unnecessary for a user to do.

@rgoers
Copy link
Member Author

rgoers commented Mar 28, 2024

@ppkarwasz

The fact that there is only one logger per logger name is not set in stone (unlike in Logback). LoggerRegistry is a multi-key map with a (String, MessageFactory) key.

Due to the architectural choices you did at the beginning, Loggers don't even need to form a hierarchy, only LoggerConfigs are hierarchical. Let's profit from that and remove the "singleton" restriction from LogManager#getLogger. It is not a breaking change for the users to allow them to use different MessageFactory instances for the same logger name.

While it is a multi-key map you will get warnings if you try to create the same logger with multiple message factories as usually that is going to cause problems. However, I don't see how that helps with what rocketraman is requesting.

It is worth repeating that resource Loggers should NEVER be added to the LoggerRegistry. Resource Loggers need to have the same lifetime as the resource. Loggers in the registry never go away and so would result in a memory leak. Note that them not being in the registry isn't really a problem as the underlying Logger they use is. That Logger will be reconfigured when needed.

IMHO, the ParameterizedMapMessage class that you introduce is already half way to meet @rocketraman's expectations:

by using a special LogEventFactory the map data can be added to the context map and the baseMessage can be used as message. Something similar already happens with TimestampMessages that add their own timestamp to the LogEvent,
certainly the specification of ContextDataInjector#injectContextData must change: currently it already receives the current context data of the event as reusable parameter, but most implementation ignore it.

Sure, you could do this, but I am not sure it is a good idea. There are other uses of MapMessage where you do not want the data merged into the context data. I would be hesitant to always merge it without some indication from the user since there are a lot of cases where that is not what you want.

Is there really such a difference between Message and LogEvent?

Did you just answer your own question? Of course there is a difference. A Message is really just a container for the format String and its parameters. The fact that you can make more complex messages like a MapMessage is a perk. But the main difference is that the user really controls the contents of the Message while Log4j controls the LogEvent. Now, we could enhance the transformer to replace logging calls that take parameters with Logging calls that accept LogEvents but a) I believe it would be tough to make that garbage free and b) I am not sure we could always get it right since a lot is determined at run time.

The bottom line here is Piotr, I am not sure what to do with your comment. You haven't actually asked for any changes.

@rocketraman

in many cases developers have no ability to affect layout at all — all applications across the enterprise use the same layout in order to enforce logging consistency

I absolutely understand this. All the Spring Boot and Spring Cloud Config support in Log4j was added by me to support my employer. We have a common logging configuration stored in Spring Cloud Config that all applications share. However, every application can still provide an override that extends capabilities and we take advantage of that.

Now this is sort of interesting

class ContextLogger {
  private final Map<String, String> context;
  private final Logger delegate;

  void log(...) {
    setContext(context)
    try {
      delegate.log(...)
    } finally {
      resetContext()
    }
  }
}

Yes, one could modify ResourceLogger to have

    public void logMessage(final String fqcn, final Level level, final Marker marker, final Message message, final Throwable t) {
        ScopedContext.runWhere(this.supplier.get(), () -> logger.logMessage(fqcn, level, marker, message, t));
    }

However, this has a small side effect that any logging calls that occur from objects passed into the logging call or logging that occurs from Appenders and other components while processing the logevent will also have this context data, despite them using a different logger.

If that makes sense I would actually consider this.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the ResourceLogger proposal from this PR, but I would drop the ScopedContext part for a simple reason: there are other specialized APIs (OpenTelemetry and context-propagation) to deal with context propagation that are much more advanced. If we want to propose a light-weight alternative for them, we should do it in another artifact (e.g. log4j-context), but I am not sure if we have to propose any radically new approach to the problem.

Regarding ResourceLoader, Log4j Core 3.x doesn't reuse neither the AbstractLogger utility class from 2.x nor its message factories (it is Recycler-based). Could we implement an interface-based solution so that we can have differences between implementations.

Comment on lines +31 to +51
/**
* Context that can be used for data to be logged in a block of code.
*
* While this is influenced by ScopedValues from Java 21 it does not share the same API. While it can perform a
* similar function as a set of ScopedValues it is really meant to allow a block of code to include a set of keys and
* values in all the log events within that block. The underlying implementation must provide support for
* logging the ScopedContext for that to happen.
*
* The ScopedContext will not be bound to the current thread until either a run or call method is invoked. The
* contexts are nested so creating and running or calling via a second ScopedContext will result in the first
* ScopedContext being hidden until the call is returned. Thus the values from the first ScopedContext need to
* be added to the second to be included.
*
* The ScopedContext can be passed to child threads by including the ExecutorService to be used to manage the
* run or call methods. The caller should interact with the ExecutorService as if they were submitting their
* run or call methods directly to it. The ScopedContext performs no error handling other than to ensure the
* ThreadContext and ScopedContext are cleaned up from the executed Thread.
*
* @since 2.24.0
*/
public class ScopedContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[This part seems independent from the ResourceLogger]

While technically this class is very well written and could benefit some users, I am more and more convinced that we shouldn't offer this kind of context-propagation API.

ThreadLocal was a big hit and allowed users to store and retrieve key/value pairs for logging purposes (and not only 😉), but nowadays there are APIs specialized in this kind of things. Except the context-propagation project I cited on the mailing-list, there is an entire Context Propagation service from OpenTelemetry.

If we were to offer users ScopedContext, I am afraid that sooner or later they would need to refactor it to use another API instead. The OTel Context class offers many more wrappers than this class.

In my opinion, what we should do instead is to help application and libraries that instrumented their code with ThreadContext to migrate to a specialized context-propagation API. For example we could submit to [open-telemetry/opentelemetry-instrumentation-java] a ThreadContextMap implementation that uses the Bagage API

Copy link
Member Author

@rgoers rgoers Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppkarwasz I have no plans to merge this PR without ScopedContext. I am quite sure I will use it in my environment far more frequently than I will ever use ResourceLogger. I can easily envision Service 1 creating a ScopedContext that gets propogated to Service 2, which then adds its own ScopedContext data. This would be extremely valuable in the logs.

As you know, I am also against the API having dependencies on anything but the JDK. I am not inclined to have the API be dependent on someone elses context propagation API. Looking at the Context object I see that it mostly adds different kinds of functions that can be called. I am not convinced that is necessary but if it is they can always be added later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Ralph here. We're at the bottom of the stack. While we can bridge our APIs over to other libraries, those don't belong in our API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgoers,

As you know, I am also against the API having dependencies on anything but the JDK. I am not inclined to have the API be dependent on someone elses context propagation API.

I am not suggesting to add an external context propagation API dependency to log4j-api, but to delegate context propagation to higher levels of the stack. For example let's look at the first example of "Observability with Spring Boot 3":

// Create an Observation and observe your code!
Observation.createNotStarted("user.name", registry)
        .contextualName("getting-user-name")
        .lowCardinalityKeyValue("userType", "userType1") // let's assume that you can have 3 user types
        .highCardinalityKeyValue("userId", "1234") // let's assume that this is an arbitrary number
        .observe(() -> log.info("Hello")); // this is a shortcut for starting an observation, opening a scope,

The userId value will be present in the spans and metrics, but will it be present in the context data of the log event? If it is not present, how can I add it? I don't think that wrapping the lambda using ScopedContext is a practical solution.

Looking at the Context object I see that it mostly adds different kinds of functions that can be called. I am not convinced that is necessary but if it is they can always be added later.

The interesting part for me is not the amount of wrapper methods it provides, but the promise that it will propagate everywhere using any kind of technology (Netty, HTTP, Spring Reactor, …). They already integrate with Log4j Core (see log4j-context-data-2.17), but even if they didn't, we could create a Maven module that integrates OpenTelemetry with Log4j API.

Copy link
Member Author

@rgoers rgoers Mar 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppkarwasz
Regarding your Observation object, just please no. This is way, way too complicated and it is serving a completely different concept than just tracking keyed items you want included in the logs (or possibly quick access to in your application). The ScopedContext can be propogated anywhere as well. We have a utility class at work that propagates the ThreadContext via RestTemplate, Kafka, and Amqp. Making it support ScopedContext is easy as it just needs to do what ScopedContextDataProvider does to get hold of the context entries. Actually, a version for REST is in Log4j-Audit. I don't believe Log4j should provide the integrations to these technologies at least as part of core or the api.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe Log4j should provide the integrations to these technologies at least as part of core or the api.

On this we agree, but do we need to force those projects to write yet another integration?

The ScopedContext can be propogated anywhere as well. We have a utility class at work that propagates the ThreadContext via RestTemplate, Kafka, and Amqp. Making it support ScopedContext is easy as it just needs to do what ScopedContextDataProvider does to get hold of the context entries.

Every context propagation is easy, but it is annoying. At my previous job we had to propagate Spring's request scope, our own scope, Vaadin's context and so on. I started resenting colleagues that used static context accessors, since their code was usually tested on the servlet working thread and we only noticed failures (NPEs), when accessed from an asynchronous thread.

Comment on lines +35 to +37
public String getFormattedMessage() {
return baseMessage.getFormattedMessage();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, Open Telemetry uses MapMessage#getFormat() and MapMessage#get("message") to extract the body of a map message (cf. LogEventMapper). Therefore I would override the getFormat() to return baseMessage.getFormattedMessage().

Regarding how getFormattedMessage should work, I am not sure what a user would expect. My guess is that it should be something like:

key1="value" key2="value" Formatted base message

i.e. the "FULL" message. Note that many (even third-party) layouts know about MultiformatMessage, so they don't call this method.

Remark: could we create an interface (e.g. StructuredMessage, AttributeMessage or something similar) that codifies how the current sub-classes of MapMessage work? E.g.:

public interface AttributeMessage<V> extends Message {
    
    @Override
    default String getFormat() {
        Message message = getMessage();
        return message != null ? message.getFormattedMessage() : "";
    }
    
    @Override
    String getFormattedMessage();
    
    @Nullable Message getMessage();
    
    Map<String, V> getData();
}

Also it would be nice to add some well-known constants for MultiformatMessage. Integrators do use strings like JSON in practice, but the specification does not prevent the creation of message implementations that return some garbage instead of JSON.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppkarwasz
I would suggest that Open Telemetry would be better served by adopting ParameterizedMapMessage than treating the message key as special. I admit, I have done the same thing in the past which is what motivated me to create this Message class as I have always found that irritating. You are correct the getFormat() should be overridden but it should be baseMessage.getFormat().

The purpose of ParameterizedMapMessage, as I have stated several times, is to have it format %m EXACTLY how ParameterizedMessage would but still have it be a MapMessage so other Lookups, Filters, and Layouts can extract the structured data from the message. That is it. No fancy tricks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that Open Telemetry would be better served by adopting ParameterizedMapMessage than treating the message key as special.

Due to a "happens before" relationship they didn't have the chance to do it. 😁

The purpose of ParameterizedMapMessage, as I have stated several times, is to have it format %m EXACTLY how ParameterizedMessage would but still have it be a MapMessage so other Lookups, Filters, and Layouts can extract the structured data from the message. That is it. No fancy tricks.

Sure, extending MapMessage is a nice trick to maintain some sort of backward compatibility. However I would like to have a well documented Java interface that ParameterizedMapMessage will implement. In time we can switch all the instanceof MapMessage to that class.

Copy link
Member Author

@rgoers rgoers Mar 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppkarwasz

Due to a "happens before" relationship they didn't have the chance to do it. 😁

They will be able to after this is merged.

However I would like to have a well documented Java interface that ParameterizedMapMessage will implement. In time we can switch all the instanceof MapMessage to that class

Umm. No. There are multiple reasons to use a MapMessage of various kinds. Consider StructuredDataMessage. That supports RFC5414 which specifies how things are supposed to behave. I use MapMessages to create data for dashboards in Kibana. In that case you don't want or need the ParameterizedMessage support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFormattedMessage can always return the empty string, if there is not message attached to the map.

/**
* Constructs a ResourceLogger.
*/
public static final class ResourceLoggerBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer to just have a ResourceLoggerBuilder (or LoggerBuilder) interface and access it through a new Logger#newDetachedLoggerBuilder method.

Creating "detached" loggers might be used for other purposes than adding key/value pairs to a message. For example someone might want to use a different MessageFactory.

@ppkarwasz
Copy link
Contributor

@rgoers,

It is worth repeating that resource Loggers should NEVER be added to the LoggerRegistry. Resource Loggers need to have the same lifetime as the resource. Loggers in the registry never go away and so would result in a memory leak. Note that them not being in the registry isn't really a problem as the underlying Logger they use is. That Logger will be reconfigured when needed.

Good point, so maybe LogManager#getLogger(String name, MessageFactory factory) could be reimplemented to:

  1. create a logger logger with the default message factory,
  2. if factory is not null, return a wrapper of logger.

IMHO, the ParameterizedMapMessage class that you introduce is already half way to meet @rocketraman's expectations:

  • by using a special LogEventFactory the map data can be added to the context map and the baseMessage can be used as message. Something similar already happens with TimestampMessages that add their own timestamp to the LogEvent,
  • certainly the specification of ContextDataInjector#injectContextData must change: currently it already receives the current context data of the event as reusable parameter, but most implementation ignore it.

Sure, you could do this, but I am not sure it is a good idea. There are other uses of MapMessage where you do not want the data merged into the context data. I would be hesitant to always merge it without some indication from the user since there are a lot of cases where that is not what you want.

Sure, the default event factory should not merge the data, but we can propose an alternative one that does merge it. The crucial point for this to work is that ContextDataInjector implementations should not ignore the reusable parameter they receive, but always test if it is empty or not and act accordingly. I don't think the an isEmpty() test will have any impact on the performance.

Did you just answer your own question? Of course there is a difference. A Message is really just a container for the format String and its parameters. The fact that you can make more complex messages like a MapMessage is a perk. But the main difference is that the user really controls the contents of the Message while Log4j controls the LogEvent.

Sure, there is a semantic difference, but a MutableLogEvent with all the "message" fields filled and all the "event" fields empty, looks pretty much as a Message to me. As I previously mentioned, for Log4j 3.x I would like to create a factory that implements both MessageFactory and LogEventFactory. The createMessage methods will pull MutableLogEvents from a recycler and fill the "message" fields, while the createEvent methods will fill the remaining fields.

The bottom line here is Piotr, I am not sure what to do with your comment. You haven't actually asked for any changes.

I needed some time to digest your proposed changes, 😉. While it doesn't touch a lot files, the amount of API changes in this PR has not been seen since the introduction of LogBuilder. Having done some questionable "improvements" to the o.a.l.l package in the past (e.g. the BridgeAware interface that I should have put in o.a.l.l.spi), I had to do some research to decide which changes are required and which we'd better drop.

@ppkarwasz
Copy link
Contributor

@rgoers,

Could you review my #2381 PR that introduces another MessageFactory to deal with Logback's throwable semantics? Maybe we should really consider creating a log4j-message artifact with a log of message-related features and leave in log4j-api just the strict necessary?

@rgoers
Copy link
Member Author

rgoers commented Mar 29, 2024

@rocketraman I have modified the PR so that ResourceLogger uses ScopedContext and behaves the way you requested.
@ppkarwasz I have no idea why the PR system lets me reply to some messages and not others. It means I have to quote stuff which is annoying but whatever.
With regards to your last comment. I have changed ResourceLogger to no longer add the map data to the Message so I am assuming most of that is no longer relevant.

As things stand, I am quite happy where this is and personally would start using it (well ScopedContext anyway) right away.

Copy link
Member

@jvz jvz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea in general so far. Left some comments.

public class ResourceLoggerTest {
@BeforeAll
public static void beforeAll() {
System.setProperty("log4j2.loggerContextFactory", TestLoggerContextFactory.class.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: good idea to clear this property afterwards.

* @param executorService the ExecutorService to dispatch the work.
* @param op the Callable to call.
*/
public static <R> Future<R> callWhere(String key, Object obj, ExecutorService executorService, Callable<R> op)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got Java 8 at minimum at our fingertips. Let's make sure to support CompletableFuture as well. If possible, it'd be great to support Flow.Publisher from Java 9, too, though context propagation with reactive streams tends to use a non-standard API such as Reactor's context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvz Isn't CompletableFuture a Future? How is it not supported now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It implements Future, yes, but it's also a CompletionStage which is the part of the API I'm more interested in.

@ppkarwasz
Copy link
Contributor

@rocketraman I have modified the PR so that ResourceLogger uses ScopedContext and behaves the way you requested. @ppkarwasz I have no idea why the PR system lets me reply to some messages and not others.

Some of my messages are part of a "review", other are simple comments.

@rgoers
Copy link
Member Author

rgoers commented Mar 30, 2024

@ppkarwasz
You have flagged this as "Changes Requested". It isn't really clear what you have requested. If it is to remove ScopedContext that isn't going to happen so we will have to work that out. If it is something else please let me know.

@ppkarwasz
Copy link
Contributor

@ppkarwasz You have flagged this as "Changes Requested". It isn't really clear what you have requested. If it is to remove ScopedContext that isn't going to happen so we will have to work that out. If it is something else please let me know.

Yes, in general I am +10 on ParameterizedMapMessage and -1 on ScopedContext. What I would like to change in this PR mostly falls into two categories:

  • size down the impact on the public API, now that we know that we can not make breaking changes in the API for a very long time,
  • decouple the contract of the new API from its implementation, by using well-documented interfaces.

In particular:

  1. Is there something wrong with adding new methods to CloseableThreadContext instead of creating a new ScopedContext?

  2. You introduce another ThreadLocal containing context data: all the libraries that read context data for metrics or traces will need to be updated. Could we reuse ThreadContextMap? log4j-to-slf4j will also ignore the scoped context.

  3. Multiple ContextDataProviders currently don't work very well: you will not be able to use your scoped data in a ${ctx:…} lookup (see ContextDataInjector ignores custom ContextDataProvider #2331),

  4. Can we replace the static factory methods in ResourceLogger with new instance methods of Logger?

  5. ResourceLoggerBuilder could be an interface: in log4j-to-slf4j I might want to implement it differently.

  6. ResourceLogger should not appear in the signature of ResourceLoggerBuilder#build(): from a user perspective, it is nothing more than a Logger.

  7. ParameterizedMapMessage should have a matching interface:

    • we might want to implement a reusable version of that interface,
    • we want to hide from users all the irrelevant MapMessage methods: MapMessage implements MultiFormatStringBuilderFormattable and is mutable,
    • the interface should precisely describe what is the effect a user will get if he uses it. The Javadoc of Message#getFormat is a counter-example of a precise description. We might as well have written "Undefined behavior".

@rgoers
Copy link
Member Author

rgoers commented Mar 30, 2024

@ppkarwasz
OK. Since you are -1 on ScopedContext I will close this PR.

Is there something wrong with adding new methods to CloseableThreadContext instead of creating a new ScopedContext?

Yes. IMO CloseableThreadContext is just as bad as what SLF4J did. It requires the application to "get it right"all the time. It is just as impossible to leave a ScopedContext without leaving the context as it was when it was entered as it is to leave a synchronizd block holding a lock. Trying to merge those two would result in a broken mess.

You introduce another ThreadLocal containing context data: all the libraries that read context data for metrics or traces will need to be updated. Could we reuse ThreadContextMap? log4j-to-slf4j will also ignore the scoped context.

  1. We cannot reuse ThreadContextMap without breaking it. It only allows String values. Relaxing that allows users to store objects, which we know causes problems. Due to the way ScopedContext behaves storing objects is not a problem.
  2. Well, since SLF4J has no ScopedContext that shouldn't be surprising. It would be up to SLF4J to support it just as we have had to support some of the crappy stuff they have added.

Multiple ContextDataProviders currently don't work very well: you will not be able to use your scoped data in a ${ctx:…} lookup (see #2331),

That isn't a bug. The ContextDataInjector injects data into LogEvents. $ctx will work in that case if it is operating on a LogEvent. If there is no log event then $ctx operates on the ThreadContext. Nowhere do we document that a ContextDataProvider adds data to the ThreadContext.

Can we replace the static factory methods in ResourceLogger with new instance methods of Logger?

Why? Then I am forced to create a Logger to create a ResourceLogger. I can't imagine why a user would really want both. If you want to log stuff happening within the scope of a Class then everything should use the one Logger. This just seems ugly.

ResourceLoggerBuilder could be an interface: in log4j-to-slf4j I might want to implement it differently.

I simply don't udnerstand why this has anything to do with log4j-slf4j. We aren't here to make all Log4j-API functionality work in SLF4J.

ResourceLogger should not appear in the signature of ResourceLoggerBuilder#build(): from a user perspective, it is nothing more than a Logger.

Yeah - this is a minor nit I noticed as well. It would have gotten fixed along wtih the System property removal Matt mentioned.

ParameterizedMapMessage should have a matching interface:

Sure, I suppose. I'm ambivalent to this one. It would mean changing a LOT of code that has behavior tied to MapMessages in not so subltle ways but I suppose it could be done. I wouldn't want it as part of this PR as it would be too big. After the change to use the ScopedContext ParameterizedMapMessage wasn't necessary anyway.

@rgoers rgoers closed this Mar 30, 2024
@rgoers rgoers deleted the ScopedContext branch March 30, 2024 11:28
@ppkarwasz
Copy link
Contributor

ppkarwasz commented Mar 30, 2024

OK. Since you are -1 on ScopedContext I will close this PR.

I would be OK with an implementation based on ThreadContextMap that does not introduce new ThreadLocals.

I would also be OK with a log4j-context artifact that contains the current implementation of ScopedContext.

@rgoers
Copy link
Member Author

rgoers commented Mar 30, 2024

@ppkarwasz
As I said in my edited comment above, that can't be done without exposing ThreadContextMap to breakage. The ThreadLocal for ThreadContextMap contains a Map<String, String>. ScopedContext requires the value be a queue. IMO separation of these is also a good thing. With ScopedContext the ThreadLocal for a thread will ALWAYS be empty when the work on the Thread is complete. That isn't guaranteed with a ThreadContextMap item. Why does having a separate ThreadLocal concern you?

Also - when you say "based on ThreadContextMap"I would have to respond with "Which one"? It looks to me that there are something like 8 different implementations in Log4j now. (This is quite out of hand IMO).

A log4j-context artifact in a separate repo or a submodule? This is pretty tiny on its own. It is a grand total of 3 classes - ScopedContext, ScopedContextAnchor, and ScopedContextDataProvider. Note that ScopedContext and ScopedContextAnchor don't require log4j-core while the ContextDataProvider does. I am not sure if that the JPMS module dependency could be optional so that the SLF4J bridge could eventually do something with it.
Also, ResourceLogger would have to move there as well since it is based on a ScopedContext.

@ppkarwasz
Copy link
Contributor

@rgoers,

In your edited comment you make a lot of good points. Let us discuss in a video call after the holidays.

@ppkarwasz
Copy link
Contributor

@rgoers,

  1. We cannot reuse ThreadContextMap without breaking it. It only allows String values. Relaxing that allows users to store objects, which we know causes problems. Due to the way ScopedContext behaves storing objects is not a problem.

I think that we can reuse ThreadContextMap. Since Log4j API 2.8 there is a (very dangerous) ObjectThreadContextMap interface for which we have one implementation.

While I totally agree that exposing the functionality of that interface through ThreadContext would be simply insane, exposing it through ScopedContext is perfectly safe: your implementation guarantees that all Object values that a user might insert into the thread context will be removed when your run method ends.

Sure, implementing ScopedContext using ThreadContextMap is a little bit harder, since the former is Deque-based and the latter has a single value, but as I remarked in this comment on #2330 we might need to add two static methods somewhere in o.a.l.l.spi that allow to rapidly take a snapshot of a context and restore it:

    /**
     * Saves the current context map data.
     * <p>
     *     The returned object is not affected by later changes to the context map.
     * </p>
     * @return An opaque thread-safe version of the internal context map representation.
     * @see #restoreContextMap
     * @since 2.24
     */
    static Object saveContextMap();

    /**
     * Restores the context map data, from a saved version.
     * @param contextMap An opaque thread-safe version of the internal context map representation.
     * @see #saveContextMap
     * @since 2.24
     */
    static void restoreContextMap(Object contextMap);

These methods might be useful in CloseableThreadContext and for integrators to more easily transfer the thread context from one thread to another, without converting it from and to a Map<String, String>.

@rgoers
Copy link
Member Author

rgoers commented Mar 31, 2024

@ppkarwasz
I really don't see the point of using ObjectThreadContextMap.

  1. In our discussion yesterday we agreed ThreadContext (and presumably the ThreadContextMaps) should be deprecated. That would include ObjectThreadContextMap.
  2. The only implementations of that interface declare:
    protected final ThreadLocal<StringMap> localMap;

which isn't particularly helpful.

  1. Even if I could find a ThreadLocal to reuse I still don't understand why you are so reticent about adding a new one. After all, if ContextScope isn't being used then the ThreadLocal won't be instantiated. Likewise, one would imagine that if ContextScope is being used then the ThreadContext won't be.
  2. I am not sure what it would mean to "Take a snapshot" of a ScopedContext or why one would want to do it. I suppose I could imagine saving the topmost map as JSON for transport, but since it can contain objects it would almost surely have to serialize only the redered String variation of the objects in the map as it would be impossible to deserialized the map without know ing what is in it. Additionally, taking a snapshot would almost certainly NOT want to copy the whole Deque but just the top-most entry. Unless, of course, you are "saving" the whole application and its context, but I cannot even imagine why one would want to do that object by object.

@ppkarwasz
Copy link
Contributor

  1. In our discussion yesterday we agreed ThreadContext (and presumably the ThreadContextMaps) should be deprecated. That would include ObjectThreadContextMap.

I would love that, but after looking at our SLF4J integrations I am not sure any more we can do that.

3. Even if I could find a `ThreadLocal` to reuse I still don't understand why you are so reticent about adding a new one. After all, if ContextScope isn't being used then the ThreadLocal won't be instantiated. Likewise, one would imagine that if ContextScope is being used then the ThreadContext won't be.

I am reticent about adding a new thread local, because propagating the ThreadContextMap is well supported by many tools, while the new ThreadLocal in ScopedContext will require many new integrations.

As far as I remember JBoss LogManager also has an MDC that can have object values. In this case reusing ThreadContextMap will allow you to have an out-of-the-box integration.

4. I am not sure what it would mean to "Take a snapshot" of a ScopedContext or why one would want to do it.

By "taking a snapshot" I mean retrieving the Foo from the ThreadLocal<Foo> if Foo is immutable or make a copy of Foo otherwise).

@rgoers
Copy link
Member Author

rgoers commented Mar 31, 2024

OK. I think I understand your concerns WRT tooling and I will do what I can to address that. FWIW, I don't believe using the ThreadContext's ThreadLocal would help in that at all.

As for the snampshot, again I don't have any idea why you would want the whole stack of Maps for a ScopedContext. Usually you would only want the current map.

@rgoers
Copy link
Member Author

rgoers commented Apr 3, 2024

I have updated the branch with some significant changes:

  1. ContextDataProvider is now moved to the API. This allows all ContextDataProviders to be used in Lookups, Filters, etc. For example, the $ctx lookup will now include fields from both the ScopedContext, ThreadContext and custom providers.
  2. ContextData has been introduced as a way to access the unified context data. So ContextData.getValue(key) will find the value for a key regardless of which ContextDataProvider it was specified in.
  3. All access to the Contexts is garbage free. Multiple versions of ThreadContextMap are no longer necessary.
  4. ThreadContextDataInjector and related classes are deprecated in 2.24.0 and scheduled for removal in 3.0.0. This will eliminate a lot of noise.

@rgoers
Copy link
Member Author

rgoers commented Apr 3, 2024

Rats. I can't reopen the PR. Guess I have to create a new one again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants