Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API] Add user facing Logging API and Benchmarks #2094

Merged
merged 14 commits into from
Apr 23, 2023

Conversation

ThomsonTan
Copy link
Contributor

@ThomsonTan ThomsonTan commented Apr 13, 2023

Fixes #2093

Changes

This PR adds a new set of user facing logging API for application developer to instrument their code. This new set of API is designed for production environment with large volume of code instrumented while maintaining high throughput.

Here are the main design considerations:

  • Add optional event Id to the log inside Logger object because categorization is very important for large volume of logs data. It helps both in doing runtime filtering in SDK/exporters and in backend grouping/querying.
  • Added Enabled predicate on Logger to check if log should be sent to SDK for processing. It is assumed that most instrumented logs will not be turned on so it is fast-path for such code with minimum runtime performance overhead.
  • Multi-thread safety is built into the design and the same logger object can be used in multiple threads/cores.
  • Benchmark is added to measure the runtime cost of API calls. It shows the fast path check only takes a few machine cycles (~5).

This is only for API change, and SDK change will be added later.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@ThomsonTan ThomsonTan requested a review from a team April 13, 2023 02:18
@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

Merging #2094 (d6e96af) into main (22d0448) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2094   +/-   ##
=======================================
  Coverage   87.19%   87.19%           
=======================================
  Files         166      166           
  Lines        4784     4784           
=======================================
  Hits         4171     4171           
  Misses        613      613           

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a point in the design that I do not understand.

Method Logger::SetMinimumSeverity() is protected, so it can not be invoked by the instrumented application, this is not part of the api.

It sounds like minimum_severity_, a class member, is part of the implementation already.

Why making this code visible in the API, instead of implementing the minimum_severity_ member in the SDK, in a sub class of Logger ?

Instead of using atomic macros in the API, the SDK could use std::atomic instead, and this will avoid exposing all the per platform and compiler code in the API itself.

Please clarify the intent here.

@ThomsonTan
Copy link
Contributor Author

There is a point in the design that I do not understand.

Method Logger::SetMinimumSeverity() is protected, so it can not be invoked by the instrumented application, this is not part of the api.

It sounds like minimum_severity_, a class member, is part of the implementation already.

Why making this code visible in the API, instead of implementing the minimum_severity_ member in the SDK, in a sub class of Logger ?

Instead of using atomic macros in the API, the SDK could use std::atomic instead, and this will avoid exposing all the per platform and compiler code in the API itself.

Please clarify the intent here.

Yes, SetMinimumSeverity() is not part of the user-facing API. It is intended to SDK or any one which implements this API to use internally.

And you are right that minimum_severity_ is part of implementation instead of API, but it is added here for getting the best performance on the common path (no virtual function call is needed if the logger is not enabled which I assume is the common case for most components). So in brief, providing implementation of the minimum severity check in this abstract logger is for eliminating the virtual dispatch for common cases/scenarios so the runtime cost of instrumentation will be minimum.

It is also correct that if minimum_severity_ is moved to SDK, the atomic support will be much easier, but calling into SDK causes performance overhead which I want to avoid here.

Please let me know if anything I missed here.


protected:
// TODO: discuss with community about naming for internal methods.
virtual bool EnabledImplementation(Severity /*severity*/,
Copy link
Member

@reyang reyang Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having a function which child classes can derive from, I wonder if it makes sense to use a function pointer (similar to the severity level field which can be set by the implementations).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume this is for internal implementation so doesn't block the PR.

The virtual function here looks a restricted function pointer which is stored in the virtual table. If we store the pointer here like class Logger, it seems more flexible like not SDK but any other library can provide a function to do real Enabled() work. It might also be more risky if the function pointer is compromised then it can invoke anything as code. The virtual function is safer in this regard as it cannot be patched at runtime, and it is usually in the protected path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually consider the patch as a bless than a curse. (and I don't see why virtual functions cannot be patched)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Virtual function tables are in read only area of memory in modern operation system which cannot be patched normally.

But I think the idea still applies to provider a mechanism for any related part to register its own Enabled() check to the API Logger. Opened #2105 to track this because it is NOT API breaking change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the name, EnabledImplementation, discussion is deferred to the SDK patch if I understand correctly.

Using a function callback in a class member instead of a virtual method is asking for too much trouble in my opinion:

  • what if it changes, should the pointer itself be looked up using atomics, similar to the severity ?
  • Even after an atomic read, is the pointer still valid anyway (for all we know, it can be in a shared library), so is it safe to invoke ?

In short, the current proposal (virtual method) looks good to me, apart from the name to discuss later.

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments / questions

api/test/logs/logger_benchmark.cc Outdated Show resolved Hide resolved
Comment on lines +310 to +315
inline void Trace(const EventId &event_id,
nostd::string_view format,
const common::KeyValueIterable &attributes) noexcept
{
this->Log(Severity::kTrace, event_id, format, attributes);
}
Copy link
Member

@marcalff marcalff Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about the intended usage for this helper:

Is the caller supposed to check Enabled for each call, as in:

if (logger->Enabled(xxx)) {
  logger->Trace(xxx);
}

or is Trace() responsible to check the severity flag itself, as in:

logger->Trace(xxx);

The benchmarks shows both usage, is this code writing un wanted traces in the second case ?
The check for Enabled() seems missing from the code path.

Beside, if the whole point of Logger::minimum_severity_ is to check the flags in the API before making a virtual function call, then why making calls here to virtual functions Log() and EmitLogRecord() ?

Suggested changes:

  • Call Enabled() in every user facing api.
  • Do not call a user facing api like Log() from another user facing api like Trace(), call EmitLogRecord directly, so the Enabled() flags are not checked multiple times.

For example:

inline void Trace(xxx) {
  if (Enabled(xxx)) {
    EmitLogRecord(xxx);
  }
}

Edit: If EmitLogRecord() is also considered user facing (it is in the spec), then:

inline void Trace(xxx) {
  if (Enabled(xxx)) {
    DoEmitLogRecord(xxx);
  }
}

where DoEmitLogRecord() is a protected virtual method.

This will add clarity:

  • user facing apis are public, inline, not virtual, and test if Enabled() up front
  • user facing api delegate to not user facing, after the Enabled() test.
  • not user facing apis are protected, virtual, and do not test Enabled().

Copy link
Contributor Author

@ThomsonTan ThomsonTan Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion @marcalff .

Call Enabled() in every user facing api.
Enabled() should be called by the user's code instead of from our all our wrappers which is intentional. The idea is to avoid calculating any parameters which is only for logging. In the simple case, common::MakeAttributes({"key1", “value1}, {"key2", "value2"}} takes significant extra time than the Enabled() check. And there could be many other heavy query result which passed as parameters to query. Calling Enabled() in user's code can eliminate such heavy query from the first place.

logger->Trace("some message template", common::MakeAttributes({"some_value_takes_time_calculate", long_time_query()}));

If we are not going to call Enabled() in user facing API, then inline the call of DoLogRecord seems unnecessary for all the wrappers then? I don't have strong preference as the name implied they are just wrappers. But I can also revisit this part in the SDK support because changing the implementation will not break the user facing API.

Please let me know if anything is missed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to have a discuss.
We have a log macro like this before in our project.

#define WLOGDEBUG(logger, ...) \
  if(logger.check_level(log_level::DEBUG)) {
    logger->log(log_level::DEBUG, __VA_ARGS__);
  }

And when we use WLOGDEBUG(logger, "Debug message: {}", protobuf_message.DebugString()); and if the debug log is not enabled, protobuf_message.DebugString() will not be called and it's a high cost function.

Is there any way we can achieve a similar effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can define similar macro to help the user reduce boilerplate code, but it can avoid the logging cost. Based on the logging statement you shared, the the variable protobuf_message might be only need for logging, and it could do heavy initialization (like map or load string resources). This happens in the definition of protobuf_message which cannot be be optimized here.

  WLOGDEBUG(logger, "Debug message: {}", protobuf_message.DebugString());

While providing macros could help, the pointer in our API is make Enabled check explicit to the user, so they are aware of this and encouraged to do the optimization to archive high performance logging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue the discussion about Enabled() in the next patch, for the SDK part.

The fact that the caller MAY use Enabled() to optimize paths that evaluate complex arguments is good.

Still we can't force that the caller MUST use Enabled() all the time, so the implementation will need to check it again at some point.

@marcalff
Copy link
Member

Yes, SetMinimumSeverity() is not part of the user-facing API. It is intended to SDK or any one which implements this API to use internally.

Thanks for the clarifications.

I can see how multiple threads may use the same Logger instance, if GetLogger() returns a shared object used concurrently.

What is missing here is details about when SetMinimumSeverity() is called.

It is really the case that SetMinimumSeverity() can be called concurrently on a logger already created ? How ?

Could Logger::minimum_severity_ be set in the Logger constructor instead, assuming it is constant for a given logger instance.

This will remove all the atomic code entirely.

{
id_ = id;
name_ = nostd::unique_ptr<char[]>{new char[name.length() + 1]};
std::copy(name.begin(), name.end(), name_.get());
Copy link
Member

@lalitb lalitb Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes against the standard practice of not doing any memory copy at the API/SDK surface. The recordable interface design ensures that we can directly serialize the data at exporter, without creating any intermediate copies at the API/SDK level. One solution to avoid this copy operation can be to get rid of EventId class, and pass the event-id (as int64_t) and event-name (as notstd::string_view) directly to Log methods. Also we have only used primitive, or non-owningnostd types as API arguments. Even though we can debate that passing reference/pointer to a user created class mayn't break the ABI, in general good to avoid if we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes against the standard practice of not doing any memory copy at the API/SDK surface.

No memory will be allocated in our logging APIs. The EventId class is supposed to be created by the user with sort of global lifetime, it so copies the string instead of using any string_view on existing string because it then requires the user to manage the original string object explicitly. The allocation and free of such object is not in the scope of logging API as it only takes const reference on such object.

@ThomsonTan
Copy link
Contributor Author

Yes, SetMinimumSeverity() is not part of the user-facing API. It is intended to SDK or any one which implements this API to use internally.

Thanks for the clarifications.

I can see how multiple threads may use the same Logger instance, if GetLogger() returns a shared object used concurrently.

What is missing here is details about when SetMinimumSeverity() is called.

It is really the case that SetMinimumSeverity() can be called concurrently on a logger already created ? How ?

Could Logger::minimum_severity_ be set in the Logger constructor instead, assuming it is constant for a given logger instance.

This will remove all the atomic code entirely.

Logger::minimum_severity_ will only be available for SDK to config (SDK can choose how to present this config to the user. I am working on this). It is not supposed to be a constant after creation of logger object and SDK can configure it as runtime, then atomic r/w will be required for this.

@owent
Copy link
Member

owent commented Apr 20, 2023

Just wondering, do you think we can add a new class to contains these functions?The name event seems different from the event in specification .

@ThomsonTan
Copy link
Contributor Author

Just wondering, do you think we can add a new class to contains these functions? The name event seems different from the event in specification .

The OTel event API is still in experiment, and like the log bridge API, I think it is also not user-facing API. And as per the below spec issue, the log bridge API is reaching stable state which we can build our user-facing API upon.

open-telemetry/opentelemetry-specification#2911

With layering of the architecture, our user-facing API is in the upper layer, so ideally we should consider the requirement of users to write logging code, even we've already review the current bridge API spec and made sure it is sufficient to implement our API. The bridge API is in the middle layer supporting the user facing API, so even in future the event API reach stable, we can just update our SDK like to send logs with event Id to that event API, which will help user logging user to upgrade seamlessly.

Finally, I think provide 2 sets of similar API, for logs and events, to user will cause more confuse than the benefit it gives, and it hinders user to switch between logs and events if do so which adds unnecessary complexity (like if something starts as log, as evolving happens, existing it to log makes sense, but the 2 API and separate class approach will require the user to acquire a new EventLogger and call some totally different methods. But with our current approach, the only change will be adding event_id to the parameter list.

Let me know if I missed anything here.


public:
int64_t id_;
nostd::unique_ptr<char[]> name_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use std::string name_; here? The usage of this variable above seems just like std::string, and do not add a tail \0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EventId can be read from both user and SDK code, which requires it to be ABI compatible. std::string cannot guarantee such ABI compatibility requirement across the boundary between user code/SDK so cannot be used. I'd expect there is nostd::string but we don't have it, then switch to primitive char pointer wrapped in nostd::unique_ptr. Let me know if this addresses the concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explain, on the other hand, should we set name_[name.length()] = 0; here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. Set the last element of name_ as 0.

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifications.

Looks ok for now, so approving the PR.

Some items here in fact depends on the intended implementation from the SDK,
so I expect we will be able to revisit particular points later, when not affecting the usage of logger but only the implementation from the logger subclass.


protected:
// TODO: discuss with community about naming for internal methods.
virtual bool EnabledImplementation(Severity /*severity*/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the name, EnabledImplementation, discussion is deferred to the SDK patch if I understand correctly.

Using a function callback in a class member instead of a virtual method is asking for too much trouble in my opinion:

  • what if it changes, should the pointer itself be looked up using atomics, similar to the severity ?
  • Even after an atomic read, is the pointer still valid anyway (for all we know, it can be in a shared library), so is it safe to invoke ?

In short, the current proposal (virtual method) looks good to me, apart from the name to discuss later.

Comment on lines +287 to +291
virtual void Log(Severity severity,
int64_t event_id,
nostd::string_view format,
const common::KeyValueIterable &attributes) noexcept
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need each of these helpers to be virtual functions ?

A simple inline function seem sufficient.

Comment on lines +310 to +315
inline void Trace(const EventId &event_id,
nostd::string_view format,
const common::KeyValueIterable &attributes) noexcept
{
this->Log(Severity::kTrace, event_id, format, attributes);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue the discussion about Enabled() in the next patch, for the SDK part.

The fact that the caller MAY use Enabled() to optimize paths that evaluate complex arguments is good.

Still we can't force that the caller MUST use Enabled() all the time, so the implementation will need to check it again at some point.

Copy link
Member

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks.

@ThomsonTan ThomsonTan merged commit a39e8b5 into open-telemetry:main Apr 23, 2023
@ThomsonTan ThomsonTan deleted the LogAPIBenchmark_PR branch April 23, 2023 14:52
@marcalff marcalff changed the title Add user facing Logging API and Benchmarks [API] Add user facing Logging API and Benchmarks May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add user facing logging API for OpenTelemetry C++
5 participants