Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Support for external observability providers - Logging #1500

Closed
2 tasks done
erikayao93 opened this issue Jun 13, 2023 · 12 comments · Fixed by #1511
Closed
2 tasks done

RFC: Support for external observability providers - Logging #1500

erikayao93 opened this issue Jun 13, 2023 · 12 comments · Fixed by #1511
Assignees
Labels
completed This item is complete and has been merged/shipped logger This item relates to the Logger Utility RFC Technical design documents related to a feature request
Milestone

Comments

@erikayao93
Copy link
Contributor

erikayao93 commented Jun 13, 2023

Is this related to an existing feature request or issue?

Powertools for AWS Lambda (Python) #2014, #1261, #646

Which Powertools for AWS Lambda (TypeScript) utility does this relate to?

Logger

Summary

Powertools for Python has implemented support for certain external observability providers out of the box for the Logger utility by extending the LogFormatter feature according to Powertools for AWS Lambda (Python) #2014. This allows customers using designated providers to use the Logger utility with either the default or a custom LogFormatter without having to design their own configurations to allow for integration with observability solutions other than CloudWatch.

In Powertools for TypeScript, the current LogFormatter lacks customizability on certain features, as described in issue #1261. By extending the capabilities of the LogFormatter to accommodate changes on all log features, we aim to better support third-party observability providers, starting with Datadog, and expanding to encompass the same list of providers as Powertools for Python.

Use case

The primary use case for this utility will be for customers who might want to extend the default LogFormatter to fully customize the logs emitted by Logger so that they can conform with their requirements or better work with supported AWS Lambda Ready Partners and AWS Partners observability providers.

Proposal

Terminology

  • Standard attributes — keys provided to structured logging by the default log formatter
  • Context-related attributes — added to standard attributes if developer uses injectLambdaContext() or calls the addContext() method for their logger
  • Base attributes — standard attributes and context related attributes that are currently handled by the formatter
  • Persistent attributes — defined by the developer and persists through all instances of logger output
  • Extra Input attributes — provided for logger output on a singular instance

Current LogFormatter Experience

Powertools for Typescript currently provides a base abstract class LogFormatter with the abstract method formatAttributes that must be implemented to define the formatting of base attributes:

public abstract formatAttributes(
    attributes: UnformattedAttributes
): LogAttributes;

Our default implementation for the Logger uses the PowertoolLogFormatter class that builds on top of the LogFormatter abstract class, maintaining its members and methods. The class provides the following implementation for the formatAttributes method:

public formatAttributes(attributes: UnformattedAttributes): PowertoolLog {
    return {
      cold_start: attributes.lambdaContext?.coldStart,
      function_arn: attributes.lambdaContext?.invokedFunctionArn,
      function_memory_size: attributes.lambdaContext?.memoryLimitInMB,
      function_name: attributes.lambdaContext?.functionName,
      function_request_id: attributes.lambdaContext?.awsRequestId,
      level: attributes.logLevel,
      message: attributes.message,
      sampling_rate: attributes.sampleRateValue,
      service: attributes.serviceName,
      timestamp: this.formatTimestamp(attributes.timestamp),
      xray_trace_id: attributes.xRayTraceId,
    };
}

This block of code provides default formatting for standard attributes and context-related ones if present. This formatted list of attributes is of type PowertoolLog, which is an alias for a specific list of LogAttributes.

If customers wish to customize the formatting to meet their own requirements or to work with third-party observability providers, they can define a similar class to alter the formatting on these standard and context-related attributes:

class MyCompanyLogFormatter extends LogFormatter {
  public formatAttributes(attributes: UnformattedAttributes): MyCompanyLog {
    return {
      message: attributes.message,
      service: attributes.serviceName,
      environment: attributes.environment,
      awsRegion: attributes.awsRegion,
      correlationIds: {
        awsRequestId: attributes.lambdaContext?.awsRequestId,
        xRayTraceId: attributes.xRayTraceId,
      },
      lambdaFunction: {
        name: attributes.lambdaContext?.functionName,
        arn: attributes.lambdaContext?.invokedFunctionArn,
        memoryLimitInMB: attributes.lambdaContext?.memoryLimitInMB,
        version: attributes.lambdaContext?.functionVersion,
        coldStart: attributes.lambdaContext?.coldStart,
      },
      logLevel: attributes.logLevel,
      timestamp: this.formatTimestamp(attributes.timestamp),
      logger: {
        sampleRateValue: attributes.sampleRateValue,
      },
    };
  }
}

export { MyCompanyLogFormatter };

Important to note here is the lack of functionality that is provided for altering persistent attributes or extra attributes. Instead, for persistent attributes added to the logger, it maintains the formatting that the user provided the attributes in. These are aggregated as another set of LogAttributes.

Both sets of LogAttributes (one for the standard attributes and context-related ones and one for the persistent attributes) are combined in the constructor for a LogItem, which has a private member attributes that is a list of LogAttributes.

public constructor(params: {
    baseAttributes: LogAttributes;
    persistentAttributes: LogAttributes;
  }) {
    this.addAttributes(params.baseAttributes);
    this.addAttributes(params.persistentAttributes);
}

The benefit of the LogItem is that it also possesses methods for adding, getting, deleting, and setting attributes. However, the addAttributes method simply merges a new set of logAttributes into attributes of logItem. This means that customers do not have the ability to reorganize their persistent attributes with the formatted list of base attributes, nor do they have the ability to reformat the persistent attributes, and the list is simply appended to the end of the attributes list.

For extra attributes, these attributes are converted into a set of logAttributes and added to the attributes list with the following code:

extraInput.forEach((item: Error | LogAttributes | string) => {
  const attributes: LogAttributes =
     item instanceof Error
       ? { error: item }
       : typeof item === 'string'
       ? { extra: item }
       : item;

  logItem.addAttributes(attributes);
});

Similar to the persistent attributes, the extraInput formatting here does not provide customers with the ability to reformat or reorganize the attributes with their existing list of attributes for the log. The new attributes are merely appended to the end of the attributes list, after the base attributes and persistent attributes.

Proposed PowertoolsLogFormatter Update

Firstly, to maintain naming consistency, PowertoolLogFormatter and PowertoolLog will be renamed to PowertoolsLogFormatter and PowertoolsLog. If this inconsistency is discovered in other parts of the Logger utility through the implementation of this proposal, the name will also be updated in those instances.

Instead of having persistent attributes and extra attributes merged into a LogItem after the base attributes have been formatted, we propose merging the persistent attributes and extra attributes first, into a set of logAttributes that can be passed into formatAttributes instead:

var additionalLogAttributes: LogAttributes = this.getPersistentLogAttributes();
if (typeof input !== 'string') {
    additionalLogAttributes = merge(additionalLogAttributes, input);
}
extraInput.forEach((item: Error | LogAttributes | string) => {
    const attributes: LogAttributes =
    item instanceof Error
        ? { error: item }
        : typeof item === 'string'
        ? { extra: item }
        : item;

    additionalLogAttributes = merge(additionalLogAttributes, attributes);
});

This functionality will be maintained in the Logger file, where the LogItem used to be created.

Then, to provide customers with access to formatting beyond the base attributes, we propose extending the definition of the base LogFormatter class and its formatAttributes method to access the original unformatted baseAttributes and the new additionalLogAttributes set:

public abstract formatAttributes(
    attributes: UnformattedAttributes,
    additionalLogAttributes: LogAttributes
  ): LogItem;

In order to maintain a mask on the merging functionality provided in the LogItem class, we will utilize addAttributes, as defined in LogItem, and formatAttributes will be redefined to generate a LogItem object rather than LogAttributes.

Within the PowertoolsLogFormatter, the formatAttributes function will be redefined in accordance to the LogFormatter update to take in all attributes, not just the base attributes:

public formatAttributes(attributes: UnformattedAttributes, 
                          additionalLogAttributes: LogAttributes): LogItem {

    const baseAttributes: PowertoolsLog = {
      cold_start: attributes.lambdaContext?.coldStart,
      function_arn: attributes.lambdaContext?.invokedFunctionArn,
      function_memory_size: attributes.lambdaContext?.memoryLimitInMB,
      function_name: attributes.lambdaContext?.functionName,
      function_request_id: attributes.lambdaContext?.awsRequestId,
      level: attributes.logLevel,
      message: attributes.message,
      sampling_rate: attributes.sampleRateValue,
      service: attributes.serviceName,
      timestamp: this.formatTimestamp(attributes.timestamp),
      xray_trace_id: attributes.xRayTraceId,
    };

    const powertoolsLogItem = new LogItem({attributes: baseAttributes});

    powertoolsLogItem.addAttributes(additionalLogAttributes);

    return powertoolsLogItem;
}

We maintain the attributes of PowertoolsLog but maintain the customizability of adding fields the same way current functionality does.

For a LogItem object to be created from this, we will have to redefine the constructor of LogItem to construct off a single attribute list. This will allow LogItem to be more flexible in terms of its initial definition, while maintaining that other attributes can always be added or deleted later using its member functions.

public constructor(params: {
    attributes: LogAttributes;
}) {
    this.addAttributes(params.attributes);
}

Custom LogFormatter Usage

If the customer would like to use another observability provider, or define their own logger functions, customers can define their own Formatter class that the customer can implement based on our existing framework and pass in to the Logger class. The class would implement the formatAttributes method similar to:

public formatAttributes(attributes: UnformattedAttributes, 
                          additionalLogAttributes: LogAttributes): LogItem {
    
    const attributeList: CustomLog = {
        cold_start: attributes.lambdaContext?.coldStart,
        function_arn: attributes.lambdaContext?.invokedFunctionArn,
        function_memory_size: attributes.lambdaContext?.memoryLimitInMB,
        function_name: attributes.lambdaContext?.functionName,
        function_request_id: attributes.lambdaContext?.awsRequestId,
        level: attributes.logLevel,
        message: attributes.message,
        sampling_rate: attributes.sampleRateValue,
        service: attributes.serviceName,
        timestamp: this.formatTimestamp(attributes.timestamp),
        xray_trace_id: attributes.xRayTraceId,
        custom_persistent_field = additionalLogAttributes.customPersistentField,
        custom_error_field = additionalLogAttributes.myError,
        custom_extra_field = additionalLogAttributes.customExtraField
    }
    
    const logItem = new LogItem({attributes: attributeList});

    return logItem;
}

If customers wish to, they can define their own CustomLog type to use as shown above that will pre-establish the fields of attributeList, similar to the PowertoolsLog type that the PowertoolsLogFormatter uses for the base attributes.

Out of scope

Sending logs from Powertools to the customer's desired observability platform will not be in the scope of this project. The implementation should only support modifying the output of the Logger so that the customer can push them to their platform of choice.

Potential challenges

These proposed changes will likely result in a breaking change, which will require a major release and a migration plan/guide.

By reorganizing the PowertoolsLogFormatter, some functionality that is currently contained in LogItem may be removed or migrated to the Formatters. We will have to identify other dependencies of the LogItem class to ensure that other functionality is not lost.

We need to determine which platforms we want to support out-of-the-box.

Dependencies and Integrations

We will have to integrate with (and thus, have a dependency on) the platforms we decide to support out-of-the-box. For now, we will be focusing on Datadog, but this should be expanded to encompass the same providers supported by Powertools for Python, listed here.

Alternative solutions

To optimize the merging of LogAttributes sets, we can consider migrating the removeEmptyKeys() method from LogItem to be used on small sets of attributes before they are merged together.

We have considered several different ways of handling passing the persistent attributes and extra attributes to the formatter:

  • Combine persistent attributes and extra attributes into two separate sets of LogAttributes to pass into formatAttributes
    • As discussed by @dreamorosi in this comment, both types of attributes are treated equally anyways, and there isn't much reason to treat them separately.
  • Pass persistent attributes and extra attributes in their raw form, without converting extra attributes into LogAttributes, so that customers can merge them into their log however they wish
    • The merging functionality ideally should remain masked from customers and remain in the Logger as much as possible, without encroaching on the more straightforward functionality for the formatter

Acknowledgment

Future readers

Please react with 👍 and your use case to help us understand customer demand.

@erikayao93 erikayao93 added triage This item has not been triaged by a maintainer, please wait RFC Technical design documents related to a feature request labels Jun 13, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 13, 2023

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #typescript channel on our Powertools for AWS Lambda Discord: Invite link

@erikayao93
Copy link
Contributor Author

In response to @dreamorosi comment internally:

I think it'd benefit us to agree on a shared definition of what formatting mean.

My understanding would go towards changing the arrangement / shape of the fields and sorting order of them.

With this in mind, I'm not sure we should leak the internal merging of attributes (i.e. extraInput) below but keep it in the Logger.

Basically I think this method should receive a set of objects already merged and return an object with the desired shape that arranges these key/vals.

So the main issue is that formatAttributes right now takes a type UnformattedAttributes and turns it into LogAttributes. LogAttributes then becomes a LogItem and extra attributes are only merged into LogItem and this happens one by one at the moment. So the base attributes and extra attributes aren't really compatible until they all go into LogItem and are merged there.

Maybe can consider merging the persistent attributes and extra attributes in the Logger, and then passing UnformattedAttributes and one single LogAttributes set to formatAttributes?

@erikayao93
Copy link
Contributor Author

erikayao93 commented Jun 15, 2023

So I think it can be beneficial to do a combination of the proposed solution I've suggested and the alternative solution provided, along with an additional change.

To keep the processing of the extraInput hidden behind the scenes, we keep that functionality in Logger, but instead of merging into a logItem we keep them in a logAttribute set.

var extraAttributes: LogAttributes = {};
if (typeof input !== 'string') {
  extraAttributes = merge(extraAttributes, input);
}
extraInput.forEach((item: Error | LogAttributes | string) => {
  const attributes: LogAttributes =
    item instanceof Error
      ? { error: item }
      : typeof item === 'string'
      ? { extra: item }
      : item;

      extraAttributes = merge(extraAttributes, attributes);
});

Then, we'll have formatAttributes take in three parameters, as originally proposed in issue #1261, but instead of the extraInput being in the form of LogItemExtraInput, it has already been turned into another set of logAttributes:

public formatAttributes(attributes: UnformattedAttributes, 
                  persistentLogAttributes?: LogAttributes, 
                  extraAttributes?: LogAttributes): PowertoolLogItem;

If we want to continue keeping the merge functionality masked from the developer perspective, I think having a child class of PowertoolLogItem as I suggested in the alternative solution can be useful, since it already provides the function addAttribute which masks the merge.

Regardless, I think rewriting the constructor for a LogItem will be beneficial, regardless of which formatting solution we choose to move forward with, which is why it's currently described in the both the proposed solution and the alternative solution. The current constructor feels very rigid with how we create our logAttribute set, especially with the required parameters, and I don't think it will change customer experience very much to change the constructor since it's only really used on the back-end.

One question I do have is how flexible the PowertoolLog type is? Does it allow for additional attributes that aren't explicitly defined in the type? This might impact the way PowertoolLogItem happens, since if the PowertoolLog doesn't accept additional attributes, we probably should just use a regular LogAttribute set, in which case we could just use the LogItem class and the child class isn't needed at all, and we could end up with something like this for formatAttributes:

public formatAttributes(attributes: UnformattedAttributes, 
                        persistentLogAttributes: LogAttributes, 
                        additionalLogAttributes: LogAttributes): LogItem {

  const baseAttributes: PowertoolLog = {
    // standard attributes from UnformattedAttributes
    cold_start: attributes.lambdaContext?.coldStart,
    function_arn: attributes.lambdaContext?.invokedFunctionArn,
    function_memory_size: attributes.lambdaContext?.memoryLimitInMB,
    function_name: attributes.lambdaContext?.functionName,
    function_request_id: attributes.lambdaContext?.awsRequestId,
    level: attributes.logLevel,
    message: attributes.message,
    sampling_rate: attributes.sampleRateValue,
    service: attributes.serviceName,
    timestamp: this.formatTimestamp(attributes.timestamp),
    xray_trace_id: attributes.xRayTraceId,
  };

  const powertoolLogItem = new LogItem({attributes: baseAttributes});

  powertoolLogItem.addAttributes(persistentLogAttributes);
  powertoolLogItem.addAttributes(additionalLogAttributes);

  return powertoolLogItem;
}

Let me know what you think of this solution!

@dreamorosi dreamorosi added logger This item relates to the Logger Utility discussing The issue needs to be discussed, elaborated, or refined and removed triage This item has not been triaged by a maintainer, please wait labels Jun 16, 2023
@dreamorosi
Copy link
Contributor

Hi Erika, thank you for the thoughtful discussion and also for taking in account my first round of internal comments.

I like the direction in which the proposal is going and I think that the mixed proposal in your last comment is good.

I also like both the ideas of expanding the LogItem constructor as well as defining a PowertoolLog type; and to answer your question about it:

One question I do have is how flexible the PowertoolLog type is? Does it allow for additional attributes that aren't explicitly defined in the type?

In theory, unless we decide to add new standard attributes, all the PowertoolLog are known in advanced and are the ones you have already included above. Any additional attribute (user provided) will either go into persistentLogAttributes or additionalLogAttributes. With this in mind I don't see a problem in defining these attributes in their own explicit type; if we ever decide to add/remove/modify any of the attributes or their types we'll make sure to also update the type.


A of notes/question about the hybrid proposal, specifically around the last few lines:

const powertoolLogItem = new LogItem({attributes: baseAttributes});

powertoolLogItem.addAttributes(persistentLogAttributes);
powertoolLogItem.addAttributes(additionalLogAttributes);

I know that the proposal comes from the current implementation as well as the previous discussions, however now I'm starting to question what's the added value here in having the persistent & additional log attributes separate in the first place.

Both types of attributes are user-provided attributes and the main/only differences about them are how long they are kept around and how they were added, which are concerns that involve the Logger class/utility but not necessarily the specific log item.

Basically what I'm getting at is that at the time of formatting, which is right before emitting the log entry, these are just attributes and the log item doesn't care/have to know whether some of them will show up only in this entry / function invocation or in all entries.

For the PowertoolLogFormatter, both types of attributes are treated equally and lumped at the top-level of the log JSON object. For customer-defined log formats, based on my assumption & what I've seen in other discussions with customers, I think customers will want to organize/format the log object based on the key name rather than the type.

Relying on whether an attribute is "persistent" or "additional" to define the structure would create a strong coupling between how the logs are generated in the code & how they are formatted, and any inconsistency would result in two different logs.

For instance, let's take these two cases:

In the first case I define accountId as persistent attribute, which in the current definition would make it end up in persistentLogAttributes:

const logger = new Logger({
  persistentLogAttributes: {
    accountId: '1234'
  },
});
logger.info('hi');

and then this second case, in which I pass accountId as extra attribute, which would make it appear in additionalLogAttributes:

const logger = new Logger();
logger.info('hi', { accountId: '1234' });

If I was defining a log formatter based on the types (i.e. I want all the persistent attributes in a certain place & the additional attributes in another), these two ways of instantiating the logger & emitting a log would result in two different JSON logs.

In reality, in most cases you would want to define a structure based on the key & values, for instance: "as a customer I want the account id and function ARN, memory, etc. inside a key called functionInfo", aka:

  const baseAttributes: PowertoolLog = {
    // standard attributes from UnformattedAttributes
    cold_start: attributes.lambdaContext?.coldStart,
    functionInfo: {
      function_arn: attributes.lambdaContext?.invokedFunctionArn,
      function_memory_size: attributes.lambdaContext?.memoryLimitInMB,
      function_name: attributes.lambdaContext?.functionName,
      // NOTE HERE
      accountId: extraAttributes?.accountId,
    },
    function_request_id: attributes.lambdaContext?.awsRequestId,
    level: attributes.logLevel,
    message: attributes.message,
    sampling_rate: attributes.sampleRateValue,
    service: attributes.serviceName,
    timestamp: this.formatTimestamp(attributes.timestamp),
    xray_trace_id: attributes.xRayTraceId,
  };

If the two types of attributes are traded the same then I can do this easily. If instead we keep them separately in order to achieve something as trivial as the above would require me to handle conditionals (accountId: persistentLogAttributes?.accountId || additionalLogAttributes?.accountId,).

To sum up: I think the Logger could continue to treat the two types as separate because they have different shelf-life, however once we get to the formatter they could potentially be joined together.

What do you think? Please let me know if I misunderstood your proposal and/or I'm going off base.

Also looping @am29d so he can start getting up to speed with the RFC & comment.


Also, and finally, minor: since we are reworking this area, can we rename the PowertoolLogFormatter to PowertoolsLogFormatter (note extra s) so that it's consistent with the brand?

@erikayao93
Copy link
Contributor Author

The PowertoolLog actually exists already -- it is what the UnformattedAttributes that were originally passed into formatAttributes get converted to, so I'll just leave that as is.

I'm on board with letting the formatter handle persistent and additional attributes together -- they'd be merged behind the scenes in Logger before getting passed into the formatter, so we'd still be able to keep that behavior masked. I can go ahead and make some updates to the original proposal about the parameters -- I'll add the old ideas to the alternative solutions (and clean up that section), just so we can keep track of the approaches we've considered.


Totally on board with renaming the PowertoolLogFormatter -- this was actually something I was thinking of asking about! The PowertoolLog type is also named inconsistently right now, I'll go ahead and change that with the formatter renaming as well.


Let me know if there's any more questions about the design, and @am29d, I look forward to working with you as well!

@dreamorosi
Copy link
Contributor

Great to hear @erikayao93!

Another thing I just realized is that currently when instantiating LogItem or adding attributes we save them in the class as they are added.

Then, every time we format the log item (aka emit a log) we remove the values that evaluate to null or undefined.

Do you think this is something that we could move in the constructor, or even already in the Logger? I think doing this might speed up the attribute merging since there'd be potentially less keys to handle. And even if the merging operation stays similar in terms of perf, we there's no much value in keeping these empty keys around. What do you think?

@erikayao93
Copy link
Contributor Author

From what I can see about the behavior you just described, it's done through a member method in LogItem that scans the attribute list after all the attributes have been added.

I could add the same method to the Logger class? Then I can scan the persistent attributes and additional attributes individually before merging them, and this could speed up the attribute merging as you suggested?

It's a little more difficult for cleaning up the base attributes, since they aren't a set of LogAttributes until we get in the formatter, but I think what I could do is incorporate that method into the LogItem constructor, as you also suggested.

If we go forward with this design, do we still want to have that final check before we emit a log?

@dreamorosi
Copy link
Contributor

Not sure, at least for the Powertools stack it'd be superfluous since we control everything.

For custom formatters there's still a chance customers might format in a way that would result in attributes being undefined.

So maybe I'm over complicating things and the initial design is fine. I propose that we keep this on the radar but make a decision once we start the actual implementation, I don't think it's a must and don't want to derail the conversation further (thanks for entertaining the thought tho!)

@erikayao93
Copy link
Contributor Author

Sounds good, I've updated the RFC to better match the design decisions we've talked about, let me know if I missed anything!

I also cleaned up the alternative solutions and left a note about handling the empty keys there, just so we have a bit of documentation on that.

@erikayao93
Copy link
Contributor Author

Just realized I forgot to address the naming inconsistencies you brought up -- added a little section in the proposal for that now and updated the name everywhere I could find it!

@heitorlessa
Copy link
Contributor

made tiny edits to enable syntax highlighting to ease reading

```typescript
<code>

@dreamorosi dreamorosi added this to the Version 2.0 milestone Jun 22, 2023
@dreamorosi dreamorosi added confirmed The scope is clear, ready for implementation completed This item is complete and has been merged/shipped and removed discussing The issue needs to be discussed, elaborated, or refined confirmed The scope is clear, ready for implementation labels Aug 2, 2023
@github-actions
Copy link
Contributor

⚠️ COMMENT VISIBILITY WARNING ⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
completed This item is complete and has been merged/shipped logger This item relates to the Logger Utility RFC Technical design documents related to a feature request
Projects
3 participants