Add semantic conventions for instrumenting AWS Lambda. #1442

Merged
merged 25 commits into from
Mar 24, 2021
Changes from 2 commits
61 changes: 61 additions & 0 deletions semantic_conventions/trace/instrumentation/aws-lambda.yaml
@@ -0,0 +1,61 @@
groups:
  - id: aws.lambda
    brief: "Lambda function invocation spans."
    attributes:
      - ref: faas.execution
        brief: "The value of the AWS Request ID from the Lambda `Context`."
        examples:
          - 943ad105-7543-11e6-a9ac-65e093327849
      - ref: faas.id
Contributor

If the attributes here are not Lambda specific, can we extend the https://github.com/open-telemetry/opentelemetry-specification/blob/main/semantic_conventions/trace/faas.yaml instead? If not, maybe we should namespace them differently. For example, "aws.lambda.id", "aws.lambda.execution", etc.

Member

@Oberon00 Oberon00 Feb 16, 2021

I think this was the intention of @anuraaga. FaaS conventions IMHO sufficiently describe the common lambda attributes (e.g. it tells you that the ARN is the faas.id in the faas' note already). But this document describes in more detail how to apply existing semantic conventions to Lambda. So it is IMHO right to refer to (note that ref is used here instead of introducing a new attribute with id) the existing, fitting attributes.

        brief: "The value of the invocation arn for the function from the Lambda `Context`."
        examples:
          - arn:aws:lambda:us-east-2:123456789012:function:my-function:1
      - ref: cloud.account.id
        brief: "The account ID for the function. If not provided on Lambda `Context`, it SHOULD be parsed from the value of `faas.id` as the fifth item when splitting on `:`."
  - id: aws.lambda.api-gateway-proxy
    brief: >
      These conventions apply to AWS Lambda requests that are served by the API Gateway in proxy mode. All the
      information about the underlying HTTP request is available.
    attributes:
      - ref: faas.trigger
        brief: "MUST be `http`."
        examples:
          - http
      - ref: http.method
      - ref: http.status_code
      - ref: http.url
      - ref: http.user_agent
  - id: aws.lambda.sqs-event
    brief: "These conventions apply to an SQS event."
    attributes:
      - ref: faas.trigger
        brief: "MUST be `pubsub`."
        examples:
          - pubsub
      - ref: messaging.system
        brief: "MUST be `AmazonSQS`."
        examples:
          - AmazonSQS
      - ref: messaging.operation
        brief: "MUST be `process`."
        examples:
          - process
  - id: aws.lambda.sqs-message
    brief: "These conventions apply to an SQS message."
    attributes:
      - ref: faas.trigger
        brief: "MUST be `pubsub`."
        examples:
          - pubsub
      - ref: messaging.system
        brief: "MUST be `AmazonSQS`."
Member

This pattern seems commonly useful enough that special support in the semantic convention generator would be nice as in fixed_value: "AmazonSQS".

        examples:
          - AmazonSQS
      - ref: messaging.operation
        brief: "MUST be `process`."
        examples:
          - process
      - ref: messaging.message_id
        brief: "The value of the message ID for the message."
      - ref: messaging.destination
        brief: "The value of the event source for the message."
@@ -0,0 +1,104 @@
# Instrumenting AWS Lambda
Contributor

This document doesn't mention its relationship to the FaaS spec, https://github.com/open-telemetry/opentelemetry-specification/blob/main/semantic_conventions/trace/faas.yaml. Is it extending it or overriding it?

Contributor Author

Doesn't the first paragraph clarify this, including an actual link to faas? Let me know if something's unclear.

**Status**: [Experimental](../../../document-status.md)

This document defines how to apply semantic conventions when instrumenting an AWS Lambda request handler. AWS
Lambda largely follows the conventions for [FaaS](../faas.md), while [HTTP](../http.md) conventions also apply
when handlers serve HTTP requests.

There are a variety of triggers for Lambda functions, and this document will grow over time to cover all the
use cases.

## All triggers

For all events, a span with kind `SERVER` MUST be created corresponding to the function invocation. Unless
stated otherwise below, the name of the span MUST be set to the function name from the Lambda `Context`.

<!-- semconv aws.lambda -->
| Attribute | Type | Description | Examples | Required |
|---|---|---|---|---|
| [`cloud.account.id`](../../../resource/semantic_conventions/cloud.md) | string | The account ID for the function. If not provided on Lambda `Context`, it SHOULD be parsed from the value of `faas.id` as the fifth item when splitting on `:`. | `opentelemetry` | No |
| [`faas.execution`](../faas.md) | string | The value of the AWS Request ID from the Lambda `Context`. | `943ad105-7543-11e6-a9ac-65e093327849` | No |
| [`faas.id`](../../../resource/semantic_conventions/faas.md) | string | The value of the invocation arn for the function from the Lambda `Context`. [1] | `arn:aws:lambda:us-east-2:123456789012:function:my-function:1` | No |

**[1]:** For example, in AWS Lambda this field corresponds to the [ARN](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) value, in GCP to the URI of the resource, and in Azure to the [FunctionDirectory](https://github.com/Azure/azure-functions-host/wiki/Retrieving-information-about-the-currently-running-function) field.
<!-- endsemconv -->
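
The fallback for `cloud.account.id` described in the table can be sketched as below. This is a minimal illustration rather than part of the convention; the helper name `account_id_from_arn` is hypothetical, and the sample ARN is the `faas.id` example from the table.

```python
from typing import Optional


def account_id_from_arn(arn: str) -> Optional[str]:
    """Return the account ID embedded in an ARN, or None if it is absent.

    Hypothetical helper: the name is not defined by the convention.
    """
    parts = arn.split(":")
    # ARN layout: arn:partition:service:region:account-id:resource...
    # The account ID is the fifth item when splitting on ":" (index 4).
    if len(parts) < 5 or not parts[4]:
        return None
    return parts[4]


account_id = account_id_from_arn(
    "arn:aws:lambda:us-east-2:123456789012:function:my-function:1"
)
print(account_id)
```

An instrumentation would only need this when the Lambda `Context` does not expose the account ID directly.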

The parent of the span MUST be determined by considering both the environment and any headers or attributes
available from the event.

If the `_X_AMZN_TRACE_ID` environment variable is set, it SHOULD be parsed into an OpenTelemetry `Context` using
Member

It seems to me that the application owner should be able to decide this. Ideally, the X-Ray propagator (or a specialized LambdaXrayPropagator) would be written in such a way that it checks the environment variable itself. The instrumentation should not decide this.

Contributor Author

@anuraaga anuraaga Feb 16, 2021

The user can currently decide by enabling or disabling X-Ray, so I think this reflects that choice well. If we had another option in the instrumentation, there would be two settings related to X-Ray, which I think is just an extra setting.

It reminds me that in a follow-up to the SDK PR I need to describe using the AWS propagator directly on the HTTP calls as we do in Java. It's the only recognized format for the coming years, so there is no point in adding an option, at least until that changes; it would just be an extra option. I think it's similar for Lambda.

the [AWS X-Ray Propagator](../../../context/api-propagators.md). If the resulting `Context` is sampled, then this
`Context` is the parent of the function span. The environment variable will be set and the `Context` will be
sampled only if AWS X-Ray has been enabled for the Lambda function. A user can disable AWS X-Ray for the function
if this propagation is not desired.

Otherwise, for an API Gateway Proxy Request, the user's configured propagators SHOULD be applied to the HTTP
headers of the request to extract a `Context`.
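
The parent selection described above can be sketched as follows. This is an illustration under stated assumptions, not the propagator's actual implementation: it parses the `Root=...;Parent=...;Sampled=...` tracing header format by hand instead of calling the AWS X-Ray Propagator, and `ParentContext`, `parse_xray_header`, `choose_parent`, and the `extract_from_headers` callback (standing in for the user's configured propagators) are all hypothetical names.

```python
import os
from typing import Callable, Mapping, NamedTuple, Optional


class ParentContext(NamedTuple):
    trace_id: str  # 32 hex characters
    span_id: str   # 16 hex characters
    sampled: bool


def parse_xray_header(value: str) -> Optional[ParentContext]:
    """Parse 'Root=1-<8 hex>-<24 hex>;Parent=<16 hex>;Sampled=<0|1>'."""
    fields = {}
    for item in value.split(";"):
        key, sep, val = item.strip().partition("=")
        if sep:
            fields[key] = val
    root = fields.get("Root", "").split("-")
    parent = fields.get("Parent", "")
    if len(root) != 3 or root[0] != "1" or len(parent) != 16:
        return None
    # The two hex parts of Root together form the 32-character trace ID.
    trace_id = root[1] + root[2]
    if len(trace_id) != 32:
        return None
    return ParentContext(trace_id, parent, fields.get("Sampled") == "1")


def choose_parent(environ: Mapping[str, str],
                  extract_from_headers: Callable[[], object]) -> object:
    """Use the environment's context if it is sampled, else the event headers."""
    header = environ.get("_X_AMZN_TRACE_ID")
    if header:
        ctx = parse_xray_header(header)
        if ctx is not None and ctx.sampled:
            return ctx
    # Hypothetical callback: applies the configured propagators to the
    # HTTP headers of an API Gateway Proxy Request.
    return extract_from_headers()


parent = choose_parent(os.environ, lambda: None)
```

Passing the fallback as a callback keeps the environment check first, so the event headers are only consulted when the environment yields no sampled context.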

## API Gateway

API Gateway allows a user to trigger a Lambda function in response to HTTP requests. It can be configured as
a pure proxy, where the information about the original HTTP request is passed to the Lambda function, or as a
configuration for a REST API, in which case only a deserialized body payload is available. When API Gateway is
configured to proxy to the Lambda function, the instrumented request handler has access to all the information
about the HTTP request in the form of an API Gateway Proxy Request Event.

The Lambda span name SHOULD be set to the `Resource` from the proxy request event instead of the function name.
The `Resource` corresponds to the user-configured HTTP route.

<!-- semconv aws.lambda.api-gateway-proxy -->
| Attribute | Type | Description | Examples | Required |
|---|---|---|---|---|
| [`faas.trigger`](../faas.md) | string | MUST be `http`. | `http` | No |
| [`http.method`](../http.md) | string | HTTP request method. | `GET`; `POST`; `HEAD` | No |
| [`http.status_code`](../http.md) | number | [HTTP response status code](https://tools.ietf.org/html/rfc7231#section-6). | `200` | No |
| [`http.url`](../http.md) | string | Full HTTP request URL in the form `scheme://host[:port]/path?query[#fragment]`. Usually the fragment is not transmitted over HTTP, but if it is known, it should be included nevertheless. | `https://www.foo.bar/search?q=OpenTelemetry#SemConv` | No |
| [`http.user_agent`](../http.md) | string | Value of the [HTTP User-Agent](https://tools.ietf.org/html/rfc7231#section-5.5.3) header sent by the client. | `CERN-LineMode/2.15 libwww/2.17b3` | No |
<!-- endsemconv -->
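
As a rough sketch of how the span name and some of these attributes could be derived from a proxy request event: the code below assumes the event is the JSON dictionary a Python handler receives, with the `resource`, `httpMethod`, and `headers` keys of the API Gateway proxy integration format. The function name `api_gateway_span_info` is hypothetical. `http.url` and `http.status_code` are left out because they need URL reconstruction and the handler's response, respectively.

```python
from typing import Any, Dict, Mapping, Tuple


def api_gateway_span_info(event: Mapping[str, Any],
                          function_name: str) -> Tuple[str, Dict[str, Any]]:
    """Return (span name, attributes) for an API Gateway proxy request event."""
    # Header names arrive with the client's casing, so look them up
    # case-insensitively.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}

    attributes: Dict[str, Any] = {"faas.trigger": "http"}
    if event.get("httpMethod"):
        attributes["http.method"] = event["httpMethod"]
    if "user-agent" in headers:
        attributes["http.user_agent"] = headers["user-agent"]

    # Prefer the configured route; fall back to the function name.
    span_name = event.get("resource") or function_name
    return span_name, attributes


name, attrs = api_gateway_span_info(
    {"resource": "/pets/{id}", "httpMethod": "GET",
     "headers": {"User-Agent": "curl/7.64.1"}},
    "my-function",
)
print(name, attrs)
```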

## SQS

SQS is a message queue that triggers a Lambda function with a batch of messages. In addition to the span for the
function invocation, two kinds of spans SHOULD be generated: one span for the batch of messages, called an SQS
event, and one span for each individual message, called an SQS message.

The span kind for both spans MUST be `CONSUMER`.

For the SQS event span, if all the messages in the event have the same event source, the name of the span MUST
be `<event source> process`. If there are multiple sources in the batch, the name MUST be
`multiple_sources process`. The parent MUST be the `SERVER` span corresponding to the function invocation.

For every message in the event, the message's system attributes (not message attributes, which are provided by
the user) SHOULD be checked for the key `AWSTraceHeader`. If it is present, an OpenTelemetry `Context` SHOULD be
parsed from the value of the attribute using the [AWS X-Ray Propagator](../../../context/api-propagators.md) and
added as a link to the span. This means the span may have as many links as messages in the batch.

<!-- semconv aws.lambda.sqs-event -->
| Attribute | Type | Description | Examples | Required |
|---|---|---|---|---|
| [`faas.trigger`](../faas.md) | string | MUST be `pubsub`. | `pubsub` | No |
| [`messaging.operation`](../messaging.md) | string | MUST be `process`. | `process` | No |
| [`messaging.system`](../messaging.md) | string | MUST be `AmazonSQS`. | `AmazonSQS` | No |
<!-- endsemconv -->
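
The naming and linking rules for the SQS event span can be sketched as below. The sketch assumes each record is a dictionary from the `Records` list of a Lambda SQS event, that "event source" maps to the record's `eventSource` field, and that the multi-source name is the literal `multiple_sources process`. The function names are hypothetical, and the header values are returned unparsed rather than turned into span links.

```python
from typing import Any, List, Mapping


def sqs_event_span_name(records: List[Mapping[str, Any]]) -> str:
    """Name the batch span after the shared event source, if there is one."""
    sources = {record["eventSource"] for record in records}
    if len(sources) == 1:
        return f"{sources.pop()} process"
    return "multiple_sources process"


def sqs_trace_headers(records: List[Mapping[str, Any]]) -> List[str]:
    """Collect AWSTraceHeader values from the system attributes of each record."""
    found = []
    for record in records:
        # "attributes" holds system attributes; "messageAttributes" holds the
        # user-provided ones, which are deliberately not consulted.
        header = (record.get("attributes") or {}).get("AWSTraceHeader")
        if header:
            found.append(header)
    return found


records = [
    {"eventSource": "aws:sqs", "attributes": {"AWSTraceHeader": "Root=1-5759e988-bd862e3fe1be46a994272793"}},
    {"eventSource": "aws:sqs", "attributes": {}},
]
print(sqs_event_span_name(records), len(sqs_trace_headers(records)))
```

Each collected header would then be parsed into a `Context` and added to the event span as a link, so the number of links is bounded by the batch size.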

For the SQS message span, the name MUST be `<event source> process`. The parent MUST be the `CONSUMER` span
corresponding to the SQS event. The message's system attributes (not message attributes, which are provided by
the user) SHOULD be checked for the key `AWSTraceHeader`. If it is present, an OpenTelemetry `Context` SHOULD be
parsed from the value of the attribute using the [AWS X-Ray Propagator](../../../context/api-propagators.md) and
added as a link to the span.

<!-- semconv aws.lambda.sqs-message -->
| Attribute | Type | Description | Examples | Required |
|---|---|---|---|---|
| [`faas.trigger`](../faas.md) | string | MUST be `pubsub`. | `pubsub` | No |
| [`messaging.destination`](../messaging.md) | string | The value of the event source for the message. | `MyQueue`; `MyTopic` | No |
| [`messaging.message_id`](../messaging.md) | string | The value of the message ID for the message. | `452a7c7c7c7048c2f887f61572b18fc2` | No |
| [`messaging.operation`](../messaging.md) | string | MUST be `process`. | `process` | No |
| [`messaging.system`](../messaging.md) | string | MUST be `AmazonSQS`. | `AmazonSQS` | No |
<!-- endsemconv -->
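
For a single message, the attributes in the table could be filled in roughly as follows. This is a sketch with the same assumptions as before: the record is one entry of the `Records` list of a Lambda SQS event, "event source" is read from its `eventSource` field, and `sqs_message_span_info` is a hypothetical name. The raw `AWSTraceHeader` value is returned so that the caller can parse it and attach the link.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def sqs_message_span_info(
    record: Mapping[str, Any],
) -> Tuple[str, Dict[str, str], Optional[str]]:
    """Return (span name, attributes, raw AWSTraceHeader or None) for one message."""
    source = record["eventSource"]
    attributes = {
        "faas.trigger": "pubsub",
        "messaging.system": "AmazonSQS",
        "messaging.operation": "process",
        "messaging.message_id": record["messageId"],
        "messaging.destination": source,
    }
    # Only system attributes are checked, never user-provided message attributes.
    trace_header = (record.get("attributes") or {}).get("AWSTraceHeader")
    return f"{source} process", attributes, trace_header


name, attrs, header = sqs_message_span_info(
    {"eventSource": "aws:sqs", "messageId": "452a7c7c7c7048c2f887f61572b18fc2",
     "attributes": {}}
)
print(name, attrs["messaging.message_id"], header)
```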

Note that `AWSTraceHeader` is the only supported mechanism for propagating `Context` for SQS to prevent conflicts
Member

Supported by whom?

Member

@Oberon00 Oberon00 Mar 17, 2021

I think it should be possible to use any OpenTelemetry-compatible vendor with SQS without paying for X-Ray. I'd like it more if we say something to the effect that "Instrumentations SHOULD default to using AWSTraceHeader with AWS X-Ray Propagator, but SHOULD be configurable to use any other header and propagator". If that makes sense at all. I.e., I don't think we should indirectly "force" users to pay for X-Ray in our semantic conventions, even if AWS has implemented special features to make it nicer to use than other solutions.

Member

Also related: #1442 (comment)

Contributor Author

I meant supported by our instrumentation. AWSTraceHeader doesn't mention the word X-Ray :) It's how to propagate context within AWS services and by having AWS SDK instrumentation and Lambda instrumentation explicitly use the X-Amzn-Trace-Id format (which we generally incorrectly call x-ray, it's actually not really tied to x-ray), it precisely allows any vendor to use AWS asynchronous services with proper propagation. No payment for X-Ray or anything involved. While it might be possible to implement our own propagation through message attributes, I'm not confident it would be propagated between services (e.g., S3 -> SNS -> SQS is a possible flow) while I am confident that AWSTraceHeader will be, regardless of tracing vendor. @kubawach has done incredible testing on these propagation cases, I'm guessing for use with Splunk, not X-Ray :P

I am going to try adding some text with some of this information to allay the concerns, which are fair but I wouldn't be worried :)

I support @anuraaga here.

With the first approach to AWS propagation (for SQS producer - consumer) I used "standard" HTTP-headers mechanism, creating a SQS message attribute (upon produce) and extracting parent from it (upon consume). While this approach worked in SQS - SQS scenario, it would fail with more complex (but generally used around the world) scenarios such as S3 - SQS or S3 - SNS - SQS. Therefore I had to switch back to using AWS feature, guaranteeing that if AWS Trace header is set during AWS SDK request, the value will be maintained and returned at the end of the chain (consume - as SQS system attribute). There was simply no other way to do it (ie without relying on AWS features).

To sum up, approach that came out of discussions and code reviews was:

  • when interacting with AWS (using AWS SDK) we will enforce X-Ray propagator (which basically sets AWS trace header in an appropriate format) in order to ensure that context propagation will be maintained thru the infrastructure (which is beyond our control)
  • if there is a system that conforms to AWS interface and supports AWS SDK, in order to maintain the propagation it will also need to support AWS tracing header / format

Frankly, at the beginning it didn't feel right to implement propagation that enforces the AWS trace format, but I realised that it just relies on how a closed system works, as we do with instrumentation libraries, in order to support as many use cases as possible.

Contributor

Is this comment addressed @arminru ?

with other sources. Notably, message attributes (user-provided, not system) are not supported; the linked contexts
are always expected to have been sent as HTTP headers of the `SQS.SendMessage` request that the message originated
from. This is a function of AWS SDK instrumentation, not Lambda instrumentation.