| Status        |           |
| ------------- | --------- |
| Stability     | development: metrics |
|               | beta: traces, logs |
| Distributions | contrib, k8s |
| Issues        | |
| Code Owners   | @jpkrohling |
This is an exporter that will consistently export spans, metrics and logs depending on the `routing_key` configured. The options for `routing_key` are: `service`, `traceID`, `metric` (metric name), `resource`, `streamID`.
| routing_key | can be used for      |
| ----------- | -------------------- |
| service     | logs, spans, metrics |
| traceID     | logs, spans          |
| resource    | metrics              |
| metric      | metrics              |
| streamID    | metrics              |
If no `routing_key` is configured, the default routing mechanism is `traceID` for traces, while `service` is the default for metrics. This means that spans belonging to the same `traceID` (or to the same `service.name`, when `service` is used as the `routing_key`) will be sent to the same backend.
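Because a single `routing_key` applies to the whole exporter instance, one way to use different keys per signal is to declare one `loadbalancing` exporter per pipeline. The following is a minimal sketch only; the instance names and backend hostnames are hypothetical:

```yaml
exporters:
  loadbalancing/traces:
    routing_key: "traceID"   # explicit, same as the default for traces
    protocol:
      otlp:
        timeout: 1s
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317
  loadbalancing/metrics:
    routing_key: "service"   # explicit, same as the default for metrics
    protocol:
      otlp:
        timeout: 1s
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317
```

Each instance is then referenced from its own pipeline under `service::pipelines`.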
It requires a source of backend information to be provided: static, with a fixed list of backends, or DNS, with a hostname that will resolve to all IP addresses to use (such as a Kubernetes headless service). The DNS resolver will periodically check for updates.
Note that either the Trace ID or Service name is used for the decision on which backend to use: the actual backend load isn't taken into consideration. Even though this load-balancer won't do round-robin balancing of the batches, the load distribution should be very similar among backends with a standard deviation under 5% at the current configuration.
This load balancer is especially useful for backends configured with tail-based samplers or red-metrics-collectors, which make a decision based on the view of the full trace.
When a list of backends is updated, some of the signals will be rerouted to different backends. Around R/N of the "routes" will be rerouted differently, where:
- A "route" is either a trace ID or a service name mapped to a certain backend.
- "R" is the total number of routes.
- "N" is the total number of backends.
This should be stable enough for most cases, and the larger the number of backends, the less disruption it should cause. Still, if routing stability is important for your use case and your list of backends is constantly changing, consider using the `groupbytrace` processor. This way, traces are dispatched atomically to this exporter, and the same decision about the backend is made for the trace as a whole.
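As a rough sketch of that setup (the `wait_duration` and `num_traces` values below are illustrative, not recommendations), the `groupbytrace` processor is simply placed in front of this exporter in the traces pipeline:

```yaml
processors:
  groupbytrace:
    # illustrative values; tune for your own traffic
    wait_duration: 10s
    num_traces: 10000

exporters:
  loadbalancing:
    protocol:
      otlp:
        timeout: 1s
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317

service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - groupbytrace
      exporters:
        - loadbalancing
```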
This also supports service name based exporting for traces. If you have two or more collectors that collect traces and then use the spanmetrics connector to generate metrics and push them to Prometheus, there is a high chance of facing label collisions on Prometheus if the routing is based on `traceID`, because every collector sees the `service+operation` label. With service name based routing, each collector sees only one service name and can push metrics without any label collisions.
The `loadbalancingexporter` will, irrespective of the chosen resolver (`static`, `dns`, `k8s`), create one exporter per endpoint. The exporter conforms to its published configuration regarding sending queue and retry mechanisms. Importantly, the `loadbalancingexporter` will not attempt to re-route data to a healthy endpoint on delivery failure, and data loss is therefore possible if the exporter's target remains unavailable once redelivery is exhausted. Due consideration needs to be given to the exporter queue and retry configuration when running in a highly elastic environment.
- When using the `static` resolver and a target is unavailable, all the target's load-balanced telemetry will fail to be delivered until either the target is restored or removed from the static list. The same principle applies to the `dns` resolver.
- When using the `k8s`, `dns`, and likely future resolvers, topology changes are eventually reflected in the `loadbalancingexporter`. The `k8s` resolver will update more quickly than `dns`, but a window of time in which the true topology doesn't match the view of the `loadbalancingexporter` remains.
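As an illustration only (the values below are placeholders, not recommendations), the per-endpoint queue and retry behaviour can be tuned through the standard OTLP exporter settings in the `otlp` template under `protocol`:

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        timeout: 1s
        # standard OTLP exporter settings; placeholder values
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 5000
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317
```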
Refer to config.yaml for detailed examples on using the exporter.
- The `otlp` property configures the template used for building the OTLP exporter. Refer to the OTLP Exporter documentation for information on which options are available. Note that the `endpoint` property should not be set and will be overridden by this exporter with the backend endpoint.
- The `resolver` accepts a `static` node, a `dns` node, a `k8s` service, or an `aws_cloud_map` node. If all four are specified, an `errMultipleResolversProvided` error will be thrown.
- The `hostname` property inside a `dns` node specifies the hostname to query in order to obtain the list of IP addresses.
- The `dns` node also accepts the following optional properties (see the sketch after this list):
  - `hostname` DNS hostname to resolve.
  - `port` port to be used for exporting the traces to the IP addresses resolved from `hostname`. If `port` is not specified, the default port 4317 is used.
  - `interval` resolver interval in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `5s` will be used.
  - `timeout` resolver timeout in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1s` will be used.
- The `k8s` node accepts the following optional properties:
  - `service` Kubernetes service to resolve, e.g. `lb-svc.lb-ns`. If no namespace is specified, an attempt will be made to infer the namespace for this collector, and if this fails it will fall back to the `default` namespace.
  - `ports` port to be used for exporting the traces to the addresses resolved from `service`. If `ports` is not specified, the default port 4317 is used. When multiple ports are specified, two backends are added to the load balancer as if they were at different pods.
  - `timeout` resolver timeout in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `1s` will be used.
- The `aws_cloud_map` node accepts the following properties:
  - `namespace` The CloudMap namespace where the service is registered, e.g. `cloudmap`. If no `namespace` is specified, the load balancer exporter will fail to start.
  - `service_name` The name of the service that you specified when you registered the instance, e.g. `otelcollectors`. If no `service_name` is specified, the load balancer exporter will fail to start.
  - `interval` resolver interval in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `30s` will be used.
  - `timeout` resolver timeout in go-Duration format, e.g. `5s`, `1d`, `30m`. If not specified, `5s` will be used.
  - `port` port to be used for exporting the traces to the addresses resolved from `service`. By default, the port is set in Cloud Map, but it can be overridden with a static value in this config (see the sketch after the AWS CloudMap resolver example below).
  - `health_status` filter in AWS Cloud Map; specifies the health status of the instances that you want to discover. The `health_status` filter is optional and allows you to query based on the health status of the instances.
    - Available values are:
      - `HEALTHY`: Only return instances that are healthy.
      - `UNHEALTHY`: Only return instances that are unhealthy.
      - `ALL`: Return all instances, regardless of their health status.
      - `HEALTHY_OR_ELSE_ALL`: Return healthy instances, unless none are reporting a healthy state. In that case, return all instances. This is also called failing open.
    - The resolver's default filter is set to `HEALTHY` when none is explicitly defined.
  - Notes:
    - This resolver currently returns a maximum of 100 hosts.
    - TODO: Feature request 29771 aims to cover the pagination for this scenario.
- The `routing_key` property is used to specify how to route values (spans or metrics) to exporters based on different parameters. This functionality is currently enabled only for `trace` and `metric` pipeline types. It supports one of the following values:
  - `service`: Routes values based on their service name. This is useful when using processors like the span metrics connector, so all spans for each service are sent to consistent collector instances for metric collection. Otherwise, metrics for the same services are sent to different collectors, making aggregations inaccurate.
  - `traceID`: Routes spans based on their `traceID`. Invalid for metrics.
  - `metric`: Routes metrics based on their metric name. Invalid for spans.
  - `streamID`: Routes metrics based on their datapoint streamID. That's the unique hash of all its attributes, plus the attributes and identifying information of its resource, scope, and metric data.
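The `dns` resolver appears only as a comment in the simple example below, so here is a minimal sketch of its optional properties (the hostname is taken from that example; the values shown are the documented defaults):

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        timeout: 1s
    resolver:
      dns:
        hostname: otelcol-headless.observability.svc.cluster.local
        port: 4317     # default
        interval: 5s   # default
        timeout: 1s    # default
```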
Simple example
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317

processors:

exporters:
  loadbalancing:
    routing_key: "service"
    protocol:
      otlp:
        # all options from the OTLP exporter are supported
        # except the endpoint
        timeout: 1s
    resolver:
      static:
        hostnames:
          - backend-1:4317
          - backend-2:4317
          - backend-3:4317
          - backend-4:4317
      # To use the DNS resolver instead, configure a Kubernetes headless service:
      # dns:
      #   hostname: otelcol-headless.observability.svc.cluster.local

service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
    logs:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
```
Kubernetes resolver example (For a more specific example: example/k8s-resolver)
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317

processors:

exporters:
  loadbalancing:
    routing_key: "service"
    protocol:
      otlp:
        # all options from the OTLP exporter are supported
        # except the endpoint
        timeout: 1s
    resolver:
      # use the k8s service resolver if the collector runs in a Kubernetes environment
      k8s:
        service: lb-svc.kube-public
        ports:
          - 15317
          - 16317

service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
    logs:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
```
AWS CloudMap resolver example
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317

processors:

exporters:
  loadbalancing:
    protocol:
      otlp:
        # all options from the OTLP exporter are supported
        # except the endpoint
        timeout: 3s
    resolver:
      aws_cloud_map:
        namespace: aws-namespace
        service_name: aws-otel-col-service-name
        interval: 30s

service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
    logs:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
```
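The example above uses only the required `namespace` and `service_name` properties. A sketch of the same exporter with the optional `port` and `health_status` filter added (the values are illustrative only):

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        timeout: 3s
    resolver:
      aws_cloud_map:
        namespace: aws-namespace
        service_name: aws-otel-col-service-name
        interval: 30s
        # optional: override the port registered in Cloud Map
        port: 4317
        # optional: which instances to discover (defaults to HEALTHY)
        health_status: HEALTHY_OR_ELSE_ALL
```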
For testing purposes, the following configuration can be used, where both the load balancer and all backends are running locally:
```yaml
receivers:
  otlp/loadbalancer:
    protocols:
      grpc:
        endpoint: localhost:4317
  otlp/backend-1:
    protocols:
      grpc:
        endpoint: localhost:55690
  otlp/backend-2:
    protocols:
      grpc:
        endpoint: localhost:55700
  otlp/backend-3:
    protocols:
      grpc:
        endpoint: localhost:55710
  otlp/backend-4:
    protocols:
      grpc:
        endpoint: localhost:55720

processors:

exporters:
  debug:
  loadbalancing:
    protocol:
      otlp:
        timeout: 1s
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - localhost:55690
          - localhost:55700
          - localhost:55710
          - localhost:55720

service:
  pipelines:
    traces/loadbalancer:
      receivers:
        - otlp/loadbalancer
      processors: []
      exporters:
        - loadbalancing
    traces/backend-1:
      receivers:
        - otlp/backend-1
      processors: []
      exporters:
        - debug
    traces/backend-2:
      receivers:
        - otlp/backend-2
      processors: []
      exporters:
        - debug
    traces/backend-3:
      receivers:
        - otlp/backend-3
      processors: []
      exporters:
        - debug
    traces/backend-4:
      receivers:
        - otlp/backend-4
      processors: []
      exporters:
        - debug
    logs/loadbalancer:
      receivers:
        - otlp/loadbalancer
      processors: []
      exporters:
        - loadbalancing
    logs/backend-1:
      receivers:
        - otlp/backend-1
      processors: []
      exporters:
        - debug
    logs/backend-2:
      receivers:
        - otlp/backend-2
      processors: []
      exporters:
        - debug
    logs/backend-3:
      receivers:
        - otlp/backend-3
      processors: []
      exporters:
        - debug
    logs/backend-4:
      receivers:
        - otlp/backend-4
      processors: []
      exporters:
        - debug
```
The following metrics are recorded by this exporter:

- `otelcol_loadbalancer_num_resolutions` represents the total number of resolutions performed by the resolver specified in the tag `resolver`, split by their outcome (`success=true|false`). For the static resolver, this should always be `1` with the tag `success=true`.
- `otelcol_loadbalancer_num_backends` informs how many backends are currently in use. It should always match the number of items specified in the configuration file when the `static` resolver is used, and should eventually (within seconds) catch up with DNS changes. Note that DNS caches that might exist between the load balancer and the record authority will influence how long it takes for the load balancer to see the change.
- `otelcol_loadbalancer_num_backend_updates` records how many of the resolutions resulted in a new list of backends. Use this information to understand how frequent your backend updates are and how often the ring is rebalanced. If the DNS hostname is always returning the same list of IP addresses but this metric keeps increasing, it might indicate a bug in the load balancer.
- `otelcol_loadbalancer_backend_latency` measures the latency for each backend.
- `otelcol_loadbalancer_backend_outcome` counts the outcomes for each endpoint, `success=true|false`.