Refactor the probabilistic sampler processor; add FailClosed configuration, prepare for OTEP 235 support #31946

Merged: 129 commits, merged May 15, 2024

Changes from 128 commits

Commits (129)
e822a9b
Add t-value sampler draft
jmacd May 12, 2023
1bc6017
copy/import tracestate parser package
jmacd May 15, 2023
d1fd891
test ot tracestate
jmacd May 16, 2023
85e4472
tidy
jmacd May 16, 2023
bb75f8a
renames
jmacd May 16, 2023
6a57b77
testing two parsers w/ generic code
jmacd May 17, 2023
7fa8130
integrated
jmacd May 17, 2023
36230e7
Comments
jmacd May 17, 2023
7bae35c
revert two files
jmacd May 17, 2023
9010a67
Update with r, s, and t-value. Now using regexps and strings.IndexBy…
jmacd Jun 1, 2023
0e27e40
fix sampler build
jmacd Jun 1, 2023
efcdc3d
add support for s-value for non-consistent mode
jmacd Jun 1, 2023
939c758
WIP
jmacd Jul 10, 2023
b9a1e56
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Aug 2, 2023
a31266c
use new proposed syntax see https://github.com/open-telemetry/opentel…
jmacd Aug 2, 2023
690cd64
update tracestate libs for new encoding
jmacd Aug 2, 2023
c8baf29
wip working on probabilistic sampler with two new modes: downsampler …
jmacd Aug 2, 2023
7f47e4a
unsigned implement split
jmacd Aug 3, 2023
422e0b2
two implementations
jmacd Aug 3, 2023
787b9fd
wip
jmacd Sep 5, 2023
ed36f03
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Sep 6, 2023
d795210
Updates for OTEP 235
jmacd Sep 6, 2023
09000f7
wip TODO
jmacd Sep 6, 2023
a4d467b
versions.yaml
jmacd Sep 6, 2023
e373b9b
Add proportional sampler mode; comment on TODOs; create SamplerMode t…
jmacd Sep 7, 2023
fe6a085
back from internal
jmacd Oct 4, 2023
396efb1
wip
jmacd Oct 4, 2023
36de5dd
fix existing tests
jmacd Oct 6, 2023
f1aa0ad
:wip:
jmacd Oct 12, 2023
700734e
Update for rejection threshold
jmacd Nov 15, 2023
ae50bdd
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Nov 15, 2023
a94b8e7
fix preexisting tests
jmacd Nov 16, 2023
4edcbcb
basic yes/no t-value sampling test
jmacd Nov 16, 2023
53bf119
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Nov 29, 2023
3cdb957
add version for sampling pkg
jmacd Nov 29, 2023
e506847
more testing
jmacd Dec 7, 2023
2cddfeb
add probability to threshold with precision option
jmacd Dec 8, 2023
f69d6ee
ProbabilityToThresholdWithPrecision
jmacd Dec 8, 2023
cc02934
test coverage for equalizing and proportional
jmacd Dec 8, 2023
1eecc4a
config test
jmacd Dec 8, 2023
2159107
comments and notes
jmacd Dec 8, 2023
e0898a6
update README
jmacd Dec 8, 2023
d0991ed
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Jan 10, 2024
a002774
Remove sampling pkg, it is now upstream
jmacd Feb 14, 2024
3a49922
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Feb 28, 2024
fca0184
build w/ new sampling pkg
jmacd Feb 28, 2024
f11e0a4
more test coverage
jmacd Feb 29, 2024
3f495a6
more config tests
jmacd Feb 29, 2024
581095c
test precision underflow
jmacd Mar 1, 2024
7b8fd31
test warning logs
jmacd Mar 1, 2024
1a6be4f
hash_seed fixes
jmacd Mar 1, 2024
712cf17
wip
jmacd Mar 4, 2024
34c0d3b
aip
jmacd Mar 5, 2024
7742668
wip-refactoring
jmacd Mar 13, 2024
8d60168
refactor wip
jmacd Mar 14, 2024
3779caa
cleanup refactor
jmacd Mar 14, 2024
c261ac1
wip
jmacd Mar 14, 2024
34469e4
moving code
jmacd Mar 15, 2024
8dabf47
fix tests; round up small probs to avoid errors
jmacd Mar 15, 2024
d44afb5
preserve legacy behavior
jmacd Mar 15, 2024
1cf9991
logs handled sampling priority differently
jmacd Mar 15, 2024
365d35d
still two errors
jmacd Mar 18, 2024
12a3047
builds
jmacd Mar 19, 2024
8655f42
needs testing
jmacd Mar 19, 2024
468e6c6
fixing tests
jmacd Mar 21, 2024
23b4423
cleanup
jmacd Mar 21, 2024
07841e5
remove strict feature
jmacd Mar 21, 2024
6936bc4
tests fixed
jmacd Mar 21, 2024
c132f4c
wip
jmacd Mar 22, 2024
bd13ac9
typo
jmacd Mar 22, 2024
aa33b1c
more logs tests
jmacd Mar 22, 2024
06556dc
Add more comments
jmacd Mar 22, 2024
a4940e6
update README
jmacd Mar 22, 2024
4f616e9
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Mar 22, 2024
b4ca3aa
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Mar 25, 2024
fdd26ac
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Mar 25, 2024
b2d7504
remove new modes
jmacd Mar 25, 2024
fe20788
strip readme
jmacd Mar 25, 2024
72f03e4
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Apr 12, 2024
7fbe81a
Update post-merge.
jmacd Apr 12, 2024
252c568
restore consistency check
jmacd Apr 12, 2024
f73994a
lint
jmacd Apr 12, 2024
dddf03e
revert some config diffs
jmacd Apr 12, 2024
35f45d1
Restructure readme
jmacd Apr 12, 2024
4282212
remove tests for next PR
jmacd Apr 12, 2024
6162804
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Apr 12, 2024
a0567cc
tidy
jmacd Apr 15, 2024
c51c90e
linkcheck
jmacd Apr 15, 2024
41ffde0
chlog
jmacd Apr 15, 2024
560f1a5
more testing
jmacd Apr 15, 2024
e5e29bc
tidy deps
jmacd Apr 15, 2024
7e48d0f
contribcol
jmacd Apr 16, 2024
e33b28d
goporto
jmacd Apr 16, 2024
cdb359d
crosslink
jmacd Apr 16, 2024
92561e6
more go.mod thisisfun
jmacd Apr 16, 2024
83245cc
again?
jmacd Apr 16, 2024
9fc8c3e
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Apr 17, 2024
ea0773f
crosslink again
jmacd Apr 17, 2024
ab93f2f
Add a replace.
jmacd Apr 17, 2024
7784b71
debug
jmacd Apr 17, 2024
328cd9a
double processor mods
jmacd Apr 17, 2024
a95c83b
make the range test use less memory
jmacd Apr 17, 2024
73588c1
lint (of course)
jmacd Apr 17, 2024
434a560
revert two files
jmacd Apr 17, 2024
8c60fca
Apply suggestions from code review
jmacd Apr 22, 2024
08bf3f5
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Apr 22, 2024
5e0ba11
Merge branch 'jmacd/bigrefactor' of github.com:jmacd/opentelemetry-co…
jmacd Apr 22, 2024
9c1d6e2
clarify logs sampling_priority attribute, fix mistake
jmacd Apr 22, 2024
f5a57a1
document defaults and types
jmacd Apr 22, 2024
8fbf133
fix fail_closed default doc
jmacd Apr 22, 2024
d98ff2f
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Apr 24, 2024
af8f19c
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd Apr 24, 2024
1475569
Re-tidy, re-genotelcontribcol.
jmacd Apr 24, 2024
95f5c13
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd May 2, 2024
b0fd487
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd May 7, 2024
b3f84ac
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd May 8, 2024
594852f
more text on consistency/completeness
jmacd May 8, 2024
7abdf2d
comments/feedback
jmacd May 8, 2024
2012193
Return FailClosed: in tests
jmacd May 8, 2024
d3abdd7
make it debug-level
jmacd May 8, 2024
e331268
fix test
jmacd May 8, 2024
1ff0053
fix test
jmacd May 8, 2024
b937dbe
more tidy
jmacd May 8, 2024
00ab52f
make gotidy missed this :(
jmacd May 8, 2024
d2cde83
remove cfg != nil test
jmacd May 9, 2024
487aad8
tidy the world
jmacd May 9, 2024
1049de3
Merge branch 'main' of github.com:open-telemetry/opentelemetry-collec…
jmacd May 10, 2024
21ef16d
ci/cd is so awful
jmacd May 10, 2024
fecd75e
Update processor/probabilisticsamplerprocessor/logsprocessor.go
jmacd May 13, 2024
27 changes: 27 additions & 0 deletions .chloggen/probabilisticsampler_failclosed.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: probabilisticsamplerprocessor

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Adds the `FailClosed` flag to solidify current behavior when the randomness source is missing.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [31918]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
3 changes: 3 additions & 0 deletions cmd/configschema/go.mod
@@ -566,6 +566,7 @@ require (
github.com/open-telemetry/opentelemetry-collector-contrib/internal/kafka v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/internal/sharedcomponent v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/experimentalmetricmetadata v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/azure v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/winperfcounters v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/processor/probabilisticsamplerprocessor v0.100.0 // indirect
@@ -1226,3 +1227,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/connector/graf
replace github.com/open-telemetry/opentelemetry-collector-contrib/extension/sumologicextension => ../../extension/sumologicextension

replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/splunkenterprisereceiver => ../../receiver/splunkenterprisereceiver

replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling => ../../pkg/sampling
1 change: 1 addition & 0 deletions cmd/otelcontribcol/builder-config.yaml
@@ -472,3 +472,4 @@ replaces:
- github.com/open-telemetry/opentelemetry-collector-contrib/extension/opampcustommessages => ../../extension/opampcustommessages
- github.com/open-telemetry/opentelemetry-collector-contrib/confmap/provider/s3provider => ../../confmap/provider/s3provider
- github.com/open-telemetry/opentelemetry-collector-contrib/confmap/provider/secretsmanagerprovider => ../../confmap/provider/secretsmanagerprovider
- github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling => ../../pkg/sampling
3 changes: 3 additions & 0 deletions cmd/otelcontribcol/go.mod
@@ -631,6 +631,7 @@ require (
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatautil v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/azure v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/jaeger v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/loki v0.100.0 // indirect
@@ -1293,3 +1294,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/extension/opam
replace github.com/open-telemetry/opentelemetry-collector-contrib/confmap/provider/s3provider => ../../confmap/provider/s3provider

replace github.com/open-telemetry/opentelemetry-collector-contrib/confmap/provider/secretsmanagerprovider => ../../confmap/provider/secretsmanagerprovider

replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling => ../../pkg/sampling
2 changes: 2 additions & 0 deletions connector/datadogconnector/go.mod
@@ -332,3 +332,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/extension/stor
replace github.com/openshift/api v3.9.0+incompatible => github.com/openshift/api v0.0.0-20180801171038-322a19404e37

replace github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor => ../../processor/transformprocessor

replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling => ../../pkg/sampling
3 changes: 3 additions & 0 deletions exporter/datadogexporter/go.mod
@@ -250,6 +250,7 @@ require (
github.com/open-telemetry/opentelemetry-collector-contrib/internal/filter v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatautil v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/prometheus v0.100.0 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
@@ -427,3 +428,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/processor/tail
replace github.com/open-telemetry/opentelemetry-collector-contrib/extension/storage => ../../extension/storage

replace github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor => ../../processor/transformprocessor

replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling => ../../pkg/sampling
2 changes: 2 additions & 0 deletions exporter/datadogexporter/integrationtest/go.mod
@@ -341,3 +341,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/processor/prob
replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver => ../../../receiver/prometheusreceiver

replace github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor => ../../../processor/transformprocessor

replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling => ../../../pkg/sampling
3 changes: 3 additions & 0 deletions go.mod
@@ -578,6 +578,7 @@ require (
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/pdatautil v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/resourcetotelemetry v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/azure v0.100.0 // indirect
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/jaeger v0.100.0 // indirect
@@ -1226,3 +1227,5 @@ replace github.com/open-telemetry/opentelemetry-collector-contrib/extension/enco
replace github.com/open-telemetry/opentelemetry-collector-contrib/extension/ackextension => ./extension/ackextension

replace github.com/open-telemetry/opentelemetry-collector-contrib/receiver/splunkenterprisereceiver => ./receiver/splunkenterprisereceiver

replace github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling => ./pkg/sampling
190 changes: 150 additions & 40 deletions processor/probabilisticsamplerprocessor/README.md
@@ -15,51 +15,159 @@
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib
<!-- end autogenerated section -->

The probabilistic sampler supports two types of sampling for traces:

1. `sampling.priority` [semantic
convention](https://github.com/opentracing/specification/blob/master/semantic_conventions.md#span-tags-table)
as defined by OpenTracing
1. Trace ID hashing

The `sampling.priority` semantic convention takes priority over trace ID hashing. As the name
implies, trace ID hashing samples based on hash values determined by trace IDs. See [Hashing](#hashing) for more information.
The probabilistic sampler processor supports several modes of sampling
for spans and log records. Sampling is performed on a per-request
basis, considering individual items statelessly. For whole trace
sampling, see
[tailsamplingprocessor](../tailsamplingprocessor/README.md).

For trace spans, this sampler supports probabilistic sampling based on
a configured sampling percentage applied to the TraceID. In addition,
the sampler recognizes a `sampling.priority` annotation, which can
force the sampler to apply 0% or 100% sampling.

For log records, this sampler can be configured to use the embedded
TraceID and follow the same logic as applied to spans. When the
TraceID is not defined, the sampler can be configured to apply hashing
to a selected log record attribute. This sampler also supports
sampling priority.
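
As a point of reference, here is a minimal sketch of a collector configuration wiring this processor into traces and logs pipelines; the `otlp` receiver and exporter names are placeholders, not requirements of this processor:

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 25   # keep roughly 25% of spans and log records

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlp]
```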

## Consistency guarantee

A consistent probability sampler is a Sampler that supports
independent sampling decisions for each span or log record in a group
(e.g. by TraceID), while maximizing the potential for completeness as
follows.

Consistent probability sampling requires that for any span in a given
trace, if a Sampler with lesser sampling probability selects the span
for sampling, then the span would also be selected by a Sampler
configured with greater sampling probability.

Review comment (Member): This would also deserve some explanation: what's a sampler here? It wasn't defined yet. Is this about chaining collectors together, where the first samples 100%, and the second 90%, and the assumption is that all those 80% would be within the 90%?

Reply (Contributor Author): See revised text. 594852f

Reply (Member): Alright -- I think we might want to make it easier to digest in the future, but let's wait for users to provide feedback.


## Completeness property

A trace is complete when all of its members are sampled. A
"sub-trace" is complete when all of its descendents are sampled.

Ordinarily, Trace and Logging SDKs configure parent-based samplers
which decide to sample based on the Context, because it leads to
completeness.

When non-root spans or logs make independent sampling decisions
instead of using the parent-based approach (e.g., using the
`TraceIDRatioBased` sampler for a non-root span), incompleteness may
result, and when spans and log records are independently sampled in a
processor, as by this component, the same potential for incompleteness
arises. The consistency guarantee helps minimize this issue.

Consistent probability samplers can be safely used with a mixture of
probabilities and preserve sub-trace completeness, provided that child
spans and log records are sampled with probability greater than or
equal to the parent context.

Using 1%, 10% and 50% probabilities for example, in a consistent
probability scheme the 50% sampler must sample when the 10% sampler
does, and the 10% sampler must sample when the 1% sampler does. A
three-tier system could be configured with 1% sampling in the first
tier, 10% sampling in the second tier, and 50% sampling in the bottom
tier. In this configuration, 1% of traces will be complete, 10% of
traces will be sub-trace complete at the second tier, and 50% of
traces will be sub-trace complete at the third tier thanks to the
consistency property.

These guidelines should be considered when deploying multiple
collectors with different sampling probabilities in a system. For
example, a collector serving frontend servers can be configured with
smaller sampling probability than a collector serving backend servers,
without breaking sub-trace completeness.
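
For illustration only, a two-tier deployment following this guidance might be sketched as below; the tier roles and percentages are assumptions, and both tiers must share the same randomness source (for example, the same `hash_seed`, as described under Hash seed) for the consistency guarantee to hold:

```yaml
# Frontend-tier collector: keeps 10% of traces.
processors:
  probabilistic_sampler:
    sampling_percentage: 10
---
# Backend-tier collector: keeps 50% of traces. Under consistent sampling,
# every trace kept at the 10% tier is also kept here, so those traces
# remain complete across both tiers.
processors:
  probabilistic_sampler:
    sampling_percentage: 50
```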

## Sampling randomness

To achieve consistency, sampling randomness is taken from a
deterministic aspect of the input data. For traces pipelines, the
source of randomness is always the TraceID. For logs pipelines, the
source of randomness can be the TraceID or another log record
attribute, if configured.

For log records, the `attribute_source` and `from_attribute` fields determine the
source of randomness used for log records. When `attribute_source` is
set to `traceID`, the TraceID will be used. When `attribute_source`
is set to `record` or the TraceID field is absent, the value of
`from_attribute` is taken as the source of randomness (if configured).
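
As an example, a sketch of a logs configuration that draws its randomness from a record attribute instead of the TraceID (the attribute name `logID` is illustrative):

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 20
    attribute_source: record   # use a log record attribute rather than the TraceID
    from_attribute: logID      # attribute whose value supplies the randomness
```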

## Sampling priority

The sampling priority mechanism is an override, which takes precedence
over the probabilistic decision in all modes.

🛑 Compatibility note: Logs and Traces have different behavior.

In traces pipelines, when the priority attribute has value 0, the
configured probability will be modified to 0% and the item will not
pass the sampler. When the priority attribute is non-zero the
configured probability will be set to 100%. The sampling priority
attribute is not configurable, and is called `sampling.priority`.

In logs pipelines, when the priority attribute has value 0, the
configured probability will be modified to 0%, and the item will not
pass the sampler. Otherwise, the logs sampling priority attribute is
interpreted as a percentage, with values >= 100 equal to 100%

Review comment (Member): Is there a reason for this mismatch?

Reply (Contributor Author): I was very surprised by this. I have preserved the inconsistency and I am not sure what we should do about it.

Reply (Member): We could unify the solutions by varying behavior according to the type of the attribute. If numeric, it's the priority, and if a string, it's the name of the numeric attribute containing the priority.

Personally, I'd prefer to choose one for the long term, something like:

  • State that the current traces behavior for sampling.priority is deprecated and that the logs behavior is desired.
  • For some period of time (probably a long time) state that when traces are configured with a numeric value for sampling.priority, it is interpreted as the configured probability, but that a string value will be treated as the name of the attribute to be used for priority.

It would be just as valid to do it the other way, if we preferred less configuration.

Reply (Member): I think this might be the perfect opportunity for this change, but not necessarily part of this PR.

sampling. The logs sampling priority attribute is configured via
`sampling_priority`.

## Sampling algorithm

### Hash seed

The hash seed method uses the FNV hash function applied to either a
Trace ID (spans, log records), or to the value of a specified
attribute (only logs). The hashed value, presumed to be random, is
compared against a threshold value that corresponds with the sampling
percentage.

This mode requires configuring the `hash_seed` field. This mode is
enabled when the `hash_seed` field is not zero, or when log records
are sampled with `attribute_source` set to `record`.

In order for hashing to be consistent, all collectors for a given tier
(e.g. behind the same load balancer) must have the same
`hash_seed`. It is also possible to leverage a different `hash_seed`
at different collector tiers to support additional sampling
requirements.

This mode uses 14 bits of sampling precision.
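
Concretely, 14 bits of precision means 2^14 = 16384 hash buckets. As a rough sketch derived from the review discussion below (not the exact code path), an item is kept when `FNV(randomness, hash_seed) & 0x3FFF` falls below `sampling_percentage / 100 * 16384`.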

### Error handling

This processor considers it an error when the arriving data has no
randomness. This includes conditions where the TraceID field is
invalid (16 zero bytes) and where the log record attribute source has
zero bytes of information.

By default, when there are errors determining sampling-related
information from an item of telemetry, the data will be refused. This
behavior can be changed by setting the `fail_closed` property to
false, in which case erroneous data will pass through the processor.
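
For example, a minimal sketch that lets such items pass through instead of refusing them:

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 15.3
    fail_closed: false   # pass items through when no valid randomness source is found
```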

## Configuration

The following configuration options can be modified:
- `hash_seed` (no default): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `sampling_percentage` (default = 0): Percentage at which traces are sampled; >= 100 samples all traces

Examples:
- `sampling_percentage` (32-bit floating point, required): Percentage at which items are sampled; >= 100 samples all items, 0 rejects all items.
- `hash_seed` (32-bit unsigned integer, optional, default = 0): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `fail_closed` (boolean, optional, default = true): Whether to reject items with sampling-related errors.

Review comment (Member): Is this a change in behavior from the current implementation?

Reply (@jmacd, Contributor Author, May 8, 2024): The current implementation has an odd inconsistency which this is meant to resolve in a user-configurable way. In cases where there is no randomness, the sampling decision is fixed and therefore depends on the configured probability. For some probabilities, the failure would be open, and for other probabilities, the failure would be closed--depending on the hash seed.

I added this test to demonstrate:

```go
func TestEmptyHashFunction(t *testing.T) {
	// With zero bytes of randomness, seed=0:
	hashed32 := computeHash([]byte{}, 0)
	hashed := uint64(hashed32 & bitMaskHashBuckets)
	require.Equal(t, uint64(0x3515), hashed)
	require.InDelta(t, 0.829, float64(hashed)/float64(numHashBuckets), 0.001)

	// With 16 bytes of 0s, seed=0:
	var b [16]byte
	hashed32 = computeHash(b[:], 0)
	hashed = uint64(hashed32 & bitMaskHashBuckets)
	require.Equal(t, uint64(0x2455), hashed)
	require.InDelta(t, 0.568, float64(hashed)/float64(numHashBuckets), 0.001)
}
```

So empty TraceIDs (16 bytes of 0s) are sampled at 56.8% and above, and logs with missing attribute values (0 bytes) are sampled at 82.9% and above. Since I expect most users configure probabilities at 50% or below, most users would never see these items of telemetry and would never know there was a problem.

This is how I justify the decision to use FailClosed=true by default, because users would have had to be sampling above 56% to see these records before, with the default seed. I am open to both default values, but slightly prefer FailClosed=true.

Reply (Member): Sounds good!


```yaml
processors:
probabilistic_sampler:
hash_seed: 22
sampling_percentage: 15.3
```
### Logs-specific configuration

The probabilistic sampler supports sampling logs according to their trace ID, or by a specific log record attribute.

The probabilistic sampler optionally may use a `hash_seed` to compute the hash of a log record.
This sampler samples based on hash values determined by log records. See [Hashing](#hashing) for more information.

The following configuration options can be modified:
- `hash_seed` (no default, optional): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `sampling_percentage` (required): Percentage at which logs are sampled; >= 100 samples all logs, 0 rejects all logs.
- `attribute_source` (default = traceID, optional): defines where to look for the attribute in from_attribute. The allowed values are `traceID` or `record`.
- `from_attribute` (default = null, optional): The optional name of a log record attribute used for sampling purposes, such as a unique log record ID. The value of the attribute is only used if the trace ID is absent or if `attribute_source` is set to `record`.
- `sampling_priority` (default = null, optional): The optional name of a log record attribute used to set a different sampling priority from the `sampling_percentage` setting. 0 means to never sample the log record, and >= 100 means to always sample the log record.

## Hashing

In order for hashing to work, all collectors for a given tier (e.g. behind the same load balancer)
must have the same `hash_seed`. It is also possible to leverage a different `hash_seed` at
different collector tiers to support additional sampling requirements. Please refer to
[config.go](./config.go) for the config spec.
- `attribute_source` (string, optional, default = "traceID"): defines where to look for the attribute in from_attribute. The allowed values are `traceID` or `record`.
- `from_attribute` (string, optional, default = ""): The name of a log record attribute used for sampling purposes, such as a unique log record ID. The value of the attribute is only used if the trace ID is absent or if `attribute_source` is set to `record`.
- `sampling_priority` (string, optional, default = ""): The name of a log record attribute used to set a different sampling priority from the `sampling_percentage` setting. 0 means to never sample the log record, and >= 100 means to always sample the log record.

Examples:

Sample 15% of the logs:
Sample 15% of log records according to trace ID using the OpenTelemetry
specification.

```yaml
processors:
probabilistic_sampler:
@@ -76,7 +184,8 @@ processors:
from_attribute: logID # value is required if the source is not traceID
```

Sample logs according to the attribute `priority`:
Give sampling priority to log records according to the attribute named
`priority`:

```yaml
processors:
@@ -85,6 +194,7 @@ processors:
sampling_priority: priority
```

## Detailed examples

Refer to [config.yaml](./testdata/config.yaml) for detailed
examples on using the processor.
Refer to [config.yaml](./testdata/config.yaml) for detailed examples
on using the processor.
10 changes: 10 additions & 0 deletions processor/probabilisticsamplerprocessor/config.go
@@ -35,6 +35,16 @@ type Config struct {
// different sampling rates, configuring different seeds avoids that.
HashSeed uint32 `mapstructure:"hash_seed"`

// FailClosed indicates to not sample data (the processor will
// fail "closed") in case of error, such as failure to parse
// the tracestate field or missing the randomness attribute.
//
// By default, failure cases are sampled (the processor
// fails "open"). Sampling priority-based decisions are made after
// FailClosed is processed, making it possible to sample
// despite errors using priority.
FailClosed bool `mapstructure:"fail_closed"`

// AttributeSource (logs only) defines where to look for the attribute in from_attribute. The allowed values are
// `traceID` or `record`. Default is `traceID`.
AttributeSource `mapstructure:"attribute_source"`