Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update probabilistic sampler processor with OTEP 235 support #31918

Closed
jmacd opened this issue Mar 22, 2024 · 1 comment · Fixed by #31894
Closed

Update probabilistic sampler processor with OTEP 235 support #31918

jmacd opened this issue Mar 22, 2024 · 1 comment · Fixed by #31894
Assignees
Labels
enhancement New feature or request processor/probabilisticsampler Probabilistic Sampler processor

Comments

@jmacd
Copy link
Contributor

jmacd commented Mar 22, 2024

Component(s)

processor/probabilisticsampler

Is your feature request related to a problem? Please describe.

OTEP 235 specifies how to encode randomness and threshold information for consistent probability sampling. Add support for this specification.

Describe the solution you'd like

There are two modes of sampling implied by the new specification.

  • Proportional sampling based on 56 bits of randomness
  • Equalizing sampling based on 56 bits of randomness
    This compared with the existing hash-based solution, which uses 14 bits of seed hash value.

Describe alternatives you've considered

In #31894 I've demonstrated the end-to-end change I would like to see. I propose to factor it into 3 parts.

  1. Changes in pkg/sampling. In New component: pkg/sampling #29738 a package of code was contributed meant for this work, stemming from Probabilistic sampler processor based on draft t-value/r-value encoding #24811. I have found a few minor changes that are required/nice-to-have in this package, and will separate them.
  2. Changes in probabilisticsampler before adding new modes. This is a major refactoring project, but it does not change functionality. Introduces the FailClosed feature, which gives the user control over error handling. Note the refactored code shares almost all of its logic between the two code paths, unlike the existing.
  3. Adds two new probability sampler modes. This will complete the project.

Additional context

Noticed while testing #31894 a couple of major/minor inconsistencies, documented them in the README.

  1. Some code paths would apply the hash function to an empty input, such as an empty trace ID or an empty attribute value. These would get a fixed, hard-coded value which sampled in 89% of cases. In the new code, FailClosed determines the outcome when randomness is missing, among other potential error cases.
  2. The logs "sampling priority" mechanism is very different from the traces mechanism. I would like to reconcile this, but won't do so under this issue.
  3. Since the default configuration includes hash_seed: 0, I propose changing the default in this code to use "proportional" by default, instead of "hash_seed", when there is not an explicit hash seed set, when trace IDs are used. This allows the new OTel specification to be used in common cases.
@jmacd jmacd added enhancement New feature or request needs triage New item requiring triage labels Mar 22, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the processor/probabilisticsampler Probabilistic Sampler processor label Mar 22, 2024
@Frapschen Frapschen removed the needs triage New item requiring triage label Mar 25, 2024
MovieStoreGuy added a commit that referenced this issue Apr 9, 2024
**Description:**

Several usability issues were ironed out while working on #31894. This
PR is the pkg/sampling changes from that PR.

Highlights:
- Adds `NeverSampleThreshold` value, which is the exclusive
upper-boundary of threshold values. This makes negative sampling
decisions easier to manage, as shown in #31894.
- Adds `AllProbabilitiesRandomness` value, which is the inclusive
upper-boundary of Randomness values. This makes error handling more
natural as shown in #31894. All thresholds except `NeverSampleThreshold`
will be sampled at `AllProbabilitiesRandomness`.
- Adds `UnsignedToThreshold` constructor for explicit threshold
construction. This is useful in the case of #31894 because it constructs
a 14-bit threshold value.
- Adds `UnsignedToRandomness` constructor for explicit randomness
construction. This is useful in the case of #31894 because it constructs
randomness values from log records w/o use of TraceState.
- Removes a parameter from `UpdateTValueWithSampling` to avoid the
potential for an inconsistent update through mis-use (as identified in
#31894, there is less optimization potential because sampling.threshold
modifies thresholds in all modes).
- Eliminates the `ErrPrecisionUnderflow` error condition and
automatically corrects the problem by extending precision near 0.0 and
1.0 where there are obligatory leading `f` or `0` digits.

**Link to tracking Issue:** #31918 

**Testing:** New tests added for coverage.

**Documentation:** New comments to explain.

---------

Co-authored-by: Sean Marciniak <[email protected]>
jpkrohling pushed a commit that referenced this issue Apr 19, 2024
…32360)

**Description:** Adds new hard-coded test of the TraceID hashing
function in the probabilisticsamplerprocessor. This will ensure that
changes do not inadvertently modify the hashing function or associated
logic for spans. Note that the Logs sampler logic includes a test with
exact counts of sampled log records, which serves the same purpose.

**Link to tracking Issue:** #31918 

**Testing:** This is a test added ahead of
#31946,
which refactors the hash-based decision but should not change its
results.

**Documentation:** n/a
jpkrohling pushed a commit that referenced this issue May 15, 2024
…ation, prepare for OTEP 235 support (#31946)

**Description:**

Refactors the probabilistic sampling processor to prepare it for more
OTEP 235 support.

This clarifies existing inconsistencies between tracing and logging
samplers, see the updated README. The tracing priority mechanism applies
a 0% or 100% sampling override (e.g., "1" implies 100% sampling),
whereas the logging sampling priority mechanism supports
variable-probability override (e.g., "1" implies 1% sampling).

This pins down cases where no randomness is available, and organizes the
code to improve readability. A new type called `randomnessNamer` carries
the randomness information (from the sampling pacakge) and a name of the
policy that derived it. When sampling priority causes the effective
sampling probability to change, the value "sampling.priority" replaces
the source of randomness, which is currently limited to "trace_id_hash"
or the name of the randomess-source attribute, for logs.

While working on #31894, I discovered that some inputs fall through to
the hash function with zero bytes of input randomness. The hash
function, computed on an empty input (for logs) or on 16 bytes of zeros
(which OTel calls an invalid trace ID), would produce a fixed random
value. So, for example, when logs are sampled and there is no TraceID
and there is no randomness attribute value, the result will be sampled
at approximately 82.9% and above.

In the refactored code, an error is returned when there is no input
randomness. A new boolean configuration field determines the outcome
when there is an error extracting randomness from an item of telemetry.
By default, items of telemetry with errors will not pass through the
sampler. When `FailClosed` is set to false, items of telemetry with
errors will pass through the sampler.

The original hash function, which uses 14 bits of information, is
structured as an "acceptance threshold", ultimately the test for
sampling translated into a positive decision when `Randomness <
AcceptThreshold`. In the OTEP 235 scheme, thresholds are rejection
thresholds--this PR modifies the original 14-bit accept threshold into a
56-bit reject threshold, using Threshold and Randomness types from the
sampling package. Reframed in this way, in the subsequent PR (i.e.,
#31894) the effective sampling probability will be seamlessly conveyed
using OTEP 235 semantic conventions.

Note, both traces and logs processors are now reduced to a function like
this:

```
				return commonSamplingLogic(
					ctx,
					l,
					lsp.sampler,
					lsp.failClosed,
					lsp.sampler.randomnessFromLogRecord,
					lsp.priorityFunc,
					"logs sampler",
					lsp.logger,
				)
```

which is a generic function that handles the common logic on a per-item
basis and ends in a single metric event. This structure makes it clear
how traces and logs are processed differently and have different
prioritization schemes, currently. This structure also makes it easy to
introduce new sampler modes, as shown in #31894. After this and #31940
merge, the changes in #31894 will be relatively simple to review as the
third part in a series.

**Link to tracking Issue:**

Depends on #31940.
Part of #31918.

**Testing:** Added. Existing tests already cover the exact random
behavior of the current hashing mechanism. Even more testing will be
introduced with the last step of this series. Note that
#32360
is added ahead of this test to ensure refactoring does not change
results.

**Documentation:** Added.

---------

Co-authored-by: Kent Quirk <[email protected]>
jpkrohling added a commit that referenced this issue Jun 13, 2024
…rt OTEP 235) (#31894)

**Description:** Creates new sampler modes named "equalizing" and
"proportional". Preserves existing functionality under the mode named
"hash_seed".

Fixes #31918

This is the final step in a sequence, the whole of this work was
factored into 3+ PRs, including the new `pkg/sampling` and the previous
step, #31946. The two new Sampler modes enable mixing OTel sampling SDKs
with Collectors in a consistent way.

The existing hash_seed mode is also a consistent sampling mode, which
makes it possible to have a 1:1 mapping between its decisions and the
OTEP 235 randomness and threshold values. Specifically, the 14-bit hash
value and sampling probability are mapped into 56-bit R-value and
T-value encodings, so that all sampling decisions in all modes include
threshold information.

This implements the semantic conventions of
open-telemetry/semantic-conventions#793, namely
the `sampling.randomness` and `sampling.threshold` attributes used for
logs where there is no tracestate.

The default sampling mode remains HashSeed. We consider a future change
of default to Proportional to be desirable, because:

1. Sampling probability is the same, only the hashing algorithm changes
2. Proportional respects and preserves information about earlier
sampling decisions, which HashSeed can't do, so it has greater
interoperability with OTel SDKs which may also adopt OTEP 235 samplers.

**Link to tracking Issue:** 

Draft for
open-telemetry/opentelemetry-specification#3602.
Previously
#24811,
see also open-telemetry/oteps#235
Part of #29738

**Testing:** New testing has been added.

**Documentation:** ✅

---------

Co-authored-by: Juraci Paixão Kröhling <[email protected]>
codeboten pushed a commit to open-telemetry/opentelemetry-collector that referenced this issue Jun 19, 2024
#### Description
Span.TraceState() is not printed, but is a part of the OTel span data
model.

#### Link to tracking issue
Part of open-telemetry/opentelemetry-collector-contrib#31918

#### Testing
✅
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request processor/probabilisticsampler Probabilistic Sampler processor
Projects
None yet
2 participants