Add support for untyped metrics from Ops Agent #668

damemi · 2023-07-10T19:51:57Z

This adds functionality to automatically parse the attribute set by the Ops Agent's prometheus receiver to indicate untyped prometheus metrics (GoogleCloudPlatform/opentelemetry-operations-collector#170). When this attribute is found in a Gauge metric, the time series will be double-exported as a Gauge and a Cumulative. The special attribute will be dropped from metric labels.

This behavior is ~~on by default~~ behind a featuregate gcp.untyped_double_export and is only meant for internal use until an upstream approach to untyped metrics is implemented.
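The double export described above can be pictured with a small stand-alone sketch. It does not use the collector's pdata types: `Series`, `Kind`, and `doubleExport` are hypothetical names, and the attribute key is taken from the label that appears in this PR's test fixture, so the key used inside the exporter may differ.

```go
package main

import "fmt"

// untypedAttr is the label name seen in this PR's test fixture. The attribute
// key used inside the collector may differ (assumption).
const untypedAttr = "prometheus_untyped_metric"

// Kind is a simplified stand-in for the metric kind.
type Kind string

const (
	Gauge      Kind = "GAUGE"
	Cumulative Kind = "CUMULATIVE"
)

// Series is a hypothetical, simplified time series. It is neither the pdata
// type nor the Cloud Monitoring type.
type Series struct {
	Name   string
	Kind   Kind
	Labels map[string]string
	Value  float64
}

// withoutLabel returns a copy of labels with key removed.
func withoutLabel(labels map[string]string, key string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if k != key {
			out[k] = v
		}
	}
	return out
}

// doubleExport returns the series to export for one input series. A gauge
// carrying the untyped marker is emitted twice (gauge and cumulative) with
// the marker dropped; everything else passes through unchanged.
func doubleExport(in Series, gateEnabled bool) []Series {
	_, marked := in.Labels[untypedAttr]
	if !gateEnabled || in.Kind != Gauge || !marked {
		return []Series{in}
	}
	gauge := in
	gauge.Labels = withoutLabel(in.Labels, untypedAttr)
	counter := in
	counter.Kind = Cumulative
	counter.Labels = withoutLabel(in.Labels, untypedAttr)
	return []Series{gauge, counter}
}

func main() {
	in := Series{
		Name:   "fake_untyped_metric",
		Kind:   Gauge,
		Labels: map[string]string{"ex_com_lemons_untyped": "13", untypedAttr: "true"},
		Value:  1,
	}
	for _, s := range doubleExport(in, true) {
		fmt.Println(s.Name, s.Kind, s.Labels)
	}
}
```

With the gate disabled, the sketch returns the input untouched, which matches the fixture later in this thread where the marker label is still present.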

dashpole · 2023-07-10T19:53:43Z

> This behavior is on by default and is only meant for internal use until an upstream approach to untyped metrics is implemented.

Can we put this behind an alpha feature gate? Otherwise, I would consider it a breaking change when we remove the behavior.

codecov · 2023-07-10T19:58:07Z

Codecov Report

Merging #668 (505a1cf) into main (a19d9ea) will increase coverage by 0.42%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #668      +/-   ##
==========================================
+ Coverage   68.56%   68.99%   +0.42%     
==========================================
  Files          36       36              
  Lines        4559     4647      +88     
==========================================
+ Hits         3126     3206      +80     
- Misses       1280     1288       +8     
  Partials      153      153

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...er/collector/integrationtest/testcases/testcase.go | `81.51% <ø> (ø)` | |
| ...collector/googlemanagedprometheus/extra_metrics.go | `78.72% <100.00%> (+7.09%)` | ⬆️ |
| ...porter/collector/googlemanagedprometheus/naming.go | `100.00% <100.00%> (ø)` | |
| ...tor/integrationtest/testcases/testcases_metrics.go | `100.00% <100.00%> (ø)` | |

... and 3 files with indirect coverage changes

damemi · 2023-07-10T20:11:46Z

> This behavior is on by default and is only meant for internal use until an upstream approach to untyped metrics is implemented.
>
> Can we put this behind an alpha feature gate? Otherwise, I would consider it a breaking change when we remove the behavior.

We can, but the mechanism (the attribute) isn't user-facing, so I think changing it wouldn't be breaking. Of course there's nothing to stop users from adding this attribute themselves anyway and hacking it.

dashpole · 2023-07-11T19:50:19Z

Should this be specific to the GMP Exporter? I'm OK with this version being in the googlecloud exporter, but I think we want it in GMP parts eventually.

dashpole · 2023-07-11T19:55:48Z

exporter/collector/metrics.go

+					Value: value,
+				}},
+				Metric: &metricpb.Metric{
+					Type: t,


How is the GMP suffixing handled?

Good catch, it looks like GMP uses a special unknown suffix when it duplicates the metric. Maybe we should change our metric name function to check for this label

Note that it has different suffixes for the gauge portion of the unknown metric (unknown) and the counter portion of the unknown metric (unknown:counter).

I think I'm going to refactor this. A better approach would be to copy the untyped Gauge pmetric entry into an extra Sum metric and let the existing path after that handle everything. Then GetMetricName() can handle both.

I think using an actual Sum from pdata will also make it easier to handle @ridwanmsharif's question #668 (comment). @dashpole wdyt?

Works for me, but I'm not convinced the logic should live in the base exporter. If someone used the prometheus receiver with our base exporter, they would get duplicate timeseries errors. We need the GMP logic of adding suffixes to prevent duplicate timeseries problems.

We can at least move the gate to the GMP exporter so the logic is only active there. Not sure if we could move the logic itself. Maybe through the "extra metrics" approach we did for target and scope info metrics actually

I was able to move everything using the extra metrics approach. Ptal, thanks!
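To make the suffixing discussed in this thread concrete, here is a minimal sketch of how the two halves of an untyped metric could be named. `gmpMetricType` is a hypothetical helper, not the exporter's `GetMetricName()`; the prefix and suffixes are the ones that appear in this thread and in the expected fixture.

```go
package main

import "fmt"

// gmpPrefix is the metric type prefix seen in the expected fixture.
const gmpPrefix = "prometheus.googleapis.com/"

// gmpMetricType builds the metric type for one half of a double-exported
// untyped metric. The gauge half gets "/unknown" and the counter half gets
// "/unknown:counter", so the two series do not collide.
// This helper is hypothetical and only illustrates the naming scheme.
func gmpMetricType(baseName string, isCounterHalf bool) string {
	suffix := "/unknown"
	if isCounterHalf {
		suffix = "/unknown:counter"
	}
	return gmpPrefix + baseName + suffix
}

func main() {
	fmt.Println(gmpMetricType("fake_untyped_metric", false))
	fmt.Println(gmpMetricType("fake_untyped_metric_total", true))
}
```

Distinct suffixes are what prevent the duplicate timeseries problem raised above: without them, the gauge and the cumulative would share one metric type.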

ridwanmsharif · 2023-07-12T19:03:26Z

exporter/collector/metrics.go

+				ValueType:  valueType,
+				Points: []*monitoringpb.Point{{
+					Interval: &monitoringpb.TimeInterval{
+						EndTime: timestamppb.New(point.Timestamp().AsTime()),


Just wanted to confirm this, but not setting the start time will make this normalize the same way GMP does with these counters?

That is, when the value goes down, treat it as a reset?

I changed this, but it's still doing basically the same thing in the new code so to answer your question: yes it should. This is the normalization function:

opentelemetry-operations-go/exporter/collector/internal/normalization/standard_normalizer.go

Line 261 in a19d9ea

func (s *standardNormalizer) NormalizeNumberDataPoint(point pmetric.NumberDataPoint, identifier string) (pmetric.NumberDataPoint, bool) {

(called from here on a Sum, which the updates I just pushed use for the Cumulative instead of a Gauge)

You can see this now in the fixtures, where I had to add 2 new untyped input data points just to get one output unknown cumulative. This is because the first data point gets normalized as a reset point and dropped by our exporter.

As a side note, I had to update our fixture generators to test this because they currently normalize all timestamps to be rightNow() (in order to work with testing against the actual GCM API). But I needed to bypass that for this test to get the cumulative normalization to show up.
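The reset handling discussed in this thread can be illustrated with a small stand-alone sketch. It is not the code in standard_normalizer.go: the `normalizer` type, its fields, and `normalize` are hypothetical names, timestamps are plain Unix seconds, and a decrease in value is handled exactly like a first observation, which may be stricter than the real normalizer.

```go
package main

import "fmt"

// point is a simplified cumulative data point (timestamps in Unix seconds).
type point struct {
	startSec int64
	endSec   int64
	value    float64
}

// normalizer remembers, per series, the reference (reset) point and the last
// observed raw value. Hypothetical type for illustration only.
type normalizer struct {
	ref  map[string]point
	last map[string]float64
}

func newNormalizer() *normalizer {
	return &normalizer{ref: map[string]point{}, last: map[string]float64{}}
}

// normalize returns the adjusted point and whether it should be exported.
// The first observation of a series, or any observation whose value went
// down, becomes the new reference point and is dropped. Later points are
// reported relative to that reference.
func (n *normalizer) normalize(id string, tsSec int64, value float64) (point, bool) {
	ref, seen := n.ref[id]
	if !seen || value < n.last[id] {
		n.ref[id] = point{startSec: tsSec, endSec: tsSec, value: value}
		n.last[id] = value
		return point{}, false
	}
	n.last[id] = value
	return point{startSec: ref.endSec, endSec: tsSec, value: value - ref.value}, true
}

func main() {
	n := newNormalizer()
	for i, v := range []float64{5, 8, 2, 6} {
		p, ok := n.normalize("fake_untyped_metric", int64(10*(i+1)), v)
		fmt.Println(v, ok, p)
	}
}
```

This is why two input points were needed to obtain one output cumulative point in the fixtures: the first point only establishes the reference.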

damemi · 2023-07-13T20:59:08Z

~~Collector tests are failing from something in the "skip timestamp update" hack I put in... maybe I can't do that~~ Edit: I wasn't passing the skip flag to the collector tests. Updating this seems to have fixed it

damemi · 2023-07-14T16:02:25Z

...ctor/integrationtest/testdata/fixtures/metrics/google_managed_prometheus_untyped_expect.json

+        },
+        {
+          "metric": {
+            "type": "prometheus.googleapis.com/fake_untyped_metric_total/unknown:counter",


Here is the expected counter. Of note:

- _total is appended (I think this comes from the upstream prom normalization that's currently on by default)
- 2 data points for fake_untyped_metric as a gauge, because there are 2 input points (the first Sum is dropped for normalization)

> I think this comes from the upstream prom normalization that's currently on by default

Should we update our collector dependency? I don't think that is supposed to be the case anymore.

Or we can leave it how it is, and update later

I did a test bump on this branch and re-ran the fixture and the _total/units suffixes are dropped in the new version. I'll open a separate bump PR after I merge this

damemi · 2023-07-14T16:04:27Z

...r/collector/integrationtest/testdata/fixtures/metrics/untyped_prometheus_metrics_expect.json

+            "type": "workload.googleapis.com/fake_untyped_metric",
+            "labels": {
+              "ex_com_lemons_untyped": "13",
+              "prometheus_untyped_metric": "true",


This is the fixture for testing the feature gate. This label would be dropped in the GMP exporter if the gate were enabled.

damemi · 2023-07-14T20:11:47Z

Final update now that all tests are green: using 2 gauge entries to get the normalized cumulative didn't work with the actual GCM integration test, because the 2nd gauge caused a duplicate timeseries error (see failure here). Added 7830343 to disable cumulative normalization, use 1 gauge entry, and remove the "skip timestamp update" option I added earlier.

However, the lack of a startTimestamp on the input gauge caused it to be initialized to the zero time by this line, which caused an invalid timeseries error from GCM (see that failure here). To fix that, I pushed 505a1cf to add a startTimestamp to the input gauge and, more importantly, copy that startTimestamp to the new point. I think this is valid for use cases where untyped metrics might want to manually register a reset point.

Now that this is green, I'm merging. Will open another issue to check on our other cumulative tests to make sure we're not dropping points in those.
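The start timestamp fix described in this comment amounts to carrying the gauge point's start time over to the derived cumulative point. A minimal sketch, assuming a simplified `dataPoint` type with Unix-second timestamps where 0 stands for "not set"; `cumulativeFromGauge` and `hasStart` are hypothetical names, and GCM's actual validation rules are not reproduced here.

```go
package main

import "fmt"

// dataPoint is a simplified point. startSec == 0 stands for an unset start
// time, mirroring the zero-time initialization mentioned in the comment.
type dataPoint struct {
	startSec int64
	endSec   int64
	value    float64
}

// cumulativeFromGauge builds the extra cumulative point from an untyped gauge
// point and copies the start timestamp, so the new point does not fall back
// to the zero time. Hypothetical helper for illustration only.
func cumulativeFromGauge(g dataPoint) dataPoint {
	return dataPoint{startSec: g.startSec, endSec: g.endSec, value: g.value}
}

// hasStart reports whether the point carries a start timestamp.
func hasStart(p dataPoint) bool {
	return p.startSec != 0
}

func main() {
	gauge := dataPoint{startSec: 1689300000, endSec: 1689300060, value: 13}
	sum := cumulativeFromGauge(gauge)
	fmt.Println(hasStart(sum), sum.startSec == gauge.startSec)
}
```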

Add support for untyped metrics from Ops Agent

17c9fb6

damemi requested a review from a team as a code owner July 10, 2023 19:51

damemi force-pushed the untyped-metrics-ops branch from dc07395 to 678526b Compare July 11, 2023 19:41

Wrap double export in feature gate

d8d80d0

damemi force-pushed the untyped-metrics-ops branch from 678526b to d8d80d0 Compare July 11, 2023 20:29

dashpole reviewed Jul 11, 2023

ridwanmsharif reviewed Jul 12, 2023

damemi added 2 commits July 13, 2023 20:34

Move untyped metrics logic to GMP exporter

c77ec10

Update tests

4a74e89

damemi added 2 commits July 14, 2023 14:16

Pass skip timestamp update to collector test

b9e86fd

more tests

34f6cf7

damemi commented Jul 14, 2023

dashpole approved these changes Jul 14, 2023

damemi added 2 commits July 14, 2023 18:55

Disable cumulative normalization for untyped test so we only have to send 1 data point

7830343

Copy starttimestamp to new data point and add to test fixture to appease GCM

505a1cf

damemi merged commit 0c97ee5 into GoogleCloudPlatform:main Jul 14, 2023

This was referenced Jul 14, 2023

Verify cumulative tests/fixtures are being normalized as expected #671

Open

Prepare release v0.41.0/1.17.0 #673

Merged

mx-psi mentioned this pull request Nov 13, 2023

flag includes invalid characters open-telemetry/opentelemetry-collector-contrib#29158

Closed
