Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for NonAbsolute Measurement MaxSumCount #335

Merged
merged 13 commits into from
Nov 25, 2019

Conversation

evantorrie
Copy link
Contributor

Previously, the MaxSumCount aggregator failed to work correctly with
negative numbers (e.g. MeasureKind Alternate()==true).

This adds various tests for MaxSumCount which vary the alternate param together with the sign of the random numbers generated in the test.

Previously, the MaxSumCount aggregator failed to work correctly with
negative numbers (e.g. MeasureKind Alternate()==true).
@evantorrie evantorrie changed the title Implement support for NonAbsolute Measurement MaxSumCount WIP: Implement support for NonAbsolute Measurement MaxSumCount Nov 20, 2019
Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @evantorrie. I'm glad you took this up. I had filed #319 which is related. It would be great if you'd like to take that up as well (in a second PR)! The task involves

  1. renaming this aggregator to MinMaxSumCount
  2. entering a Min interface in sdk/export/metric/aggregator, implement it here (and ideally in the others--array and ddsketch also support Min).
  3. in the stdout exporter, add the min output (and test for the "no data" condition discussed here).

sdk/metric/aggregator/maxsumcount/msc.go Outdated Show resolved Hide resolved
@@ -96,6 +98,9 @@ func (c *Aggregator) Update(_ context.Context, number core.Number, desc *export.
c.current.count.AddUint64Atomic(1)
c.current.sum.AddNumberAtomic(kind, number)

c.first.Do(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the above suggestion, we can avoid the sync.Once here--instead we'll just treat min and max the same way.

The Max() and Min() functions will have to recognize the minimum/maximum values as special cases. This means they'll now be able to detect a "no data" condition, which will be possible in certain race conditions within the SDK.

They can return an error to indicate that no data was found for Min or Max, using the existing declared errors in the export aggregator package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @evantorrie. I'm glad you took this up. I had filed #319 which is related. It would be great if you'd like to take that up as well (in a second PR)! The task involves

Yes, this PR started off as an implementation for #319, but I then found the issue with negative values and the existing Max() implementation, so decided to try to fix that first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Max() and Min() functions will have to recognize the minimum/maximum values as special cases. This means they'll now be able to detect a "no data" condition, which will be possible in certain race conditions within the SDK.

@jmacd The issue with this is that when an Aggregator is first created, it does not have any notion of what type of core.Number it will be aggregating. As I understand it, it does not know until the first time that agg.Update() is called.

If we do not set min/max via the sync.Once() and instead leave them as null-initialized values per the existing New() method, then the first period during which numbers are being aggregated may not give the correct answer. That's because number.CompareNumber(kind, c.current.max.AsNumberAtomic()) doesn't work correctly for negative values when c.current.max == 0 (and more obviously, a null-initialized c.current.min will never compare greater than an uint64 metric).

As I see it, possible solutions are:

  1. Pass the type of number being aggregated down to each Aggregator's New() function. The only place an Aggregator is created in the SDK is when the AggregationSelector is called by the batcher, and the AggregationSelector knows which type of metric is going to be aggregated. Passing this to the Aggregator's New() would allow it to set the initialized "current.min/max" correctly based on the number kind.

  2. Avoid the issue by stating that Min()/Max() are only valid after the first Checkpoint() has been taken for an aggregator.

I prefer approach 1., but I don't know whether there are existing constraints or other intended purposes where an Aggregator may need to be created without knowing the kind of metric that it will later aggregate.

@jmacd
Copy link
Contributor

jmacd commented Nov 20, 2019 via email

Allows it to set the initial state (current.max) to the correct value
based on the NumberKind.
This is analagous to the DDSketch New() constructor
@evantorrie
Copy link
Contributor Author

Should be ready to review now.

@evantorrie evantorrie changed the title WIP: Implement support for NonAbsolute Measurement MaxSumCount Implement support for NonAbsolute Measurement MaxSumCount Nov 21, 2019
An empty checkpoint should have Sum() == 0, Count() == 0 and Max()
still equal to the numberKind.Minimum()
Remove TODO from stdout exporter to ensure that if a maxsumcount or
ddsketch aggregator returns ErrEmptyDataSet from Max(), then the
entire record will be skipped by the exporter.

Added tests to ensure the exporter doesn't send any updates for
EmptyDataSet checkpoints - for both ddsketch and maxsumcount.
@evantorrie
Copy link
Contributor Author

@jmacd This seems to experience intermittent crashes during the x86-32 (386) CI build.. backtrace shows a panic in

runtime/internal/atomic.Xadd64(0x56cb2304, 0x1, 0x0, 0x8147fb0, 0x31)
	/usr/local/go/src/runtime/internal/atomic/asm_386.s:105 +0xc
go.opentelemetry.io/otel/api/core.(*Number).AddUint64Atomic(...)
	/home/circleci/project/api/core/number.go:442
go.opentelemetry.io/otel/sdk/metric/aggregator/maxsumcount.(*Aggregator).Update(0x56cb2300, 0x836a880, 0x56cb0000, 0x31, 0x0, 0x56ca2390, 0x1, 0x0)
	/home/circleci/project/sdk/metric/aggregator/maxsumcount/msc.go:110 +0x47

Looking at the sync.atomic docs, there is this warning from rsc: "On ARM, x86-32, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned."

Have you seen/run into this before?

@jmacd
Copy link
Contributor

jmacd commented Nov 21, 2019

I glanced at the test failures, and it looks like the root cause is a nil pointer exception.

On 32-bit architectures, Go only guarantees that primitive
values are aligned to a 4 byte boundary. Atomic operations on 32-bit
machines require 8-byte alignment.

See golang/go#599
@evantorrie
Copy link
Contributor Author

It was a 32-bit vs 64-bit layout issue - now fixed.

See https://go101.org/article/memory-layout.html for more gory details

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you figured out the problem about alignment! (I learned some things.)

I'm inclined to (in another PR) add some tests based on unsafe.Alignof() to quickly avoid such confusion in the future. (I will file an issue about this.)

sdk/metric/aggregator/maxsumcount/msc.go Outdated Show resolved Hide resolved
func (c *Aggregator) Max() (core.Number, error) {
if c.checkpoint == unsetMaxSumCount(c.kind) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the MaxSumCount aggregator does not take a lock during Update, each field of the state will be independently modified, and a race with Update could happen such that the count and sum are non-zero but the max field hasn't been set. So, I think you should test c.checkpoint.max == c.kind.Minimum() here.

(Note: this means, now, that the minimum and maximum values are considered invalid. This might be worth adding comments about, i.e., that if you record a measure value which is the Minimum possible value, that the Max() function will return a no-data error. I don't think this will be a problem, but it would probably become a problem if this library supported unsigned integer instruments. Then a value of Record(0) would lead to a no-data error in Max(). If we did ever support unsigned instruments, I we'd probably want to offset by one or something. Nevermind.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of Minimum() for the default uninitialized Maximum value means
that in the unlikely condition that every recorded value for a measure
is equal to the same NumberKind.Minimum(), then the aggregator's Max()
will return ErrEmptyDataSet
@jmacd
Copy link
Contributor

jmacd commented Nov 22, 2019

@paivagustavo will you please review?

@rghetia rghetia merged commit 9a2484c into open-telemetry:master Nov 25, 2019
@evantorrie evantorrie deleted the nonabsolutemsc branch June 23, 2020 19:07
hstan referenced this pull request in hstan/opentelemetry-go Oct 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants