14313-benchmark-format: add unit metadata

The predefined keys differ slightly from what I proposed in golang/go#43744. I tried specifying {higher,lower}={better,worse} like I originally proposed and it just got really messy. Turning it around to better={higher,lower} means it's somewhat backwards from what you might expect from English phrasing, but it lets us use a single key because I don't think anyone is going to accidentally write worse={higher,lower}, and this avoids any annoying questions about what happens if a user specifies both "higher" and "lower".

For golang/go#43744.

Change-Id: I895914b179c291003e76f897cabbcbdb2381f163
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/357530
Reviewed-by: Michael Knyszek <[email protected]>
golang · Oct 29, 2024 · 986bcc1
1 parent 1dd567d
commit 986bcc1
Showing 1 changed file with 46 additions and 3 deletions.
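
As a rough illustration of how a tool might consume the unit metadata lines specified in the diff below, here is a minimal Go sketch. The `UnitMetadata` type and its `ParseUnitLine` and `Better` methods are hypothetical names, not part of the testing package or any existing tool. Two behaviors are assumptions that the text does not settle: the sketch rejects a `Unit` line that carries no key=value pair, and it lets explicitly specified metadata take precedence over the built-in `better` defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// UnitMetadata maps unit -> key -> value.
// Hypothetical type for illustration only.
type UnitMetadata map[string]map[string]string

// ParseUnitLine parses one "Unit <unit> <key>=<value> ..." line and merges
// it into m. Specifying a different value for an existing unit/key pair is
// an error; repeating the same value is allowed.
func (m UnitMetadata) ParseUnitLine(line string) error {
	// strings.Fields splits on runs of unicode.IsSpace characters,
	// which matches the field separator described in the proposal.
	// Note that it also tolerates leading whitespace.
	fields := strings.Fields(line)
	if len(fields) == 0 || fields[0] != "Unit" {
		return fmt.Errorf("not a unit metadata line: %q", line)
	}
	// Assumption: a line with no key=value pair is treated as malformed.
	if len(fields) < 3 {
		return fmt.Errorf("malformed unit metadata line: %q", line)
	}
	unit := fields[1]
	for _, kv := range fields[2:] {
		// Cutting at the first '=' guarantees the key contains no '='.
		key, value, ok := strings.Cut(kv, "=")
		if !ok || key == "" {
			return fmt.Errorf("expected key=value, got %q", kv)
		}
		if old, seen := m[unit][key]; seen && old != value {
			return fmt.Errorf("unit %q: key %q already set to %q, cannot set to %q",
				unit, key, old, value)
		}
		if m[unit] == nil {
			m[unit] = make(map[string]string)
		}
		m[unit][key] = value
	}
	return nil
}

// Better reports "higher", "lower", or "" (unknown) for a unit.
// Assumption: explicit metadata wins over the built-in defaults.
func (m UnitMetadata) Better(unit string) string {
	if v, ok := m[unit]["better"]; ok {
		return v
	}
	switch unit {
	case "ns/op", "B/op", "allocs/op":
		return "lower"
	case "MB/s":
		return "higher"
	}
	return "" // other units have no default
}

func main() {
	m := UnitMetadata{}
	lines := []string{
		"Unit widgets/s better=higher",
		"Unit widgets/s better=higher assume=exact", // same value again: fine
		"Unit widgets/s better=lower",               // conflicting value: error
	}
	for _, l := range lines {
		if err := m.ParseUnitLine(l); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(m.Better("widgets/s"), m.Better("ns/op"))
}
```

Because each line is merged into the map as it arrives, the sketch fits the stream-oriented processing the proposal describes. It does not attempt to apply metadata retroactively to result lines that were already read.
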
diff --git a/design/14313-benchmark-format.md b/design/14313-benchmark-format.md
@@ -89,7 +89,7 @@ the need to process custom output formats in future benchmarks.
 ## Proposal
 
 A Go benchmark data file is a UTF-8 textual file consisting of a sequence of lines.
-Configuration lines and benchmark result lines, described below,
+Configuration lines, benchmark result lines, and unit metadata lines, described below,
 have semantic meaning in the reporting of benchmark results.
 
 All other lines in the data file, including but not limited to
@@ -150,7 +150,7 @@ In the example, the CPU cost is reported per-operation and the
 throughput is reported per-second; neither is a total that
 depends on the number of iterations.
 
-### Value Units
+#### Value Units
 
 A value's unit string is expected to specify not only the measurement unit
 but also, as needed, a description of what is being measured.
@@ -167,7 +167,7 @@ and rescale known measurement units.
 For example, consistently large “ns/op” or “L1-miss-ns/op”
 might be rescaled to “ms/op” or “L1-miss-ms/op” for display.
 
-### Benchmark Name Configuration
+#### Benchmark Name Configuration
 
 In the current testing package, benchmark names correspond to Go identifiers:
 each benchmark must be written as a different Go function.
@@ -184,6 +184,49 @@ that slash-prefixed key=value pairs in the benchmark name are
 treated by benchmark data processors as per-benchmark 
 configuration values.
 
+### Unit metadata
+
+When a benchmark reports units outside the standard units implemented
+by the testing package, it can be useful for tools to understand
+additional metadata about those units.
+
+A unit metadata line has the form
+
+	Unit <unit> <key>=<value> <key>=<value> ...
+
+The fields are separated by runs of space characters (as defined by
+`unicode.IsSpace`), and space characters are not allowed within unit,
+key, or value.
+Keys must not contain `=`.
+
+It is an error to specify different values for any given unit and key,
+even on different unit metadata lines.
+That is, once unit metadata is specified, it can't be overridden.
+Specifying the same value for a key multiple times is not an error.
+
+Unit metadata applies to all following benchmark result lines, though
+it is unspecified whether it applies to earlier benchmark result
+lines.
+This allows for stream-oriented processing of benchmark results.
+
+Keys are not constrained, but the following keys have predefined
+meanings:
+
+- `better={higher,lower}` indicates whether higher or lower values of
+  this unit are better (indicate an improvement).
+  By default, ns/op, B/op, and allocs/op are `better=lower`, and MB/s
+  is `better=higher`.
+  Other units do not assume a default.
+
+- `assume={nothing,exact}` indicates what statistical assumption to
+  make when considering distributions of values.
+  `nothing` means to make no statistical assumptions (e.g., use
+  non-parametric methods) and `exact` means to assume measurements are
+  exact (repeated measurement does not increase confidence).
+  The default is `nothing`.
+  In the future we may also support `normal`, but that's almost never
+  the right assumption for benchmarks.
+
 ### Example
 
 The benchmark output given in the background section above