Why is the benchmark reporting 10-times higher values? #791
Comments
+1, I think it's worth going to 2.0.0 to fix this.
cc @johnmccutchan for insight
This is legacy behavior that we should not change. The purpose of this harness is to replicate the exact same benchmark conditions that we use internally. All historical data has this same scaling in place. If you have a benchmark that runs under this harness and you speed it up (or slow it down), you can see the relative change against your baseline. tl;dr: this isn't a "code timer" but a benchmark runner that is designed to match our internal benchmarking infrastructure.
Internally we could stay on 1.0.4 though, right? It seems wrong to force this behavior on all users of this package. Or maybe we could hide it behind an option?
@jakemac53 If the external version changes, it becomes impossible for us to compare results to our internal numbers. I'll discuss what we want to do long term with this package at the next compiler team meeting.
OK, sounds good. My main concern is that this package currently advertises itself as the officially endorsed package for writing Dart benchmarks, but really it seems like it's just for internal use if we can't ever make changes that would throw off our historical measurements. Instead, it seems like users should lock themselves to a particular version and choose to upgrade when the benefits of the new features outweigh the cost of having to normalize their historical data.
Seems like a flag would work. If the examples and internal benchmarks set the same flag, the results would be comparable, without affecting other benchmarks. |
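If such a flag were added, it might look roughly like the sketch below. This assumes the 1.x harness behavior, where `measure()` times `exercise()` and `exercise()` calls `run()` 10 times; the `ScaledBenchmark` class and `legacyScale` parameter are hypothetical, not part of `benchmark_harness`:

```dart
import 'package:benchmark_harness/benchmark_harness.dart' as harness;

/// Hypothetical sketch only: no such flag exists in benchmark_harness.
/// Internal benchmarks would pass `legacyScale: true` to stay comparable
/// with historical data; other users would get per-run times by default.
class ScaledBenchmark extends harness.BenchmarkBase {
  ScaledBenchmark(String name, {this.legacyScale = false}) : super(name);

  final bool legacyScale;

  @override
  double measure() {
    // In harness 1.x, measure() times exercise(), which calls run() 10x.
    final perExercise = super.measure();
    return legacyScale ? perExercise : perExercise / 10;
  }
}
```

Internal benchmarks and the examples would opt in with `legacyScale: true`; everyone else would get per-run numbers by default.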
This merges two benchmark packages into one to make it easier to run benchmarks on Golem and locally. `api_benchmark` is not merged: it runs a web server and benchmarks performance in an actual browser. We should revisit those benchmarks to see whether they still make sense and whether it's a good idea to merge them into the new benchmark package in this PR.

Changes:

- Remove `protobuf_benchmarks` and `query_benchmark`. Move the benchmarks in both to `benchmarks`.
- Use `benchmark_harness` in all benchmarks. Previously only `protobuf_benchmarks` used `benchmark_harness`.
- Because `benchmark_harness` reports 10x the actual number (https://github.com/dart-lang/benchmark_harness/issues/30), to stay compatible with the old `query_benchmark` results (which are stored in Golem) and to report accurate results in general, the `BenchmarkBase` class overrides the relevant method(s) from `benchmark_harness.BenchmarkBase` to divide the result by 10 before reporting; see the sketch after this list.

  Outputs before:

  ```
  RunTimeRaw(protobuf_decode): 973 us
  RunTimeRaw(protobuf_decode_json): 3914 us
  RunTimeRaw(protobuf_encode): 3135 us
  RunTimeRaw(protobuf_encode_json): 4606 us
  RunTimeRaw(protobuf_decode): 30 us
  ```

  Outputs after:

  ```
  protobuf_query_decode(RunTimeRaw): 973.4907766990291 us.
  protobuf_query_decode_json(RunTimeRaw): 3925.9745098039216 us.
  protobuf_query_encode(RunTimeRaw): 3115.623076923077 us.
  protobuf_query_encode_json(RunTimeRaw): 4568.952272727272 us.
  protobuf_query_set_nested_value(RunTimeRaw): 18.9181422625804 us.
  ```

  All benchmarks (`query_benchmark` and others) use this new class.
- Directory structure:
  - `datasets/` has benchmark inputs (i.e. binary-encoded protos)
  - `protos/` has benchmark protos
  - `lib/` has code common to all benchmarks (file operations, the benchmark class)
  - `tool/compile_protos.sh` compiles all protos
  - `tool/run_protoc_plugin.sh` runs the protoc plugin from `../protoc_plugin`; used by `compile_protos.sh`
  - `tool/compile_benchmarks.dart` compiles benchmark programs to native and JS, in parallel
  - `bin/` contains the benchmarks
  - `protoc_version` is used by Golem to download the right version of protoc
- Each benchmark now has its own executable. This allows benchmarking changes in isolation: for example, it avoids changing encoding benchmark results when only decoding code was modified, which could otherwise happen because of global analyses or JIT effects.
- `protobuf_benchmarks/` inputs no longer use [`benchmarks.proto`][1]. Instead, the input files are now the `payload` fields of `BenchmarkDataset`s, and the input file name specifies the message type.

Fixes #613

Golem cl 4855277

[1]: https://github.com/google/protobuf.dart/blob/15d9eed9f44dc3bab0738d33921d2608f3b158b5/protobuf_benchmarks/benchmarks.proto
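A minimal sketch of the override described above, assuming `benchmark_harness` 1.x where `measure()` times `exercise()` and `exercise()` calls `run()` 10 times (the actual class in the benchmarks package may differ in detail):

```dart
import 'package:benchmark_harness/benchmark_harness.dart' as harness;

/// Sketch of the workaround described in this PR; the real class may
/// differ. Dividing by 10 makes the reported figure microseconds per
/// run() call rather than per exercise() call.
class BenchmarkBase extends harness.BenchmarkBase {
  BenchmarkBase(String name) : super(name);

  @override
  double measure() => super.measure() / 10;
}
```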
I understand that it's beneficial (for the sake of accuracy) to run the measured function 10 times in a loop. But why is this the value that is actually reported? Why not report the value divided by 10? It's really confusing!
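For context on where the factor of 10 comes from: in the 1.x harness the timer wraps `exercise()`, not `run()`, and the elapsed time is divided by the number of `exercise()` calls. A simplified, self-contained paraphrase (not the exact package source):

```dart
/// Simplified paraphrase of benchmark_harness 1.x timing, not the exact
/// source. The stopwatch counts exercise() calls, and each exercise()
/// invokes run() 10 times, so the reported figure covers 10 runs.
class LoopExample {
  void run() {
    // code under test goes here
  }

  void exercise() {
    for (var i = 0; i < 10; i++) {
      run();
    }
  }

  double measure() {
    const minimumMicros = 2 * 1000 * 1000; // sample for at least ~2s
    final watch = Stopwatch()..start();
    var iterations = 0;
    while (watch.elapsedMicroseconds < minimumMicros) {
      exercise();
      iterations++;
    }
    // Elapsed time per exercise(), i.e. 10x the per-run() time.
    return watch.elapsedMicroseconds / iterations;
  }
}
```

Reporting `measure() / 10`, or timing `run()` directly, would give the per-call figure this issue asks for.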