Why is the benchmark reporting 10-times higher values? #791

Open
tomaskulich opened this issue Feb 2, 2015 · 7 comments
Comments

@tomaskulich

I understand that it's beneficial (for the sake of accuracy) to run the measured function 10 times in a loop. But why is this the value that is actually reported? Why not report the value divided by 10? It is really confusing!
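For context, this matches the harness's `exercise()` behavior: each timed iteration calls `run()` ten times. A simplified sketch of that behavior (reconstructed for illustration, not the exact `benchmark_harness` source):

```dart
// Simplified sketch of the behavior being discussed, not the actual
// benchmark_harness source. The timing loop times exercise(), and each
// exercise() call runs the benchmark body ten times.
class BenchmarkBase {
  final String name;
  const BenchmarkBase(this.name);

  // The user's benchmark body.
  void run() {}

  // Runs run() ten times, so the reported time per exercise() is
  // roughly 10x the time of a single run().
  void exercise() {
    for (int i = 0; i < 10; i++) {
      run();
    }
  }
}
```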

@jakemac53
Contributor

+1, I think it's worth going to 2.0.0 to fix this.

@sethladd
Contributor

cc @johnmccutchan for insight

@johnmccutchan
Contributor

This is legacy behavior that we should not change. The purpose of this harness is to replicate the exact same benchmark conditions that we use internally. All historical data has this same scaling in place.

If you have a benchmark that runs under this harness and you speed it up (or slow it down), you can see the relative change against your baseline.

tl;dr: this isn't a "code timer" but a benchmark runner that is designed to match our internal benchmarking infrastructure.

@jakemac53
Contributor

Internally we could stay on 1.0.4 though, right? It seems wrong to force this behavior on all users of this package. Or maybe we could hide it behind an option?

@johnmccutchan
Contributor

@jakemac53 If the external version changes it makes it impossible for us to compare results to our internal numbers.

I'll discuss what we want to do long term with this package at the next compiler team meeting.

@jakemac53
Contributor

OK, sounds good. My main concern is that this package currently advertises itself as the officially endorsed package for writing Dart benchmarks, but it really seems to be just for internal use if we can't ever make changes that would throw off our historical measurements. Instead, it seems like users should lock themselves to a particular version and choose to upgrade when the benefits of the new features outweigh the cost of having to normalize their historical data.

@skybrian

Seems like a flag would work. If the examples and internal benchmarks set the same flag, the results would be comparable, without affecting other benchmarks.
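For example, a hypothetical shape for such a flag (the parameter name is invented for illustration and is not an actual `benchmark_harness` API):

```dart
// Hypothetical sketch only: this constructor parameter does not exist in
// benchmark_harness; it just illustrates the flag idea above. The default
// keeps the legacy 10x behavior, so internal results stay comparable.
class BenchmarkBase {
  final String name;
  final int iterationsPerExercise;

  const BenchmarkBase(this.name, {this.iterationsPerExercise = 10});

  // The user's benchmark body.
  void run() {}

  void exercise() {
    for (int i = 0; i < iterationsPerExercise; i++) {
      run();
    }
  }
}
```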

osa1 referenced this issue in google/protobuf.dart Jul 7, 2022
This merges two benchmark packages into one to make it easier to run benchmarks
on Golem and locally.

`api_benchmark` is not merged: it runs a web server and benchmarks performance
in an actual browser. We should revisit those benchmarks to see whether they
still make sense and whether it's a good idea to merge them into the new
benchmark package in this PR.

Changes:

- Remove `protobuf_benchmarks` and `query_benchmark`. Move the benchmarks in
  both to `benchmarks/`.

- Use `benchmark_harness` in all benchmarks. Previously only
  `protobuf_benchmarks` used `benchmark_harness`.

- Because `benchmark_harness` reports 10x the actual number
  (https://github.com/dart-lang/benchmark_harness/issues/30), to be compatible
  with the old `query_benchmark` results (which are stored in Golem) and to
  generally report accurate results, the `BenchmarkBase` class overrides the
  relevant method(s) from `benchmark_harness.BenchmarkBase` to divide the
  result by 10 before reporting (see the sketch after this commit message).

  Outputs before:

      RunTimeRaw(protobuf_decode): 973 us
      RunTimeRaw(protobuf_decode_json): 3914 us
      RunTimeRaw(protobuf_encode): 3135 us
      RunTimeRaw(protobuf_encode_json): 4606 us
      RunTimeRaw(protobuf_decode): 30 us

  Outputs after:

      protobuf_query_decode(RunTimeRaw): 973.4907766990291 us.
      protobuf_query_decode_json(RunTimeRaw): 3925.9745098039216 us.
      protobuf_query_encode(RunTimeRaw): 3115.623076923077 us.
      protobuf_query_encode_json(RunTimeRaw): 4568.952272727272 us.
      protobuf_query_set_nested_value(RunTimeRaw): 18.9181422625804 us.

  All benchmarks (query_benchmark and others) use this new class.

- Directory structure:

  - `datasets/` has benchmark inputs (i.e. binary encoded protos)

  - `protos/` has benchmark protos

  - `lib/` has code common to all benchmarks (file operations, benchmark class)

  - `tool/compile_protos.sh` compiles all protos

  - `tool/run_protoc_plugin.sh` runs protoc plugin from `../protoc_plugin`.
    Used by `compile_protos.sh`.

  - `tool/compile_benchmarks.dart` compiles benchmark programs to native and
    JS, in parallel

  - `bin/` contains benchmarks

  - `protoc_version` is used by Golem to download the right version of protoc

- Each benchmark now has its own executable. This allows benchmarking changes
  in isolation: for example, it avoids shifting encoding benchmark results when
  decoding code is modified (due to global analyses or JIT effects).

- `protobuf_benchmarks/` inputs no longer use [`benchmarks.proto`][1]. Instead,
  the input files are now the `payload` fields of `BenchmarkDataset`s, and the
  input file name specifies the message type.

Fixes #613

Golem cl 4855277

[1]: https://github.com/google/protobuf.dart/blob/15d9eed9f44dc3bab0738d33921d2608f3b158b5/protobuf_benchmarks/benchmarks.proto
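For illustration, a minimal sketch of the divide-by-10 override the commit message describes, assuming the `report()`/`emitter` API from `benchmark_harness` (the method(s) actually overridden in the real PR may differ):

```dart
import 'package:benchmark_harness/benchmark_harness.dart' as harness;

// Sketch of the wrapper class described in the commit message above: it
// divides the 10x-scaled measurement by 10 before emitting it.
class BenchmarkBase extends harness.BenchmarkBase {
  BenchmarkBase(String name) : super(name);

  @override
  void report() {
    // measure() returns microseconds per exercise(), and exercise() calls
    // run() ten times, so divide by 10 to report the per-run() time.
    emitter.emit(name, measure() / 10);
  }
}
```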
mosuem transferred this issue from dart-archive/benchmark_harness Oct 29, 2024