feat(minor): Add Benchmark parameters #274

Merged

Conversation

@CrownedPhoenix (Contributor) commented Sep 16, 2024

Description

Sometimes it is convenient to run the same benchmark multiple times under slightly different scenarios - for example, benchmarking `func foo(_ n: Int)` for various parameterizations of `n`. This adds support for specifying the parameters of a benchmark, but makes no functional changes beyond the benchmark's name/identifier now including a parameter list when one is provided.

This unlocks the potential for the benchmark exporters (such as the Influx exporter) to leverage these parameters (by emitting appropriate tags/fields in the case of Influx) in their output.

Resolves #267

How Has This Been Tested?

Added a new test to verify the output of Benchmark.name.
Also ran the remaining tests to make sure no other behavior changed.

Minimal checklist:

  • I have performed a self-review of my own code
  • I have added DocC code-level documentation for any public interfaces exported by the package
  • I have added unit and/or integration tests that prove my fix is effective or that my feature works

@CrownedPhoenix CrownedPhoenix marked this pull request as ready for review September 16, 2024 18:07
@hassila (Contributor) commented Sep 16, 2024

Thanks! It will be a little while before I'm able to review properly - I'm mostly out this week - but one quick comment: do you think it would make sense to support ranges for ints/doubles too?

@CrownedPhoenix (Contributor, Author) commented Sep 16, 2024

do you think it would make sense to support ranges for int/doubles too?

You're thinking something like this?

parameters: ["foo":  0...4, "bar": 1.0...10.0]

I don't see why we couldn't support something like that.
In practice, I wouldn't expect the implementer to actually use the stored parameter values directly (like they do for scaledIterations). Rather, they would generate the parameters first and then initialize the benchmark, so they already have access to the parameter values via closure capture.

So likely they could get the behavior they want by just string encoding the range and using that string as the value.

For example:

for range in [0...1, 1...10, 10...100] {
    Benchmark(
        "testFoo",
        parameters: [
            BenchmarkParameter("range", String(describing: range))
        ]
    ) { /* do something with `range` */ }
}

In summary, I'm certainly not opposed - though I'm not sure about the value proposition yet.

Admittedly, my argument here raises the question of why we bother supporting anything besides String at all.
Maybe BenchmarkParameter could just have a generic init over CustomStringConvertible.
The only downside is that there are probably some instances where knowing whether the value is numeric vs. string/complex is valuable. So replacing the String init with a CustomStringConvertible parameter while keeping the Float and Int initializers as well might be suitable.
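For concreteness, here's a rough sketch of what I mean - the Kind distinction and all names here are hypothetical, not what this PR implements:

// Rough sketch only: a BenchmarkParameter that accepts any
// CustomStringConvertible value but remembers whether it was numeric.
public struct BenchmarkParameter {
    public enum Kind { case integer, float, other }

    public let name: String
    public let value: String
    public let kind: Kind

    // Numeric overloads are kept so exporters can tell numbers apart
    // from opaque string values.
    public init(_ name: String, _ value: Int) {
        self.name = name
        self.value = String(value)
        self.kind = .integer
    }

    public init(_ name: String, _ value: Double) {
        self.name = name
        self.value = String(value)
        self.kind = .float
    }

    // Everything else (ranges, enums, ...) goes through its description.
    public init(_ name: String, _ value: some CustomStringConvertible) {
        self.name = name
        self.value = value.description
        self.kind = .other
    }
}

With that, BenchmarkParameter("range", 0...4) would store "0...4" as .other, while BenchmarkParameter("n", 8) would stay recognizably numeric.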

codecov bot commented Sep 17, 2024

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Project coverage is 65.67%. Comparing base (75e8622) to head (5c050b5).

Files with missing lines | Patch % | Lines
Sources/Benchmark/Benchmark.swift | 87.50% | 2 Missing ⚠️

@@            Coverage Diff             @@
##             main     #274      +/-   ##
==========================================
+ Coverage   65.32%   65.67%   +0.35%     
==========================================
  Files          33       34       +1     
  Lines        3610     3653      +43     
==========================================
+ Hits         2358     2399      +41     
- Misses       1252     1254       +2     
Files with missing lines | Coverage Δ
Sources/Benchmark/BenchmarkParameter.swift | 100.00% <100.00%> (ø)
Tests/BenchmarkTests/BenchmarkTests.swift | 100.00% <100.00%> (ø)
Sources/Benchmark/Benchmark.swift | 69.17% <87.50%> (+2.19%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 75e8622...5c050b5.

@hassila (Contributor) commented Sep 25, 2024

Hi, I've started to have a look at the PR now - I'd like to bring the discussion back to the use site a bit. If this was the old approach to parameterization:

    let parameterization = (0...5).map { 1 << $0 } // 1, 2, 4, ...

    parameterization.forEach { count in
        Benchmark("ParameterizedWith\(count)") { benchmark in
            for _ in 0 ..< count {
                blackHole(Int.random(in: benchmark.scaledIterations))
            }
        }
    }

Then with the additional API, I would expect to make just this minimal change:

    parameterization.forEach { count in
        Benchmark("ParameterizedTest", configuration: .init(parameters: [.init("count", count)])) { benchmark in
            for _ in 0 ..< count {
                blackHole(Int.random(in: benchmark.scaledIterations))
            }
        }
    }

That would record the parameters in use in the new parameters array - instead of encoding them into the name as in the first sample - for use in display and later in export.

It's not a literal in this case, but a parameter that can vary - just as the currently supported parameterisation (just formalising the naming of the parameter that can vary for display/export purposes).

Also, I'm not quite sure what the infinite/finite range is required for, would not expect to need that?

Also, writing the code above, perhaps it's really a dictionary (of String: String) that makes sense rather than an array?

E.g.

    parameterization.forEach { count in
        Benchmark("ParameterizedTest", configuration: .init(parameters: ["count" : count.description])) { benchmark in
            for _ in 0 ..< count {
                blackHole(Int.random(in: benchmark.scaledIterations))
            }
        }
    }

Just to avoid adding multiple entries for the same key. And if we go this way, the value is just a string description of the parameter that was in use - as this will not be used in practice by the code - the actual parameter value is captured when the benchmark is created, so it can be any type then?

Then a range works too;

    let range = 1...100
    Benchmark("ParameterizedTest", configuration: .init(parameters: ["count": range.description])) { benchmark in
        for _ in range {
            blackHole(Int.random(in: benchmark.scaledIterations))
        }
    }

@CrownedPhoenix (Contributor, Author) commented Sep 25, 2024

I agree with everything you've outlined.
I think the conclusion of your point is that you prefer a parameters: [String: String] dictionary. That's perfectly fine, so I will make the appropriate changes.

Also, I'm not quite sure what the infinite/finite range is required for, would not expect to need that?

This is worth talking about as it is something that will be crucial for the Influx exporter.
In the world of Influx there are two different ways that a particular data point can be annotated.
A data point can be annotated with either a field or a tag.
Both are a (key: String, value: String | Numeric) pair.
I'll define a couple of terms here for the sake of the story:

  • The domain of an annotation is its key. It's always a single value.
  • The range of an annotation is the set of all possible values that can occur for that annotation.

Now, the important distinction between tags and fields is in the way they are stored in the Influx database - tags are indexed and fields are not.

The consequence of this distinction is that tags should only carry annotations with a finite range - a bounded set of distinct values - since every distinct tag value grows the index, while fields can safely carry annotations with an infinite range.

In summary, whatever the design for parameterization, it's important for at least my use case to have information about whether the parameters have a finite or infinite range. I'm totally open to suggestions on how we might designate this better.
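To make that concrete, here is a rough sketch (all names hypothetical, not part of this PR) of how a parameter could carry that designation, so the Influx exporter can map finite-range parameters to tags and infinite-range ones to fields:

// Hypothetical sketch: tags are indexed in Influx, so only annotations
// with a bounded set of values should become tags; unbounded ones
// should become fields.
enum ParameterRange {
    case finite   // bounded value set -> safe to index as an Influx tag
    case infinite // unbounded value set -> store as an Influx field
}

struct BenchmarkParameter {
    let name: String
    let value: String
    let range: ParameterRange
}

// An Influx exporter could then split the annotations:
func split(_ parameters: [BenchmarkParameter]) -> (tags: [BenchmarkParameter], fields: [BenchmarkParameter]) {
    (parameters.filter { $0.range == .finite },
     parameters.filter { $0.range == .infinite })
}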

@hassila (Contributor) commented Sep 26, 2024

Alright - if we go this way, I think we should actually go back to your original proposal with a tags naming convention. If it is just a way to mark up benchmarks for display/export (any real parameterisation is captured anyway), the actual benchmark code will typically never read or use those tags at runtime, so calling them parameters is just confusing - sorry about the detour.

I don't think we want to leak details of Influx formats though - I would suggest that you adopt a convention on tags (key naming or similar) that is used by the Influx exporter, or consider an external post-processing step that marks up the rows using such a convention/configuration after exporting the benchmark data.
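For example (the prefix below is purely illustrative, not a proposed standard), a key-naming convention would let the Influx exporter decide tag vs field without the core library knowing anything about Influx:

    // Illustrative only: tag keys with a reserved prefix are emitted as
    // Influx fields; all other keys become Influx tags.
    let fieldPrefix = "influx.field."

    func influxAnnotations(from tags: [String: String]) -> (tags: [String: String], fields: [String: String]) {
        var outTags: [String: String] = [:]
        var outFields: [String: String] = [:]
        for (key, value) in tags {
            if key.hasPrefix(fieldPrefix) {
                outFields[String(key.dropFirst(fieldPrefix.count))] = value
            } else {
                outTags[key] = value
            }
        }
        return (outTags, outFields)
    }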

@CrownedPhoenix (Contributor, Author)

I agree with not leaking Influx related details.

I'll return this to the tags nomenclature.

I'll have to keep thinking about how to do the disambiguation of tags/fields in a post processing step or otherwise.

It seems equally awkward to have a special implicit naming convention that only the Influx exporter knows what to do with.

I'll think about it some more.

@hassila (Contributor) commented Sep 27, 2024

I would probably do a post processing step in your pipeline to handle that separately.

@CrownedPhoenix (Contributor, Author) commented Sep 30, 2024

Alright, I made the changes we discussed to put the nomenclature back to tags (instead of parameters).

Could you elaborate a bit on what the post-processing step you are suggesting might look like?

  1. Are you suggesting I add code to the InfluxExporter in this repo that post-processes BenchmarkResults in order to output the proper Influx data (which would require implicit prefixes on the tags used in the Benchmark)?

  2. Or do you mean that I should post-process the output of the InfluxExporter and convert select tags to fields as appropriate?

Neither of these is ideal. #2 is what I believe you are suggesting, and I think it's doable - it just means some more scripting in my CI pipeline.

I was hoping we could get to a point where the InfluxExporter native to this repo more fully supports Influx outputs. But maybe there just isn't an appropriate generalization that is still relevant to the other exporters.

What do you think about a mechanism (I'd make a separate PR for this) for injecting exporter-specific information for a Benchmark into the Benchmark initializer?

Something like:

Benchmark(
    /* params */,
    exportConfigs: [ // [ExportConfiguration] where `ExportConfiguration` is a protocol
        // This is where I could specify what `tag`s should be considered as `field`s during export
        InfluxExportConfiguration(...)
    ]
)

OR

Benchmark(
    /* params */,
    exportConfigs: [ 
        // This would better ensure no duplicate configs get passed.
        .influx: InfluxExportConfiguration(...)
    ]
)

This would allow the Benchmark implementer to specify some salient information about a benchmark that may only be relevant to one particular exporter. This would help get around your concern (which I think is very valid) about leaking Influx-specific information into the base Benchmark config.
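A rough sketch of the second variant (all type and property names here are hypothetical):

// Hypothetical sketch: a marker protocol plus a dictionary keyed by
// exporter, so at most one configuration per exporter can be supplied.
protocol ExportConfiguration {}

enum Exporter: Hashable {
    case influx
    // other exporters would get their own cases
}

struct InfluxExportConfiguration: ExportConfiguration {
    // Tag keys the Influx exporter should emit as fields instead of tags.
    let fieldKeys: Set<String>
}

// The Benchmark configuration would then carry something like:
//   let exportConfigurations: [Exporter: any ExportConfiguration]
// and the Influx exporter would look up only its own entry:
//   let config = exportConfigurations[.influx] as? InfluxExportConfiguration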

@hassila (Contributor) commented Oct 1, 2024

I was thinking of option #2, yep - I assume it would be a fairly straightforward script step - but I think we could also consider a separate PR for export configurations as you suggest (the second variant seems to make more sense then), e.g.

Benchmark(
    /* params */,
    exportConfigurations: [ 
        .influx: InfluxExportConfiguration(...)
    ]
)

@CrownedPhoenix CrownedPhoenix force-pushed the feat(minor)/benchmark-parameters branch from e327a91 to 08940ee on October 1, 2024 14:55
@CrownedPhoenix (Contributor, Author)

Awesome - I'll work on a follow-up PR after this one. I think this is ready for final review.

Some CI checks have been failing and it's not clear to me how to address them - I'd appreciate any pointers on running these checks locally.

@hassila (Contributor) commented Oct 1, 2024

BenchmarkTool.swift needs to use the new baseName instead at line 271:

            if try shouldIncludeBenchmark(benchmark.baseName) {

(and then baseName needs to be public).

Otherwise a benchmark using tags will not be matched properly by the regex.

E.g. I tried adding tags to the Basic benchmark in the Benchmarks directory thus:

    Benchmark("Basic",
              configuration: .init(metrics: [.wallClock, .throughput, .instructions], tags: ["a" : "b", "c" : "d"])) { _ in
    }

Then this broke:

swift package benchmark --target Basic --filter "Basic"

and required this:

swift package benchmark --target Basic --filter "Basic.*"

to match.

Changing the regex matching to baseName instead sorts that.
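To illustrate the mechanics (the exact formatting below is made up - the real code is in Benchmark.swift): if name appends the tag list to baseName, a filter regex of "Basic" no longer matches the full name, so the filter has to match against baseName:

    // Illustrative sketch only.
    struct Benchmark {
        let baseName: String
        let tags: [String: String]

        // The display name appends the tags, e.g. "Basic (a: b, c: d)",
        // which is why the filter "Basic" stopped matching.
        var name: String {
            guard !tags.isEmpty else { return baseName }
            let tagList = tags.sorted { $0.key < $1.key }
                .map { "\($0.key): \($0.value)" }
                .joined(separator: ", ")
            return "\(baseName) (\(tagList))"
        }
    }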

@hassila (Contributor) commented Oct 1, 2024

SwiftLint should be run locally (you need to remove the .build directory first):

hassila@ice ~/package-benchmark (feat(minor)/benchmark-parameters)> swiftlint .
Linting Swift files at paths .
Linting 'Int+Extensions.swift' (1/24)
Linting 'Benchmark.swift' (2/24)
Linting 'Statistics.swift' (3/24)
Linting 'BenchmarkRunner+ReadWrite.swift' (4/24)
Linting 'ARCStatsProducer.swift' (5/24)
Linting 'ARCStats.swift' (6/24)
Linting 'BenchmarkThresholds.swift' (7/24)
Linting 'BenchmarkRunner.swift' (9/24)
Linting 'Benchmark+ConvenienceInitializers.swift' (8/24)
Linting 'BenchmarkResult.swift' (10/24)
Linting 'BenchmarkInternals.swift' (12/24)
Linting 'BenchmarkMetric+Defaults.swift' (13/24)
Linting 'BenchmarkThresholds+Defaults.swift' (11/24)
Linting 'BenchmarkExecutor+Extensions.swift' (14/24)
Linting 'Blackhole.swift' (15/24)
Linting 'MallocStats.swift' (16/24)
Linting 'MallocStatsProducer+jemalloc.swift' (17/24)
Linting 'MallocStats+jemalloc-support.swift' (18/24)
Linting 'BenchmarkClock.swift' (19/24)
Linting 'OperatingSystemStatsProducer+Darwin.swift' (20/24)
Linting 'OperatingSystemStats.swift' (21/24)
Linting 'OperatingSystemStatsProducer+Linux.swift' (22/24)
Linting 'BenchmarkExecutor.swift' (23/24)
Linting 'BenchmarkMetric.swift' (24/24)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:36: warning: Blanket Disable Command Violation: The disabled 'type_body_length' rule should be re-enabled before the end of the file (blanket_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:23: warning: Blanket Disable Command Violation: The disabled 'file_length,' rule should be re-enabled before the end of the file (blanket_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:475:1: warning: File Length Violation: File should contain 400 lines or less: currently contains 475 (file_length)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:20:55: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:24:60: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:28:63: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:32:68: warning: Prefer Self in Static References Violation: Use `Self` to refer to the surrounding type name (prefer_self_in_static_references)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:13:52: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:436:37: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
/Users/hassila/package-benchmark/Sources/Benchmark/Benchmark.swift:447:36: warning: Superfluous Disable Command Violation: 'file_length,' is not a valid SwiftLint rule; remove it from the disable command (superfluous_disable_command)
Linting 'P90AbsoluteThresholds.swift' (1/13)
Linting 'DateTime.swift' (2/13)
Linting 'Basic+SetupTeardown.swift' (3/13)
Linting 'BenchmarkRunner+Basic.swift' (4/13)
Linting 'Histogram.swift' (5/13)
Linting 'Package.swift' (6/13)
Linting 'BenchmarkTests.swift' (7/13)
Linting 'OperatingSystemAndMallocTests.swift' (8/13)
Linting 'BenchmarkMetricsTests.swift' (9/13)
Linting 'StatisticsTests.swift' (10/13)
Linting 'AdditionalTests.swift' (11/13)
Linting 'BenchmarkResultTests.swift' (12/13)
Linting 'BenchmarkRunnerTests.swift' (13/13)

@CrownedPhoenix CrownedPhoenix force-pushed the feat(minor)/benchmark-parameters branch from 08940ee to a7b3292 on October 1, 2024 20:56
Sometimes it is convenient to run the same benchmark multiple times under slightly different scenarios.
For example, benchmarking `func foo(_ n: Int)` for various parameterizations of `n`.
This adds support for specifying tags for a benchmark but makes no functional changes beyond the benchmark's name/identifier now including the tag list when one is provided.

This unlocks the potential for the benchmark exporters (such as the Influx exporter) to leverage these tags (by emitting appropriate tags/fields in the case of Influx) in their output.
@CrownedPhoenix CrownedPhoenix force-pushed the feat(minor)/benchmark-parameters branch from a7b3292 to 4b34c19 on October 1, 2024 20:58
@hassila (Contributor) left a comment

Thanks for the contribution! LGTM now - the remaining failing CI is due to other issues which we are investigating - something has recently broken the API and code coverage checks.

@hassila hassila merged commit aedd5e6 into ordo-one:main Oct 2, 2024
10 of 14 checks passed
@CrownedPhoenix CrownedPhoenix deleted the feat(minor)/benchmark-parameters branch October 2, 2024 14:02
Successfully merging this pull request may close these issues.

Feature: Custom benchmark tags