
Add benchmark tooling #691

Closed
wants to merge 1 commit into from

Conversation

mrueg
Contributor

@mrueg mrueg commented Feb 8, 2024

This adds a way to compare benchmarks, hopefully allowing for better comparisons when applying changes.

We use this in https://github.com/kubernetes/kube-state-metrics and it can be integrated via GitHub Actions as well if desired.

@ahrtr
Member

ahrtr commented Feb 9, 2024

Thanks for the PR.

it can be integrated via GitHub Actions as well if desired.

Yes, it's desired!

@jmhbnz
Member

jmhbnz commented Feb 11, 2024

/ok-to-test

@mrueg force-pushed the add-benchmark branch 2 times, most recently from 4b5aed1 to 7f4a38c on February 11, 2024 at 23:10
@mrueg changed the title from "WIP: Add benchmark tooling" to "Add benchmark tooling" on Feb 11, 2024
@ahrtr
Member

ahrtr commented Feb 16, 2024

Please take a look at the workflow failure.

Makefile Outdated
@@ -1,5 +1,6 @@
BRANCH=`git rev-parse --abbrev-ref HEAD`
COMMIT=`git rev-parse --short HEAD`
LATEST_RELEASE_BRANCH := release-$(shell grep -ohE "[0-9]+.[0-9]+" version/version.go)
Member

The issue with the current workflow failure is that it's trying to fetch the release-1.4 branch. However, because the main branch already contains the next (unreleased) version, this should resolve to 1.3 rather than 1.4.

Suggested change
LATEST_RELEASE_BRANCH := release-$(shell grep -ohE "[0-9]+.[0-9]+" version/version.go)
LATEST_RELEASE_BRANCH := release-$(shell grep -ohE "[0-9]+.[0-9]+" version/version.go | awk -F. '{print $$1 "." $$2-1}')
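The suggested pipeline can be checked in isolation. This is a minimal sketch using a stand-in version file written to a temp path (the real file is bbolt's `version/version.go`; the `$$` escaping in the Makefile becomes a single `$` when the command runs in the shell):

```shell
# Stand-in for version/version.go (hypothetical content).
tmp="$(mktemp)"
echo 'var Version = "1.4.0"' > "$tmp"

# Extract "major.minor", then subtract one from the minor component
# to derive the previous release branch name.
branch="release-$(grep -ohE '[0-9]+\.[0-9]+' "$tmp" | awk -F. '{print $1 "." $2-1}')"
echo "$branch"   # → release-1.3
```

With a `1.4.x` version string on main, this yields `release-1.3`, matching the branch the workflow should fetch.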

@ivanvc
Member

ivanvc commented Apr 23, 2024

Hey @ahrtr, I think there's value in enabling this with my suggestion, given the recent findings in #720. How would you suggest proceeding with this PR?

@ahrtr
Member

ahrtr commented Apr 23, 2024

I think we should compare the performance between a PR and the main branch.

@mrueg
Contributor Author

mrueg commented Apr 25, 2024

If comparing the PR against main is fine, I can remove the comparison to the previous release.

@ahrtr
Member

ahrtr commented Apr 30, 2024

If comparing the PR against main is fine, I can remove the comparison to the previous release.

Yes, please.

This adds a way to compare benchmarks that we use in kube-state-metrics
already, hopefully allowing for better comparisons when applying
changes.

Signed-off-by: Manuel Rüger <[email protected]>
@ivanvc
Member

ivanvc commented May 3, 2024

Hi @mrueg, it looks like it's failing because the benchmark job is timing out. We could either just increase the timeout or **split the benchmark into two jobs**. I think the latter option would be better because it would optimize the runtime and generate the output faster. What do you think? I can do a PoC or continue the implementation if you don't have the capacity.

Edit: added bold for emphasis on what I initially meant 😅

@mrueg
Contributor Author

mrueg commented May 3, 2024

@ivanvc yes, please feel free to take over. I'm currently lacking time to work on this.

@ivanvc
Member

ivanvc commented May 7, 2024

I have a branch with working code for this. However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence, or at least two runs for a prediction with around 50% confidence. Each run takes about 20 minutes. IMHO, it's way too long to wait one hour for the result. @ahrtr, thoughts?

@ahrtr
Member

ahrtr commented May 7, 2024

However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence.

In that case, can we add it as a nightly check?

@mrueg
Contributor Author

mrueg commented May 7, 2024

However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence.

In that case, can we add it as a nightly check?

What would it compare against? This would be running against a PR.

@ahrtr
Member

ahrtr commented May 7, 2024

However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence.

In that case, can we add it as a nightly check?

What would it compare against? This would be running against a PR.

Right, it's unrealistic if we need to compare PR vs main.

  • If we run the workflow check on each PR, then we can try the larger ubuntu-latest-8-cores runner.
  • Not every PR needs to run the performance comparison, e.g. documentation-only changes or typo fixes. We can also add conditions to control the workflow execution, e.g. manually adding a performance/compare label to trigger it?

Also it might make more sense to compare performance using the bbolt bench tool. I think we can add a separate workflow check for that.

@ivanvc
Member

ivanvc commented May 8, 2024

Here's some discussion and progress regarding these last comments:

I managed to do three runs using ubuntu-latest-8-cores runners, tuning the run parameters with {ksize,vsize}=512, which completes in under 30 minutes. With confidence=0.75, benchstat can generate valid output. I formatted the result as a Markdown table and published it as a GitHub workflow job summary (we could also post a comment on the PR to make this more visible). See the result here: https://github.com/etcd-io/bbolt/actions/runs/8995575434.

However, I wonder if my original idea of splitting the benchmarks across two runners makes sense, because if a job gets scheduled on a degraded host, it may skew the benchmark results. Doing both runs on the same runner means the run time would increase from 30 minutes to one hour.

Also it might make more sense to compare performance using the bbolt bench tool.

That was my intention with #739, which makes sense. But we'll need to define a set of parameters to run the benchmarks with. I'm not sure who would be best placed to define them (I'm guessing you, @ahrtr 😅), but I suggest keeping that conversation in that issue.

Another idea would be to run these benchmarks as Prow jobs on the Kubernetes infra. In that case, I understand that the max CPU we can request is 7 CPUs. So, I'm unsure if they will be any better than GitHub's ubuntu-latest-8-cores.
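The approach described in this thread could be sketched as a GitHub Actions workflow along the following lines. This is a minimal illustrative sketch, not the implementation that eventually landed in #750; the job name, the performance/compare label, the `-count 6` sample count, and the benchmark selection flags are all assumptions drawn from the discussion above:

```yaml
# Sketch only: label-gated benchmark comparison on a single runner.
name: benchmark-compare
on:
  pull_request:
    types: [labeled, opened, synchronize]
jobs:
  benchstat:
    # Only run when the (hypothetical) performance/compare label is present.
    if: contains(github.event.pull_request.labels.*.name, 'performance/compare')
    runs-on: ubuntu-latest-8-cores
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - run: go install golang.org/x/perf/cmd/benchstat@latest
      # Run both refs sequentially on the same runner so they share an
      # identical environment; -count 6 gives benchstat enough samples
      # for its confidence intervals.
      - run: |
          git checkout origin/main
          go test -run '^$' -bench . -count 6 ./... > old.txt
          git checkout ${{ github.event.pull_request.head.sha }}
          go test -run '^$' -bench . -count 6 ./... > new.txt
          benchstat old.txt new.txt >> "$GITHUB_STEP_SUMMARY"
```

Writing the benchstat diff to `$GITHUB_STEP_SUMMARY` surfaces it as a job summary table, matching the result linked above.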

@ahrtr
Member

ahrtr commented May 8, 2024

See the result here: https://github.com/etcd-io/bbolt/actions/runs/8995575434.

Looks great, thanks.

which runs under 30 minutes.

It seems that you ran the tests in parallel. If you run them sequentially, it will take longer. #750 (comment)

Doing both runs on the same runner

I think we should follow this approach to ensure both tests have exactly the same environment.

@ivanvc
Member

ivanvc commented May 9, 2024

It seems that you ran the tests in parallel. If you run them sequentially, it will take longer. #750 (comment)

Yes, if we run them sequentially, it will take about one hour to complete the job. Is that acceptable?

@fuweid
Member

fuweid commented May 13, 2024

It seems that you ran the tests in parallel. If you run them sequentially, it will take longer. #750 (comment)

Yes, if we run them sequentially, it will take about one hour to complete the job. Is that acceptable?

+1 for running them sequentially.
Multiple concurrent fdatasync syscalls would impact each other, so running in sequence makes sense here.

In that case, I understand that the max CPU we can request is 7 CPUs. So, I'm unsure if they will be any better than GitHub's ubuntu-latest-8-cores.

I remember that CNCF provides 16-core runners for us. You can try ubuntu-latest-16-cores.

@ivanvc
Member

ivanvc commented May 13, 2024

I remember that CNCF provides 16-core runners for us. You can try ubuntu-latest-16-cores.

That's good to know. I'll try it out with them.

@ivanvc
Member

ivanvc commented May 13, 2024

I remember that CNCF provides 16-core runners for us. You can try ubuntu-latest-16-cores.

That's good to know. I'll try it out with them.

Uh, it looks like we don't have 16 core runners. The job is stalled @ https://github.com/etcd-io/bbolt/actions/runs/9067796547/job/24913688424.

I could try tuning the running parameters to make it finish faster.

@fuweid
Member

fuweid commented May 14, 2024

Uh, it looks like we don't have 16 core runners. The job is stalled @ https://github.com/etcd-io/bbolt/actions/runs/9067796547/job/24913688424.

Hmm, I don't have permission to check this in etcd-io org.

Org → Settings → Actions → Runner groups → Default Large Runners

  • Select repositories

Maybe @ahrtr can help check it 😆

@jmhbnz
Member

jmhbnz commented May 14, 2024

Uh, it looks like we don't have 16 core runners. The job is stalled @ https://github.com/etcd-io/bbolt/actions/runs/9067796547/job/24913688424.

Hmm, I don't have permission to check this in etcd-io org.

Org → Settings → Actions → Runner groups → Default Large Runners

Hey team - I believe this will require raising an issue with Kubernetes, as the etcd-io org is now managed under the Kubernetes GitHub Enterprise account. For context, please read kubernetes/org#4590.

@fuweid
Member

fuweid commented May 14, 2024

@jmhbnz thanks for the quick update. I remember that we previously had three large runners: 4 cores, 8 cores, and 16 cores. Reading the issue mentioned, the runners were shut down during the migration, so I believe they have been deleted. If we still need 16 cores, we'll have to file a CNCF ticket to re-enable them.

@ahrtr
Member

ahrtr commented May 14, 2024

We only have ubuntu-latest-8-cores enabled for bbolt for now.

Yes, if we run them sequentially, it will take about one hour to complete the job. Is that acceptable?

Have you confirmed that (running them sequentially)? It takes about 50m for the existing tests/coverage job, so one hour might be acceptable.

@ahrtr
Member

ahrtr commented May 14, 2024

Have you confirmed that (running them sequentially)? It takes about 50m for the existing tests/coverage job, so one hour might be acceptable.

But I have no objection if anyone wants to request a 16-core runner :)

@ivanvc
Member

ivanvc commented May 15, 2024

It takes about one hour to run the benchmarks sequentially: https://github.com/etcd-io/bbolt/actions/runs/9087376452

If this is acceptable, we can roll it out with 8-core runners. However, I still wonder if the Prow infra would better fit this.

@ahrtr
Member

ahrtr commented May 15, 2024

If this is acceptable, we can roll it out with 8-core runners. However, I still wonder if the Prow infra would better fit this.

Thanks. It's acceptable to me. We can add it as a GitHub workflow first, and consider migrating to Prow later.

@ivanvc
Member

ivanvc commented May 15, 2024

I'll clean up the code and mark the other PR ready for review.

@ivanvc ivanvc mentioned this pull request May 17, 2024
@ahrtr
Member

ahrtr commented Jun 28, 2024

Thanks @mrueg for initiating the change; the final implementation was completed by @ivanvc in #750.

@ahrtr ahrtr closed this Jun 28, 2024
@mrueg
Contributor Author

mrueg commented Jun 28, 2024

Thanks @ivanvc for getting the change in!

6 participants