
Add benchmark tooling #691

Closed
wants to merge 1 commit into from

Conversation

mrueg
Contributor

@mrueg mrueg commented Feb 8, 2024

This adds a way to compare benchmarks, hopefully allowing for better comparisons when applying changes.

We use this in https://github.com/kubernetes/kube-state-metrics and it can be integrated via GitHub Actions as well if desired.

@ahrtr
Member

ahrtr commented Feb 9, 2024

Thanks for the PR.

it can be integrated via GitHub Actions as well if desired.

Yes, it's desired!

@jmhbnz
Member

jmhbnz commented Feb 11, 2024

/ok-to-test

@mrueg force-pushed the add-benchmark branch 2 times, most recently from 4b5aed1 to 7f4a38c on February 11, 2024 at 23:10
@mrueg changed the title from "WIP: Add benchmark tooling" to "Add benchmark tooling" on Feb 11, 2024
@ahrtr
Member

ahrtr commented Feb 16, 2024

Please take a look at the workflow failure.

Makefile Outdated
@@ -1,5 +1,6 @@
BRANCH=`git rev-parse --abbrev-ref HEAD`
COMMIT=`git rev-parse --short HEAD`
LATEST_RELEASE_BRANCH := release-$(shell grep -ohE "[0-9]+.[0-9]+" version/version.go)
Member

The issue with the current workflow failure is that it's trying to fetch the release-1.4 branch. However, because the main branch already contains the next (unreleased) version, this should resolve to 1.3 rather than 1.4.

Suggested change
LATEST_RELEASE_BRANCH := release-$(shell grep -ohE "[0-9]+.[0-9]+" version/version.go)
LATEST_RELEASE_BRANCH := release-$(shell grep -ohE "[0-9]+.[0-9]+" version/version.go | awk -F. '{print $$1 "." $$2-1}')
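The suggested pipeline can be checked in isolation. This is a minimal sketch using a stand-in version file written to a temp path (the real file is bbolt's `version/version.go`; the `$$` escaping in the Makefile becomes a single `$` when the command runs in the shell):

```shell
# Stand-in for version/version.go (hypothetical content).
tmp="$(mktemp)"
echo 'var Version = "1.4.0"' > "$tmp"

# Extract "major.minor", then subtract one from the minor component
# to derive the previous release branch name.
branch="release-$(grep -ohE '[0-9]+\.[0-9]+' "$tmp" | awk -F. '{print $1 "." $2-1}')"
echo "$branch"   # → release-1.3
```

With a `1.4.x` version string on main, this yields `release-1.3`, matching the branch the workflow should fetch.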

@ivanvc
Member

ivanvc commented Apr 23, 2024

Hey @ahrtr, I think there's value in enabling this with my suggestion, given the recent findings in #720. How would you suggest proceeding with this PR?

@ahrtr
Member

ahrtr commented Apr 23, 2024

I think we should compare the performance between a PR and the main branch.

@mrueg
Contributor Author

mrueg commented Apr 25, 2024

If comparing the PR against main is fine, I can remove the comparison to the previous release.

@ahrtr
Member

ahrtr commented Apr 30, 2024

If comparing the PR against main is fine, I can remove the comparison to the previous release.

Yes, please.

This adds a way to compare benchmarks that we use in kube-state-metrics
already, hopefully allowing for better comparisons when applying
changes.

Signed-off-by: Manuel Rüger <[email protected]>
@ivanvc
Member

ivanvc commented May 3, 2024

Hi @mrueg, it looks like it's failing because the benchmark job is timing out. We could either just increase the timeout or **split the benchmark into two jobs**. I think the latter option would be better because it would optimize the runtime and generate the output faster. What do you think? I can do a PoC or continue the implementation if you don't have the capacity.

Edit: added bold for emphasis on what I initially meant 😅

@mrueg
Contributor Author

mrueg commented May 3, 2024

@ivanvc yes, please feel free to take over. I'm currently lacking time to work on this.

@ivanvc
Member

ivanvc commented May 7, 2024

I have a branch with working code for this. However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence, or at least two runs for a prediction with around 50% confidence. Each run takes about 20 minutes. IMHO, it's way too long to wait one hour for the result. @ahrtr, thoughts?

@ahrtr
Member

ahrtr commented May 7, 2024

However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence.

In that case, can we add it as a nightly check?

@mrueg
Contributor Author

mrueg commented May 7, 2024

However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence.

In that case, can we add it as a nightly check?

What would it compare against? This would be running against a PR.

@ahrtr
Member

ahrtr commented May 7, 2024

However, benchstat needs at least six benchmark runs to generate a prediction with 95% confidence.

In that case, can we add it as a nightly check?

What would it compare against? This would be running against a PR.

Right, it's unrealistic if we need to compare PR vs main.

  • If we run the workflow check on each PR, then we can try the larger ubuntu-latest-8-cores runner.
  • Not every PR needs to run the performance comparison, e.g. documentation-only changes or typo fixes. We can also add conditions to control the workflow execution, e.g. manually adding a performance/compare label to trigger it?

Also it might make more sense to compare performance using the bbolt bench tool. I think we can add a separate workflow check for that.

@ivanvc
Member

ivanvc commented May 8, 2024

Here's some discussion and progress regarding these last comments:

I managed to do three runs using ubuntu-latest-8-cores runners, tuning the run parameters with {ksize,vsize}=512, which completes in under 30 minutes. With confidence=0.75, benchstat can generate valid output. I formatted the result as a Markdown table and published it as a GitHub workflow job summary (we could also post a comment on the PR to make this more visible). See the result here: https://github.com/etcd-io/bbolt/actions/runs/8995575434.

However, I wonder if my original idea of splitting the benchmarks across two runners makes sense, because if a job gets scheduled on a degraded host, it may skew the benchmark results. Doing both runs on the same runner means the run time would increase from 30 minutes to one hour.

Also it might make more sense to compare performance using the bbolt bench tool.

That was my intention with #739, which makes sense. But we'll need to define a set of parameters to run the benchmarks with. I'm not sure who would be best placed to define them (I'm guessing you, @ahrtr 😅), but I suggest keeping that conversation in that issue.

Another idea would be to run these benchmarks as Prow jobs on the Kubernetes infra. In that case, I understand that the max CPU we can request is 7 CPUs. So, I'm unsure if they will be any better than GitHub's ubuntu-latest-8-cores.
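The approach described in this thread could be sketched as a GitHub Actions workflow along the following lines. This is a minimal illustrative sketch, not the implementation that eventually landed in #750; the job name, the performance/compare label, the `-count 6` sample count, and the benchmark selection flags are all assumptions drawn from the discussion above:

```yaml
# Sketch only: label-gated benchmark comparison on a single runner.
name: benchmark-compare
on:
  pull_request:
    types: [labeled, opened, synchronize]
jobs:
  benchstat:
    # Only run when the (hypothetical) performance/compare label is present.
    if: contains(github.event.pull_request.labels.*.name, 'performance/compare')
    runs-on: ubuntu-latest-8-cores
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - run: go install golang.org/x/perf/cmd/benchstat@latest
      # Run both refs sequentially on the same runner so they share an
      # identical environment; -count 6 gives benchstat enough samples
      # for its confidence intervals.
      - run: |
          git checkout origin/main
          go test -run '^$' -bench . -count 6 ./... > old.txt
          git checkout ${{ github.event.pull_request.head.sha }}
          go test -run '^$' -bench . -count 6 ./... > new.txt
          benchstat old.txt new.txt >> "$GITHUB_STEP_SUMMARY"
```

Writing the benchstat diff to `$GITHUB_STEP_SUMMARY` surfaces it as a job summary table, matching the result linked above.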

@ahrtr
Member

ahrtr commented May 8, 2024

See the result here: https://github.com/etcd-io/bbolt/actions/runs/8995575434.

Looks great, thanks.

which runs under 30 minutes.

It seems that you ran the tests in parallel. If you run them sequentially, it will take longer. #750 (comment)

Doing both runs on the same runner

I think we should follow this approach to ensure both tests have exactly the same environment.

@ivanvc
Member

ivanvc commented May 9, 2024

It seems that you ran the tests in parallel. If you run them sequentially, it will take longer. #750 (comment)

Yes, if we run them sequentially, it will take about one hour to complete the job. Is that acceptable?

@fuweid
Member

fuweid commented May 13, 2024

It seems that you ran the tests in parallel. If you run them sequentially, it will take longer. #750 (comment)

Yes, if we run them sequentially, it will take about one hour to complete the job. Is that acceptable?

+1 for running them sequentially.
Multiple concurrent fdatasync syscalls would impact each other, so running in sequence makes sense here.

In that case, I understand that the max CPU we can request is 7 CPUs. So, I'm unsure if they will be any better than GitHub's ubuntu-latest-8-cores.

I remember that CNCF provides 16-core runners for us. You can try ubuntu-latest-16-cores.

@ivanvc
Member

ivanvc commented May 13, 2024

I remember that CNCF provides 16-core runners for us. You can try ubuntu-latest-16-cores.

That's good to know. I'll try it out with them.

@ivanvc
Member

ivanvc commented May 13, 2024

I remember that CNCF provides 16-core runners for us. You can try ubuntu-latest-16-cores.

That's good to know. I'll try it out with them.

Uh, it looks like we don't have 16 core runners. The job is stalled @ https://github.com/etcd-io/bbolt/actions/runs/9067796547/job/24913688424.

I could try tuning the running parameters to make it finish faster.

@fuweid
Member

fuweid commented May 14, 2024

Uh, it looks like we don't have 16 core runners. The job is stalled @ https://github.com/etcd-io/bbolt/actions/runs/9067796547/job/24913688424.

Hmm, I don't have permission to check this in etcd-io org.

Org → Settings → Actions → Runner groups → Default Large Runners

  • Select repositories

Maybe @ahrtr can help check it 😆

@jmhbnz
Member

jmhbnz commented May 14, 2024

Uh, it looks like we don't have 16 core runners. The job is stalled @ https://github.com/etcd-io/bbolt/actions/runs/9067796547/job/24913688424.

Hmm, I don't have permission to check this in etcd-io org.

Org → Settings → Actions → Runner groups → Default Large Runners

Hey team - I believe this will require raising an issue with Kubernetes, as the etcd-io org is now managed under the Kubernetes GitHub Enterprise account. For context, please read kubernetes/org#4590.

@fuweid
Member

fuweid commented May 14, 2024

@jmhbnz thanks for the quick update. I remember that we previously had three large runners: 4 cores, 8 cores, and 16 cores. Reading the issue mentioned, the runners were shut down during the migration, so I believe they have been deleted. If we still need 16 cores, we'll have to file a CNCF ticket to re-enable them.

@ahrtr
Member

ahrtr commented May 14, 2024

We only have ubuntu-latest-8-cores enabled for bbolt for now.

Yes, if we run them sequentially, it will take about one hour to complete the job. Is that acceptable?

Have you confirmed that (running them sequentially)? It takes about 50m for the existing tests/coverage job, so one hour might be acceptable.

@ahrtr
Member

ahrtr commented May 14, 2024

Have you confirmed that (running them sequentially)? It takes about 50m for the existing tests/coverage job, so one hour might be acceptable.

But I have no objection if anyone wants to request a 16-core runner :)

@ivanvc
Member

ivanvc commented May 15, 2024

It takes about one hour to run the benchmarks sequentially: https://github.com/etcd-io/bbolt/actions/runs/9087376452

If this is acceptable, we can roll it out with 8-core runners. However, I still wonder if the Prow infra would better fit this.

@ahrtr
Member

ahrtr commented May 15, 2024

If this is acceptable, we can roll it out with 8-core runners. However, I still wonder if the Prow infra would better fit this.

Thanks. It's acceptable to me. We can add it as a GitHub workflow first, and consider migrating to Prow later.

@ivanvc
Member

ivanvc commented May 15, 2024

I'll clean up the code and mark the other PR ready for review.

@ivanvc ivanvc mentioned this pull request May 17, 2024
@ahrtr
Member

ahrtr commented Jun 28, 2024

Thanks @mrueg for initiating the change; the final implementation was completed by @ivanvc in #750.

@ahrtr ahrtr closed this Jun 28, 2024
@mrueg
Contributor Author

mrueg commented Jun 28, 2024

Thanks @ivanvc for getting the change in!

6 participants