This repository has been archived by the owner on Jun 19, 2023. It is now read-only.

add BenchmarkARCCacheConcurrentOps #70

Merged
merged 1 commit into master from mvdan/arc-cache-bench on May 4, 2021

Conversation

@mvdan (Contributor) commented Apr 28, 2021

No description provided.


@mvdan mvdan marked this pull request as draft April 28, 2021 12:54
@mvdan (Contributor, Author) commented Apr 28, 2021

Sample output:

$ go test -run=- -bench=ARCCacheConcurrentOps -cpu=1,64,1024 -benchtime=1s
goos: linux
goarch: amd64
pkg: github.com/ipfs/go-ipfs-blockstore
cpu: Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz
BenchmarkARCCacheConcurrentOps/PutDelete              	  972183	      1082 ns/op	     309 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/PutDelete-64           	  944838	      1260 ns/op	     309 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/PutDelete-1024         	  858927	      1386 ns/op	     315 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/GetDelete              	 2808345	       423.0 ns/op	     113 B/op	       3 allocs/op
BenchmarkARCCacheConcurrentOps/GetDelete-64           	 1969640	       595.9 ns/op	     113 B/op	       3 allocs/op
BenchmarkARCCacheConcurrentOps/GetDelete-1024         	 1826703	       637.6 ns/op	     117 B/op	       3 allocs/op
BenchmarkARCCacheConcurrentOps/GetPut                 	 1171681	      1025 ns/op	     313 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/GetPut-64              	 1082220	      1115 ns/op	     313 B/op	       7 allocs/op
BenchmarkARCCacheConcurrentOps/GetPut-1024            	  962059	      1246 ns/op	     319 B/op	       7 allocs/op

Note that I capped my CPU frequency at 70%, to prevent turbo boost and throttling from messing with the numbers.

Main changes from #65 (a rough sketch of the resulting structure follows the list):

  • One single benchmark, which helps reuse code.
  • One sub-benchmark for each pair of racy operations.
  • All sub-benchmarks get a new ARC cache, pre-filled with half of the 4k dummy blocks.
  • All concurrency is within the number of goroutines spawned by the parallel benchmark, controlled by -cpu.
  • Each of the benchmark goroutines will randomly perform an operation (e.g. Put or Delete) on a random dummy block.
  • No artificial delays on the store. That's something done in #66 (fix(arc): Per-CID locking. Map CID to lock.), but it's not clear to me why we need them.
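
For illustration, here is a rough sketch of that structure. This is not the code from this PR: the cache, keys, and helpers below are stand-ins (plain ints and a mutex-guarded map instead of real blocks, CIDs, and the ARC-backed blockstore). The overall shape follows the list above: one top-level benchmark, one sub-benchmark per pair of racy operations, a freshly pre-filled cache per sub-benchmark, and b.RunParallel so that -cpu controls the concurrency.

// arccache_concurrent_sketch_test.go (hypothetical file, stand-in types)
package cachebench

import (
	"math/rand"
	"sync"
	"testing"
)

// stubCache stands in for the ARC-backed blockstore; it is keyed by int
// instead of CIDs and guarded by a single mutex.
type stubCache struct {
	mu sync.Mutex
	m  map[int][]byte
}

func newStubCache() *stubCache { return &stubCache{m: make(map[int][]byte)} }

func (c *stubCache) Put(k int, v []byte) { c.mu.Lock(); c.m[k] = v; c.mu.Unlock() }
func (c *stubCache) Get(k int) []byte    { c.mu.Lock(); defer c.mu.Unlock(); return c.m[k] }
func (c *stubCache) Delete(k int)        { c.mu.Lock(); delete(c.m, k); c.mu.Unlock() }

func BenchmarkCacheConcurrentOps(b *testing.B) {
	const numBlocks = 4 << 10 // 4k dummy blocks, hard-coded as in the PR description

	put := func(c *stubCache, k int) { c.Put(k, []byte("x")) }
	get := func(c *stubCache, k int) { c.Get(k) }
	del := func(c *stubCache, k int) { c.Delete(k) }

	// One sub-benchmark for each pair of racy operations.
	pairs := []struct {
		name string
		ops  [2]func(*stubCache, int)
	}{
		{"PutDelete", [2]func(*stubCache, int){put, del}},
		{"GetDelete", [2]func(*stubCache, int){get, del}},
		{"GetPut", [2]func(*stubCache, int){get, put}},
	}

	for _, p := range pairs {
		p := p
		b.Run(p.name, func(b *testing.B) {
			// Each sub-benchmark gets a new cache, pre-filled with half the blocks.
			cache := newStubCache()
			for i := 0; i < numBlocks/2; i++ {
				cache.Put(i, []byte("x"))
			}
			b.ResetTimer()
			// RunParallel spawns GOMAXPROCS goroutines, so -cpu sets the concurrency.
			b.RunParallel(func(pb *testing.PB) {
				rng := rand.New(rand.NewSource(rand.Int63()))
				for pb.Next() {
					// Randomly pick one of the pair's ops and a random block.
					p.ops[rng.Intn(2)](cache, rng.Intn(numBlocks))
				}
			})
		})
	}
}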

The number of blocks is hard-coded right now, but it could be turned into a flag if we want. The nice thing about reusing -cpu for concurrency is that it integrates very well with how one runs benchmarks and sees their results, and we don't have to worry about spawning extra goroutines of our own.

I ran the benchmarks with 4k blocks and 1, 64, and 1024 concurrent goroutines. You can see that the numbers are fairly similar, with time/op slowly increasing with the number of goroutines. The numbers are also very stable; they don't change much at all when running for longer with -benchtime=3s, and six consecutive runs with -count=6 show little variance with benchstat:

name                                  time/op
ARCCacheConcurrentOps/PutDelete       1.09µs ± 1%
ARCCacheConcurrentOps/PutDelete-64    1.26µs ± 1%
ARCCacheConcurrentOps/PutDelete-1024  1.39µs ± 0%
ARCCacheConcurrentOps/GetDelete        420ns ± 1%
ARCCacheConcurrentOps/GetDelete-64     602ns ± 1%
ARCCacheConcurrentOps/GetDelete-1024   639ns ± 1%
ARCCacheConcurrentOps/GetPut          1.04µs ± 1%
ARCCacheConcurrentOps/GetPut-64       1.12µs ± 0%
ARCCacheConcurrentOps/GetPut-1024     1.23µs ± 0%
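
For reference, a table like the above can be produced by saving the repeated runs to a file and running benchstat (golang.org/x/perf/cmd/benchstat) on it; one possible invocation:

$ go test -run=- -bench=ARCCacheConcurrentOps -cpu=1,64,1024 -count=6 | tee bench.txt
$ benchstat bench.txt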

@frrist @warpfork @Stebalien @iand I'd love your thoughts. If the approach sounds good, I can rebase the other PRs atop this one and post before/after benchmark numbers for each one of them on the original issue thread.

@mvdan (Contributor, Author) commented Apr 28, 2021

Also, there's no commit message for now, but I'll write one when I move this out of draft state. The code is also pretty well documented.

@iand commented Apr 28, 2021

The timings per op seem sensitive to the number of blocks, which I don't have a good explanation for. For example, I changed to numBlocks = 4 << 14 and ns/op doubled or more across the board, while bytes and allocs per op remained the same.

Maybe the default 4k setting is small enough to cause contention on the blocks within the 1s timeframe, but 64k is not. I don't know anything about the locking strategy already in place for the blockstore, so I am just guessing.

@mvdan (Contributor, Author) commented Apr 28, 2021

Fewer blocks will cause more collisions. Some pairs of ops like Get+Delete also gradually remove all blocks from the cache, so the smaller the number of blocks, the quicker that happens.

I'm also not particularly familiar with the internal architecture or what a good default for the number of blocks should be. The original PR used one million, and that definitely feels too large: even with 1k goroutines and tens of thousands of iterations, the chances of concurrent collisions are practically nil. So I felt like we needed a number of blocks that was closer to the number of concurrent goroutines.

@mvdan mvdan mentioned this pull request Apr 29, 2021
@mvdan mvdan marked this pull request as ready for review May 3, 2021 20:35
@mvdan mvdan changed the title from "WIP: add BenchmarkARCCacheConcurrentOps" to "add BenchmarkARCCacheConcurrentOps" May 3, 2021
@frrist frrist mentioned this pull request May 4, 2021
@Stebalien Stebalien merged commit 10b7bf0 into master May 4, 2021
@Stebalien Stebalien deleted the mvdan/arc-cache-bench branch May 4, 2021 22:43
@aschmahmann aschmahmann mentioned this pull request Dec 13, 2021
Jorropo pushed a commit to ipfs/go-libipfs-rapide that referenced this pull request Mar 23, 2023

add BenchmarkARCCacheConcurrentOps

This commit was moved from ipfs/go-ipfs-blockstore@10b7bf0