[WIP] NetworkDB performance improvements #2046
base: master
Conversation
Force-pushed from 213939e to 3ca0c60
Codecov Report
@@          Coverage Diff           @@
##           master    #2046  +/- ##
=====================================
  Coverage        ?   40.05%
=====================================
  Files           ?      138
  Lines           ?    22108
  Branches        ?        0
=====================================
  Hits            ?     8856
  Misses          ?    11954
  Partials        ?     1298
CPU profiling showed that mRandomNodes was taking ~30% of the CPU time of the gossip cycle. Changing the data structure from a []string to a map improved the performance of all the functions that were using it. Comparison of the benchmarks before and after the change:

AddNodeNetwork:
benchmark                      old ns/op    new ns/op    delta
BenchmarkAddNetworkNode-4      1859         181          -90.26%
benchmark                      old allocs   new allocs   delta
BenchmarkAddNetworkNode-4      1            1            +0.00%
benchmark                      old bytes    new bytes    delta
BenchmarkAddNetworkNode-4      15           15           +0.00%

DelNodeNetwork:
benchmark                      old ns/op    new ns/op    delta
BenchmarkDeleteNetworkNode-4   71.0         75.8         +6.76%
benchmark                      old allocs   new allocs   delta
BenchmarkDeleteNetworkNode-4   0            0            +0.00%
benchmark                      old bytes    new bytes    delta
BenchmarkDeleteNetworkNode-4   3            7            +133.33%

RandomNode:
benchmark                      old ns/op    new ns/op    delta
BenchmarkRandomNodes-4         1830         172          -90.60%
benchmark                      old allocs   new allocs   delta
BenchmarkRandomNodes-4         16           1            -93.75%
benchmark                      old bytes    new bytes    delta
BenchmarkRandomNodes-4         535          48           -91.03%

Signed-off-by: Flavio Crisciani <[email protected]>
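The []string-to-map change described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the exact libnetwork code: the type layout and method names are assumptions, but they show why deletion and membership checks drop from a linear scan to O(1).

```go
package main

import "fmt"

// Sketch of the data-structure change: networkNodes goes from
// map[string][]string to a nested map, so adds and deletes no longer
// need to scan and re-slice a []string. (Names are illustrative.)
type NetworkDB struct {
	// first key: network ID, second key: node name
	networkNodes map[string]map[string]struct{}
}

func (nDB *NetworkDB) addNetworkNode(nid, nodeName string) {
	if nDB.networkNodes[nid] == nil {
		nDB.networkNodes[nid] = make(map[string]struct{})
	}
	nDB.networkNodes[nid][nodeName] = struct{}{} // O(1) insert, no duplicate scan
}

func (nDB *NetworkDB) deleteNetworkNode(nid, nodeName string) {
	delete(nDB.networkNodes[nid], nodeName) // O(1); the slice version scanned linearly
}

func main() {
	nDB := &NetworkDB{networkNodes: make(map[string]map[string]struct{})}
	nDB.addNetworkNode("net1", "node-a")
	nDB.addNetworkNode("net1", "node-b")
	nDB.deleteNetworkNode("net1", "node-a")
	fmt.Println(len(nDB.networkNodes["net1"]))
}
```

The struct{} value type keeps the inner map allocation-free per entry, which matches the flat allocation counts in the benchmarks above.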
Force-pushed from 3ca0c60 to 6ba12b1
LGTM with a couple of comments.
var i int
for node := range nodes {
I think Go does not guarantee that range order is random as part of the spec. Will there be any side effects if a certain implementation of Go returns a predictable range order? Also, is the impact of calling randomOffset significant?
I have to check on that. I read that since Go 1.0 the map keys are randomized, but I have to verify whether that is a safe assumption for all architectures, or at least the ones that we support.
Yes, randomOffset is the bottleneck; that can be seen in the detailed flame graph: rand.Int accounts for 90% and big.NewInt for the other 10%.
Ah I see. This thread has some details: https://groups.google.com/forum/#!topic/golang-nuts/zBcqMsDNt7Q. I guess it is safe to assume the randomness if there is a major perf gain, and just leave a comment about the assumption.
It's not cryptographically random, and the test highlighted some imbalance in the results, but the test also makes sure that no node ever fails to come up in the selection (the min == 0 condition).
Another approach I was considering is to keep the original string slice plus an index saved with the network, and loop over the slice in a circular-buffer fashion. Considering that nodes do not change every second, that should guarantee fairness in the selection, and the randomness is established at insertion time (based on when each node joins). The problem with the slice remains the linear loop on every insertion and deletion, which is a pretty lame price to pay.
@@ -64,8 +64,8 @@ type NetworkDB struct {
 	networks map[string]map[string]*network

 	// A map of nodes which are participating in a given
-	// network. The key is a network ID.
-	networkNodes map[string][]string
+	// network. The first key is a network ID. The second key is
Is the comment "second key is" incomplete?
Yep, looks like it got lost.
@fcrisciani will this change potentially solve the limitation on the
@KCrawley actually this is still pretty experimental and I think I will do other changes before having this one ready.