fix todo [tls cache] #13902

Closed · wants to merge 17 commits
Conversation

@ls-2018 (Contributor) commented Apr 7, 2022

Fixes the following TODO:
// TODO: cache

Signed-off-by: ls-2018 <[email protected]>
@codecov-commenter commented Apr 7, 2022

Codecov Report

Merging #13902 (4a62e55) into main (f341b95) will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main   #13902      +/-   ##
==========================================
- Coverage   72.40%   72.35%   -0.05%     
==========================================
  Files         469      469              
  Lines       38414    38430      +16     
==========================================
- Hits        27813    27806       -7     
- Misses       8812     8832      +20     
- Partials     1789     1792       +3     
Flag | Coverage Δ
all  | 72.35% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                    | Coverage Δ
client/v3/experimental/recipes/priority_queue.go  | 56.66% <0.00%> (-10.00%) ⬇️
client/pkg/v3/tlsutil/tlsutil.go                  | 83.33% <0.00%> (-8.34%) ⬇️
raft/rafttest/node.go                             | 95.00% <0.00%> (-5.00%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go        | 73.43% <0.00%> (-4.17%) ⬇️
client/pkg/v3/testutil/recorder.go                | 76.27% <0.00%> (-3.39%) ⬇️
client/v3/leasing/util.go                         | 91.66% <0.00%> (-3.34%) ⬇️
server/etcdserver/api/v3rpc/util.go               | 70.96% <0.00%> (-3.23%) ⬇️
client/v3/experimental/recipes/key.go             | 75.34% <0.00%> (-2.74%) ⬇️
server/etcdserver/api/rafthttp/msgappv2_codec.go  | 69.56% <0.00%> (-1.74%) ⬇️
server/lease/leasehttp/http.go                    | 64.23% <0.00%> (-1.46%) ⬇️
... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f341b95...4a62e55.

@ptabor (Contributor) left a comment

Thank you

@ls-2018 requested a review from ptabor on April 8, 2022 at 07:24
@serathius (Member) left a comment

I know there is a TODO, but I don't think we should assume that its author really thought through whether a cache is needed here. A correctly implemented cache lets us trade disk IO and CPU for memory. However, it is not always obvious that a cache is beneficial; depending on the hit vs. miss ratio it might lower performance or leak resources.

There are two things we should definitely do before merging:

  • Add tests
  • Guarantee a maximum size of the cache (implement an LRU cache); see the sketch below
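
For illustration, a minimal sketch of a size-bounded LRU cache for certificate/CRL file contents, using only container/list and sync from the standard library; the boundedCache type and its methods are assumptions made up for this example, not existing etcd or library APIs:

	// boundedCache is an illustrative LRU cache keyed by file path.
	// When maxEntries is exceeded, the least recently used entry is evicted,
	// which guarantees a maximum cache size.
	type boundedCache struct {
		mu         sync.Mutex
		maxEntries int
		ll         *list.List               // front = most recently used
		items      map[string]*list.Element // path -> element holding *entry
	}

	type entry struct {
		path  string
		bytes []byte
	}

	func newBoundedCache(maxEntries int) *boundedCache {
		return &boundedCache{
			maxEntries: maxEntries,
			ll:         list.New(),
			items:      make(map[string]*list.Element),
		}
	}

	// Get returns the cached bytes for path and marks the entry as recently used.
	func (c *boundedCache) Get(path string) ([]byte, bool) {
		c.mu.Lock()
		defer c.mu.Unlock()
		if el, ok := c.items[path]; ok {
			c.ll.MoveToFront(el)
			return el.Value.(*entry).bytes, true
		}
		return nil, false
	}

	// Add stores bytes for path, evicting the least recently used entry when full.
	func (c *boundedCache) Add(path string, b []byte) {
		c.mu.Lock()
		defer c.mu.Unlock()
		if el, ok := c.items[path]; ok {
			c.ll.MoveToFront(el)
			el.Value.(*entry).bytes = b
			return
		}
		c.items[path] = c.ll.PushFront(&entry{path: path, bytes: b})
		if c.ll.Len() > c.maxEntries {
			oldest := c.ll.Back()
			c.ll.Remove(oldest)
			delete(c.items, oldest.Value.(*entry).path)
		}
	}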

@ptabor (Contributor) commented Apr 10, 2022

> I know there is a TODO, but I don't think we should assume that its author really thought through whether a cache is needed here. A correctly implemented cache lets us trade disk IO and CPU for memory. However, it is not always obvious that a cache is beneficial; depending on the hit vs. miss ratio it might lower performance or leak resources.
>
> There are two things we should definitely do before merging:
>
>   • Add tests
>   • Guarantee a maximum size of the cache (implement an LRU cache)

I agree that an LRU-based implementation (ideally with O(1 min) expiration) would be better.

A few observations:

  1. I think this optimisation might be pretty important when we have a lot of new connections.
    The current model re-reads the certificate from file each time a connection is created:

    if err := l.check(ctx, tlsConn); err != nil {

    (and not, as intuition suggests, "when the listener is created").
    If we have a storm of connections, e.g. after a server restart, re-reading the certs might contribute significantly to both time and memory.

  2. This code is in client/pkg, but it's about opening 'listeners', which is used only on the server side, e.g.

    etcd/server/embed/etcd.go, lines 530 to 534 in 0c9a4e0:

    peers[i].Listener, err = transport.NewListenerWithOpts(u.Host, u.Scheme,
        transport.WithTLSInfo(&cfg.PeerTLSInfo),
        transport.WithSocketOpts(&cfg.SocketOpts),
        transport.WithTimeout(rafthttp.ConnReadTimeout, rafthttp.ConnWriteTimeout),
    )

    So in practice the configuration of the file names does not change frequently (or ever), and I expect a 'fixed' set of O(1-4) certs to be loaded.
    In theory, some application of the embedded server might keep creating servers within a single process with different configs, potentially overfilling the cache.

  3. There is the aspect of the 'impact' of substituting a file in place (while the server is running) with a 'refreshed' certificate. In theory this used to 'work' before the PR, while with this change the server would stick to the original content. Thus invalidating the cache with an O(1-10 min) resolution would be ideal.

Summary:

  • Introducing caching is IMHO justified. Demonstrating the impact on perf during a connection storm would be an appreciated exercise.
  • Using some existing LRU implementation (hashicorp?) would make the code safer and is recommended.

@ls-2018 (Contributor, Author) commented Apr 10, 2022

I agree with the LRU-based implementation, and I will implement it next.

Signed-off-by: ls-2018 <[email protected]>
Signed-off-by: ls-2018 <[email protected]>
Signed-off-by: ls-2018 <[email protected]>
@ptabor (Contributor) left a comment

Thank you for the fix.

I wonder whether we can avoid maintaining our own implementation of an LRU.
E.g. using https://github.com/hashicorp/golang-lru with a wrapper that
substitutes expired entries might be sufficient and imply less maintenance cost; a sketch of such a wrapper is below.

I've also seen https://github.com/hnlq715/golang-lru, but it's a fork without its own brand, so it would be misleading to depend on it.
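
For illustration, a minimal sketch of such a wrapper, assuming the hashicorp/golang-lru v1 API (lru.New, Cache.Get, Cache.Add); the expiringCache type and readFile helper are made up for this example and are not part of the PR:

	import (
		"io/ioutil"
		"time"

		lru "github.com/hashicorp/golang-lru"
	)

	// expirable pairs cached bytes with the time after which they must be re-read.
	type expirable struct {
		value      []byte
		expiration time.Time
	}

	// expiringCache wraps a fixed-size LRU so that entries older than ttl are
	// transparently refreshed on access: the LRU bounds memory, the ttl bounds staleness.
	type expiringCache struct {
		lru *lru.Cache
		ttl time.Duration
	}

	func newExpiringCache(size int, ttl time.Duration) (*expiringCache, error) {
		c, err := lru.New(size)
		if err != nil {
			return nil, err
		}
		return &expiringCache{lru: c, ttl: ttl}, nil
	}

	// readFile returns the cached content of path, re-reading it from disk
	// when the entry is missing or expired.
	func (c *expiringCache) readFile(path string) ([]byte, error) {
		if v, ok := c.lru.Get(path); ok {
			if e := v.(expirable); time.Now().Before(e.expiration) {
				return e.value, nil
			}
		}
		b, err := ioutil.ReadFile(path)
		if err != nil {
			return nil, err
		}
		c.lru.Add(path, expirable{value: b, expiration: time.Now().Add(c.ttl)})
		return b, nil
	}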

@@ -0,0 +1,125 @@
// Copyright 2022 Google LLC
@ptabor (Contributor) commented Apr 11, 2022

Usually we use: // Copyright 2022 The etcd Authors

Signed-off-by: ls-2018 <[email protected]>
@ls-2018 (Contributor, Author) commented Apr 11, 2022

@ptabor I've seen https://github.com/hnlq715/golang-lru, but it does not support time-based eviction strategies.

The action fails just from changing the copyright line.

@ptabor (Contributor) commented Apr 13, 2022

> @ptabor I've seen https://github.com/hnlq715/golang-lru, but it does not support time-based eviction strategies.

If we take any existing LRU, we can simulate expiration on fetch:

	if v, ok := crlBytesLru.Get(crlPath); ok {
		// Serve from the cache only while the entry has not expired yet.
		if time.Now().Before(v.expiration) {
			return v.value, nil
		}
	}
	crlBytes, err := ioutil.ReadFile(crlPath)
	if err != nil {
		return nil, err
	}
	crlBytesLru.Set(crlPath, expirable{value: crlBytes, expiration: time.Now().Add(ttl)})
	return crlBytes, nil

Even better, wrapped in a function:

	memoize(lru, crlPath, ttl, func() ([]byte, error) { return ioutil.ReadFile(crlPath) })
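
For illustration, a hedged sketch of what such a memoize helper could look like, reusing the expirable type and the hashicorp/golang-lru cache assumed in the earlier sketch; the helper name and signature are illustrative, not from the PR:

	// memoize returns the cached value for key if present and not yet expired;
	// otherwise it calls load, caches the result with a fresh ttl, and returns it.
	func memoize(cache *lru.Cache, key string, ttl time.Duration, load func() ([]byte, error)) ([]byte, error) {
		if v, ok := cache.Get(key); ok {
			if e := v.(expirable); time.Now().Before(e.expiration) {
				return e.value, nil
			}
		}
		b, err := load()
		if err != nil {
			return nil, err
		}
		cache.Add(key, expirable{value: b, expiration: time.Now().Add(ttl)})
		return b, nil
	}

	// Usage, mirroring the call above:
	crlBytes, err := memoize(crlBytesLru, crlPath, ttl, func() ([]byte, error) {
		return ioutil.ReadFile(crlPath)
	})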

> The action fails just from changing the copyright line.

Likely a flake. Restarted.
