fix todo [tls cache] #13902

Closed · wants to merge 17 commits
Conversation

@ls-2018 (Contributor) commented Apr 7, 2022

Fixes the following TODO:
// TODO: cache

Signed-off-by: ls-2018 <[email protected]>
@codecov-commenter commented Apr 7, 2022

Codecov Report

Merging #13902 (4a62e55) into main (f341b95) will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main   #13902      +/-   ##
==========================================
- Coverage   72.40%   72.35%   -0.05%     
==========================================
  Files         469      469              
  Lines       38414    38430      +16     
==========================================
- Hits        27813    27806       -7     
- Misses       8812     8832      +20     
- Partials     1789     1792       +3     
Flag | Coverage Δ
all  | 72.35% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                    | Coverage Δ
client/v3/experimental/recipes/priority_queue.go  | 56.66% <0.00%> (-10.00%) ⬇️
client/pkg/v3/tlsutil/tlsutil.go                  | 83.33% <0.00%> (-8.34%) ⬇️
raft/rafttest/node.go                             | 95.00% <0.00%> (-5.00%) ⬇️
server/etcdserver/api/v3rpc/interceptor.go        | 73.43% <0.00%> (-4.17%) ⬇️
client/pkg/v3/testutil/recorder.go                | 76.27% <0.00%> (-3.39%) ⬇️
client/v3/leasing/util.go                         | 91.66% <0.00%> (-3.34%) ⬇️
server/etcdserver/api/v3rpc/util.go               | 70.96% <0.00%> (-3.23%) ⬇️
client/v3/experimental/recipes/key.go             | 75.34% <0.00%> (-2.74%) ⬇️
server/etcdserver/api/rafthttp/msgappv2_codec.go  | 69.56% <0.00%> (-1.74%) ⬇️
server/lease/leasehttp/http.go                    | 64.23% <0.00%> (-1.46%) ⬇️
... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f341b95...4a62e55.

@ptabor (Contributor) left a comment

Thank you

@ls-2018 requested a review from ptabor on April 8, 2022 at 07:24
@serathius (Member) left a comment

I know there is a TODO, but I don't think we should assume that its author really thought through whether a cache is needed here. A correctly implemented cache lets us trade disk IO and CPU for memory. However, it is not always obvious that a cache is beneficial; depending on the hit vs. miss ratio it might lower performance or leak resources.

There are two things we should definitely do before merging:

  • Add tests
  • Guarantee a maximum size of the cache (implement an LRU cache); see the sketch below
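
For illustration, a minimal sketch of a size-bounded LRU cache for certificate/CRL file contents, using only container/list and sync from the standard library; the boundedCache type and its methods are assumptions made up for this example, not existing etcd or library APIs:

	// boundedCache is an illustrative LRU cache keyed by file path.
	// When maxEntries is exceeded, the least recently used entry is evicted,
	// which guarantees a maximum cache size.
	type boundedCache struct {
		mu         sync.Mutex
		maxEntries int
		ll         *list.List               // front = most recently used
		items      map[string]*list.Element // path -> element holding *entry
	}

	type entry struct {
		path  string
		bytes []byte
	}

	func newBoundedCache(maxEntries int) *boundedCache {
		return &boundedCache{
			maxEntries: maxEntries,
			ll:         list.New(),
			items:      make(map[string]*list.Element),
		}
	}

	// Get returns the cached bytes for path and marks the entry as recently used.
	func (c *boundedCache) Get(path string) ([]byte, bool) {
		c.mu.Lock()
		defer c.mu.Unlock()
		if el, ok := c.items[path]; ok {
			c.ll.MoveToFront(el)
			return el.Value.(*entry).bytes, true
		}
		return nil, false
	}

	// Add stores bytes for path, evicting the least recently used entry when full.
	func (c *boundedCache) Add(path string, b []byte) {
		c.mu.Lock()
		defer c.mu.Unlock()
		if el, ok := c.items[path]; ok {
			c.ll.MoveToFront(el)
			el.Value.(*entry).bytes = b
			return
		}
		c.items[path] = c.ll.PushFront(&entry{path: path, bytes: b})
		if c.ll.Len() > c.maxEntries {
			oldest := c.ll.Back()
			c.ll.Remove(oldest)
			delete(c.items, oldest.Value.(*entry).path)
		}
	}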

@ptabor (Contributor) commented Apr 10, 2022

> I know there is a TODO, but I don't think we should assume that its author really thought through whether a cache is needed here. A correctly implemented cache lets us trade disk IO and CPU for memory. However, it is not always obvious that a cache is beneficial; depending on the hit vs. miss ratio it might lower performance or leak resources.
>
> There are two things we should definitely do before merging:
>
>   • Add tests
>   • Guarantee a maximum size of the cache (implement an LRU cache)

I agree that an LRU-based implementation (ideally with O(1 min) expiration) would be better.

A few observations:

  1. I think this optimisation might be pretty important when we have a lot of new connections.
    The current model re-reads the certificate from file each time a connection is created:

    if err := l.check(ctx, tlsConn); err != nil {

    (and not, as intuition suggests, "when the listener is created").
    If we have a storm of connections, e.g. after a server restart, re-reading the certs might contribute significantly to both time and memory.

  2. This code is in client/pkg, but it's about opening 'listeners', which is used only on the server side, e.g.

    etcd/server/embed/etcd.go, lines 530 to 534 in 0c9a4e0:

    peers[i].Listener, err = transport.NewListenerWithOpts(u.Host, u.Scheme,
        transport.WithTLSInfo(&cfg.PeerTLSInfo),
        transport.WithSocketOpts(&cfg.SocketOpts),
        transport.WithTimeout(rafthttp.ConnReadTimeout, rafthttp.ConnWriteTimeout),
    )

    So in practice the configuration of the file names does not change frequently (or ever), and I expect a 'fixed' set of O(1-4) certs to be loaded.
    In theory, some application of the embedded server might keep creating servers within a single process with different configs, potentially overfilling the cache.

  3. There is the aspect of the 'impact' of substituting a file in place (while the server is running) with a 'refreshed' certificate. In theory this used to 'work' before the PR, while with this change the server would stick to the original content. Thus invalidating the cache with an O(1-10 min) resolution would be ideal.

Summary:

  • Introducing caching is IMHO justified. Demonstrating the impact on perf during a connection storm would be an appreciated exercise.
  • Using some existing LRU implementation (hashicorp?) would make the code safer and is recommended.

@ls-2018 (Contributor, Author) commented Apr 10, 2022

I agree with the LRU-based implementation, and I will implement it next.

Signed-off-by: ls-2018 <[email protected]>
Signed-off-by: ls-2018 <[email protected]>
Signed-off-by: ls-2018 <[email protected]>
@ptabor (Contributor) left a comment

Thank you for the fix.

I wonder whether we can avoid maintaining our own implementation of an LRU.
E.g. using https://github.com/hashicorp/golang-lru with a wrapper that
substitutes expired entries might be sufficient and imply less maintenance cost; a sketch of such a wrapper is below.

I've also seen https://github.com/hnlq715/golang-lru, but it's a fork without its own brand, so it would be misleading to depend on it.
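
For illustration, a minimal sketch of such a wrapper, assuming the hashicorp/golang-lru v1 API (lru.New, Cache.Get, Cache.Add); the expiringCache type and readFile helper are made up for this example and are not part of the PR:

	import (
		"io/ioutil"
		"time"

		lru "github.com/hashicorp/golang-lru"
	)

	// expirable pairs cached bytes with the time after which they must be re-read.
	type expirable struct {
		value      []byte
		expiration time.Time
	}

	// expiringCache wraps a fixed-size LRU so that entries older than ttl are
	// transparently refreshed on access: the LRU bounds memory, the ttl bounds staleness.
	type expiringCache struct {
		lru *lru.Cache
		ttl time.Duration
	}

	func newExpiringCache(size int, ttl time.Duration) (*expiringCache, error) {
		c, err := lru.New(size)
		if err != nil {
			return nil, err
		}
		return &expiringCache{lru: c, ttl: ttl}, nil
	}

	// readFile returns the cached content of path, re-reading it from disk
	// when the entry is missing or expired.
	func (c *expiringCache) readFile(path string) ([]byte, error) {
		if v, ok := c.lru.Get(path); ok {
			if e := v.(expirable); time.Now().Before(e.expiration) {
				return e.value, nil
			}
		}
		b, err := ioutil.ReadFile(path)
		if err != nil {
			return nil, err
		}
		c.lru.Add(path, expirable{value: b, expiration: time.Now().Add(c.ttl)})
		return b, nil
	}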

@@ -0,0 +1,125 @@
// Copyright 2022 Google LLC
@ptabor (Contributor) commented Apr 11, 2022

Usually we use: // Copyright 2022 The etcd Authors

Signed-off-by: ls-2018 <[email protected]>
@ls-2018 (Contributor, Author) commented Apr 11, 2022

@ptabor I've seen https://github.com/hnlq715/golang-lru, but it does not support time-based eviction strategies.

The action fails just from changing the copyright line.

@ptabor (Contributor) commented Apr 13, 2022

> @ptabor I've seen https://github.com/hnlq715/golang-lru, but it does not support time-based eviction strategies.

If we take any existing LRU, we can simulate expiration on fetch:

	if v, ok := crlBytesLru.Get(crlPath); ok {
		// Serve from the cache only while the entry has not expired yet.
		if time.Now().Before(v.expiration) {
			return v.value, nil
		}
	}
	crlBytes, err := ioutil.ReadFile(crlPath)
	if err != nil {
		return nil, err
	}
	crlBytesLru.Set(crlPath, expirable{value: crlBytes, expiration: time.Now().Add(ttl)})
	return crlBytes, nil

Even better, wrapped in a function:

	memoize(lru, crlPath, ttl, func() ([]byte, error) { return ioutil.ReadFile(crlPath) })
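
For illustration, a hedged sketch of what such a memoize helper could look like, reusing the expirable type and the hashicorp/golang-lru cache assumed in the earlier sketch; the helper name and signature are illustrative, not from the PR:

	// memoize returns the cached value for key if present and not yet expired;
	// otherwise it calls load, caches the result with a fresh ttl, and returns it.
	func memoize(cache *lru.Cache, key string, ttl time.Duration, load func() ([]byte, error)) ([]byte, error) {
		if v, ok := cache.Get(key); ok {
			if e := v.(expirable); time.Now().Before(e.expiration) {
				return e.value, nil
			}
		}
		b, err := load()
		if err != nil {
			return nil, err
		}
		cache.Add(key, expirable{value: b, expiration: time.Now().Add(ttl)})
		return b, nil
	}

	// Usage, mirroring the call above:
	crlBytes, err := memoize(crlBytesLru, crlPath, ttl, func() ([]byte, error) {
		return ioutil.ReadFile(crlPath)
	})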

> The action fails just from changing the copyright line.

Likely a flake. Restarted.
