etcd3 - slow watchers and watcher count blow up #8387

Closed
mwf opened this issue Aug 10, 2017 · 9 comments

mwf commented Aug 10, 2017

etcd server version 3.1.5

Hi guys!
We hit a strange issue yesterday; please take a look at these screenshots from Grafana:
[Grafana screenshots 1 and 2]

Sorry, I had to clean up the host names, otherwise our security team would kill me :)
For some unknown reason the number of watchers started to grow and CPU usage increased. The count of watch streams remained stable, but some "slow watchers" appeared.

Could you please explain what a "watcher" (the etcd_debugging_mvcc_watcher_total metric) is in terms of the MVCC storage, how it relates to watch streams (I assume the watch stream count equals the number of client.Watch() calls?), and what could lead to such behaviour?
The numbers are scary - 32 million watchers O___O

Everything stabilized by itself. There were several etcd restarts, the last one at 23:20. As you can see on the screenshot below, the number of watchers still grew after the restart, but then dropped several times and returned to normal o___O How did it resolve itself? What was the core problem?

[Grafana screenshot 3]

Some services also hit the frozen-Watch issue - they couldn't receive key updates, like in #7247 and other related issues you have.

@heyitsanthony (Contributor)

  • watcher_total - total number of watches (equal to number of watches open via clientv3.Watch())
  • watch_stream_total - total number of watch streams (should be less than or equal to watcher_total)
  • slow_watcher_total - total watches with pending events

Is there a custom application using etcd? It could be leaking watches. Client side metrics may help with debugging; see https://godoc.org/github.com/coreos/etcd/clientv3 "Example (metrics)"
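For reference, a minimal sketch of wiring up those client-side metrics, modeled on the clientv3 "Example (metrics)" pattern. The endpoint and metrics port are placeholders, and the DialOptions field follows a newer client release than 3.1, so treat this as an assumption rather than the exact 3.1 API:

```go
package main

import (
	"log"
	"net/http"

	"github.com/coreos/etcd/clientv3"
	grpcprom "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
)

func main() {
	// Register the gRPC prometheus interceptors so watch/range RPC counts
	// and latencies are recorded on the client side.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialOptions: []grpc.DialOption{
			grpc.WithUnaryInterceptor(grpcprom.UnaryClientInterceptor),
			grpc.WithStreamInterceptor(grpcprom.StreamClientInterceptor),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Expose the collected client metrics for scraping.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9121", nil)) // placeholder port
}
```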

mwf commented Aug 10, 2017

I think I got it, thanks!
So if a client watches a single prefix (or key), it will create 1 watcher and 1 watch stream. And if it starts another watch, it creates +1 watcher, but the stream remains the same?
Or does every instance of clientv3.Watcher create a new stream? So if an app does 2 clientv3.NewWatcher() calls, will there be 2 streams from one app?
So, simply put, watch_stream_total is the total number of watchers, and watcher_total is the total number of individual watch calls?

slow_watcher_total - by pending events, do you mean events which occurred on the server and waited in a TCP (or some other) queue for some time (>1 sec for example)? So when the event reaches the client, the counter decrements?

etcd_debugging_mvcc_pending_events_total - what is this metric about then? Which events? :)
etcd_debugging_mvcc_range_total - and what is this one about? Pending key range requests being processed? Per minute, or how often?

BTW do I understand correctly that etcd_debugging_store* metrics are about etcd2 store and etcd_debugging_mvcc* are about etcd3 store?

Is there a custom application using etcd? It could be leaking watches.

Yes, of course. Our apps watch etcd for service discovery.
Yes, we had some errors with Watch at that time, as I said. Some watches got stuck and could not receive events.
We also had a bunch of watches, re-created every second, receiving etcdserver: mvcc: required revision has been compacted. Maybe that's the core reason.

So here is the main question: watcher.Watch(ctx, ....) is called, we get the last event with watchResponse.Err() != nil, and the channel gets closed next. Do we need to do anything additionally? I mean, is the etcd watcher closed automatically here? I think this is the reason for the watcher leak.
We don't call watcher.Close() (we don't want to affect the client's other watches), and we don't cancel the ctx (because we don't need to - we are going to re-create the watch). Are we supposed to do either of these?
Remember, the server version is 3.1.5 and the client version is 3.1.8, if I'm not mistaken.

Client side metrics may help with debugging;

Yep, thank you very much for your help! I will try to integrate the metrics; I believe they will be useful.

Sorry for so many questions :)

@heyitsanthony (Contributor)

Or every instance of clientv3.Watcher creates a new stream? So if app does 2 clientv3.NewWatcher() calls - there will be 2 streams from one app?

The client keys streams by the ctx string serialization in 3.1; same ctx, same stream. It will only use grpc metadata from the ctx in 3.3. Separate Watcher objects will create separate streams.
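To illustrate the distinction, a rough sketch (key names are made up; the counting in the comments follows the 3.1 behaviour described above):

```go
package example

import (
	"context"

	"github.com/coreos/etcd/clientv3"
)

func openWatches(cli *clientv3.Client) {
	ctx := context.Background()

	// Two Watch calls with the same ctx share one gRPC watch stream in 3.1
	// (streams are keyed by the ctx), but each call is its own watcher:
	// watch_stream_total +1, watcher_total +2.
	wch1 := cli.Watch(ctx, "/services/", clientv3.WithPrefix())
	wch2 := cli.Watch(ctx, "/config/leader")

	// A separate Watcher object opens its own stream, so this adds both
	// a stream and a watcher: watch_stream_total +1, watcher_total +1.
	w2 := clientv3.NewWatcher(cli)
	wch3 := w2.Watch(ctx, "/other/", clientv3.WithPrefix())

	_, _, _ = wch1, wch2, wch3
}
```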

So simply, watch_stream_total - is total number of watchers, and watcher_total - is total number of various watch calls?

No. watch_stream_total is approximately the total number of watch grpc streams on the server. watcher_total is the sum of all watches across all watch grpc streams.

slow_watcher_total - pending events, you mean the events which occured in server and waited in TCP (or some other) queue for some time (>1 sec for example)? So when the event reaches the client it decrements?

It's related to whether the watch has posted all events to its channel. If the server can't drain events to the client fast enough, the watch is classified as 'slow' and the handling reverts to a batching mode. See https://github.com/coreos/etcd/blob/master/mvcc/watchable_store.go#L184.

BTW do I understand correctly that etcd_debugging_store* metrics are about etcd2 store and etcd_debugging_mvcc* are about etcd3 store?

Yes. See store/metrics.go and mvcc/metrics.go.

etcd_debugging_mvcc_pending_events_total - what is this metric about then? Which events? :)

Total events that have been posted to the watch channel but have not been sent over grpc. See mvcc/watchable_store.go and etcdserver/api/v3rpc/watch.go

So here is the main question - watcher.Watch(ctx, ....) is called, we get last event with watchResponse.Err() != nil and the channel gets closed next. Do we need to do anything additionally? I mean is the etcd watcher closed automatically here?

No, the server resources should be released automatically on a compaction. If not, there's a bug.

mwf commented Aug 10, 2017

Thanks for metrics explanation 👍

No, the server resources should be released automatically on a compaction. If not, there's a bug.

It seems they really were released on compaction in our case - we compact every 24 hours, and it runs sometime during the night.

So OK, let's clarify. The server resources are released:

  • on compaction;
  • on a close event from the client (either when you cancel the context or call the Close() method).

No other options? Why aren't they released when the server responds with some error (a compaction error in my example) and the watch channel is about to be closed? Is this a bug or some kind of undocumented feature?

heyitsanthony commented Aug 10, 2017

I meant on a compaction error. If the clientv3 watch channel closes then the resources should be freed automatically by the clientv3 watcher implementation.
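For reference, a hedged sketch of the re-create pattern under discussion: let the channel close on the compaction error and then open a new watch, without calling Close() or canceling a context just for that watch. The key, prefix option, and revision handling are illustrative assumptions, not etcd's prescribed approach:

```go
package example

import (
	"context"
	"time"

	"github.com/coreos/etcd/clientv3"
)

// watchLoop re-creates the watch whenever the channel closes (e.g. after a
// compaction error). Per the discussion above, the clientv3 watcher is
// expected to release the per-watch resources itself when that happens.
func watchLoop(ctx context.Context, cli *clientv3.Client, key string) {
	rev := int64(0)
	for ctx.Err() == nil {
		opts := []clientv3.OpOption{clientv3.WithPrefix()}
		if rev > 0 {
			opts = append(opts, clientv3.WithRev(rev))
		}
		for wr := range cli.Watch(ctx, key, opts...) {
			if err := wr.Err(); err != nil {
				// e.g. "required revision has been compacted": the clientv3
				// watcher closes this channel itself, so no explicit
				// Close()/cancel is needed just for this watch.
				if wr.CompactRevision > 0 {
					rev = wr.CompactRevision
				}
				break
			}
			rev = wr.Header.Revision + 1
			for _, ev := range wr.Events {
				_ = ev // handle the event (service discovery update, etc.)
			}
		}
		time.Sleep(time.Second) // back off before re-creating the watch
	}
}
```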

mwf commented Aug 10, 2017

OK, I'll check this case and tell you the results.
Maybe you've already fixed it; 3.1.5 is almost 5 months old.

@heyitsanthony (Contributor)

@mwf ping?

mwf commented Aug 30, 2017

@heyitsanthony I'm deeply sorry, I've had some other important stuff to do all these days :(
But we hit it again this weekend (again because of some network problems), and I'll finally try to reproduce it, to understand what we are dealing with and whether the problem exists in newer releases.
I hope to contact you this Friday.

mwf commented Sep 23, 2017

@heyitsanthony sorry for taking so long - I had some personal problems during my vacation, but that doesn't matter.

I played with the server and client code, and couldn't reproduce it the same way we hit it :(
What I did achieve was an increasing watcher count when the application code does not gracefully shut down the Watcher by canceling the context. But that is done explicitly in the lib I wrote o_O And I have timeouts for re-creating the watcher in case of network partitioning and other failures.

So it looks like someone is spamming the server in a for-loop with new Watch requests, without canceling the old ones, at some tremendous speed :(
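For context, a sketch of the kind of anti-pattern that would produce this (the function and key names are illustrative): re-creating watches in a loop while never canceling the previous context, so every iteration leaves the old watch registered on the server.

```go
package example

import (
	"context"

	"github.com/coreos/etcd/clientv3"
)

// leakyRewatch shows the anti-pattern: every iteration opens a new watch on
// a fresh, never-canceled context and then abandons it, so the server-side
// watcher_total keeps climbing.
func leakyRewatch(cli *clientv3.Client, key string) {
	for {
		ctx := context.Background() // never canceled -> the old watch leaks
		wch := cli.Watch(ctx, key, clientv3.WithPrefix())
		<-wch // take one response (or a closed channel), then start over

		// The fix is to derive a cancelable context and cancel it before
		// re-creating the watch, e.g.:
		//   ctx, cancel := context.WithCancel(context.Background())
		//   ... use ctx ...
		//   cancel()
	}
}
```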

It doesn't look like a server bug, so I'm closing the issue. If I manage to find anything else, I will let you know.
Thanks!
