-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backend: print snapshotting duration warning every 30s #7877
Conversation
mvcc/backend/backend.go
Outdated
plog.Warningf("snapshotting is taking more than %v to finish", snapshotWarningTimeout) | ||
case <-stopc: | ||
} | ||
plog.Infof("snapshotting took %v", time.Since(now)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we track that somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the snapshot is a long running one, then how long it takes is not important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be a worthwhile backend prometheus metric
mvcc/backend/backend.go
Outdated
defer close(donec) | ||
now := time.Now() | ||
select { | ||
case <-time.After(snapshotWarningTimeout): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should put this into a for loop and printout the warning continuously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we print the warning every 30s? or once time passes 30 seconds. print warning indefinitely until close() is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every 30s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking about the following format.
"snapshotting is taking more than 30s to finish"
"snapshotting is taking more than 60s to finish"
"snapshotting is taking more than 90s to finish"
...
@xiang90 what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. probably also log when it starts to differentiate concurrent ones.
snapshot is taking more than ... [started at ]
17b7eab
to
d15a0a8
Compare
All fixed. PTAL @xiang90 |
mvcc/backend/backend.go
Outdated
@@ -163,6 +165,21 @@ func (b *backend) ForceCommit() { | |||
} | |||
|
|||
func (b *backend) Snapshot() Snapshot { | |||
// tracks duration of snapshot() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment.
d15a0a8
to
76e4a8d
Compare
all fixed. PTAL |
lgtm. @heyitsanthony has a good point on the metrics. @fanminshi can you also add a metrics for the snapshot duration? like https://github.com/coreos/etcd/blob/master/mvcc/backend/metrics.go#L20 in another pr? |
@xiang90 sure, should i do that in a separate pr? |
mvcc/backend/backend.go
Outdated
stopc, donec := make(chan struct{}), make(chan struct{}) | ||
go func() { | ||
defer close(donec) | ||
now := time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start := time.Now()
mvcc/backend/backend.go
Outdated
for { | ||
select { | ||
case <-time.After(snapshotWarningTimeout): | ||
plog.Warningf("snapshotting is taking more than %v to finish [started at %v]", time.Since(now).Seconds(), now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time.Since(start), start
%v will write out time format correctly i believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapshotting is taking more than 4.014502321 to finish [started at 2017-05-04 15:18:56.157129795 -0700 PDT]
yes. separate pr. |
76e4a8d
to
bc6cef3
Compare
mvcc/backend/backend.go
Outdated
for { | ||
select { | ||
case <-time.After(snapshotWarningTimeout): | ||
plog.Warningf("snapshotting is taking more than %v seconds to finish [started at %v]", time.Since(start).Seconds(), start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really recommend printing out the total time taken in addition to this warning, otherwise one can't tell the difference between 31 and 301 seconds...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is the total.
see example here:
"snapshotting is taking more than 30s to finish"
"snapshotting is taking more than 60s to finish"
"snapshotting is taking more than 90s to finish"
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I missed that it was called in a loop.
- suggest use time.NewTicker instead of repeated time.After (less garbage) (don't forget to stop it when done)
- Still suggest printing the total time on completion ("snapshotting took 65.3s") but it's less important than I thought initially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update this with time.NewTicker
.
I am going to have another pr for track snapshotting duration as a Prometheus metric, #7878.
So probably no need to printing total duration.
merging when green. |
bc6cef3
to
543b26a
Compare
all fixed. Used Ticker instead of time.After() |
go func() { | ||
defer close(donec) | ||
start := time.Now() | ||
ticker := time.NewTicker(snapshotWarningTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer ticker.Stop()
is more future proof, although there's no bug here at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alrighty. will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
543b26a
to
f7f30f2
Compare
all fixed. ptal |
merging when green. |
FIXES #7870