Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backend: print snapshotting duration warning every 30s #7877

Merged
merged 1 commit into from
May 5, 2017

Conversation

fanminshi
Copy link
Member

FIXES #7870

plog.Warningf("snapshotting is taking more than %v to finish", snapshotWarningTimeout)
case <-stopc:
}
plog.Infof("snapshotting took %v", time.Since(now))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we track that somewhere else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the snapshot is a long running one, then how long it takes is not important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a worthwhile backend prometheus metric

defer close(donec)
now := time.Now()
select {
case <-time.After(snapshotWarningTimeout):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should put this into a for loop and printout the warning continuously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we print the warning every 30s? or once time passes 30 seconds. print warning indefinitely until close() is called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every 30s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about the following format.

"snapshotting is taking more than 30s to finish"
"snapshotting is taking more than 60s to finish"
"snapshotting is taking more than 90s to finish"
...

@xiang90 what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. probably also log when it starts to differentiate concurrent ones.

snapshot is taking more than ... [started at ]

@fanminshi
Copy link
Member Author

All fixed. PTAL @xiang90

@@ -163,6 +165,21 @@ func (b *backend) ForceCommit() {
}

func (b *backend) Snapshot() Snapshot {
// tracks duration of snapshot()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment.

@fanminshi
Copy link
Member Author

all fixed. PTAL

@xiang90
Copy link
Contributor

xiang90 commented May 4, 2017

lgtm.

@heyitsanthony has a good point on the metrics.

@fanminshi can you also add a metrics for the snapshot duration?

like https://github.com/coreos/etcd/blob/master/mvcc/backend/metrics.go#L20 in another pr?

@fanminshi
Copy link
Member Author

@xiang90 sure, should i do that in a separate pr?

stopc, donec := make(chan struct{}), make(chan struct{})
go func() {
defer close(donec)
now := time.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start := time.Now()

for {
select {
case <-time.After(snapshotWarningTimeout):
plog.Warningf("snapshotting is taking more than %v to finish [started at %v]", time.Since(now).Seconds(), now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Since(start), start

%v will write out time format correctly i believe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshotting is taking more than 4.014502321 to finish [started at 2017-05-04 15:18:56.157129795 -0700 PDT]

@xiang90
Copy link
Contributor

xiang90 commented May 4, 2017

@xiang90 sure, should i do that in a separate pr?

yes. separate pr.

for {
select {
case <-time.After(snapshotWarningTimeout):
plog.Warningf("snapshotting is taking more than %v seconds to finish [started at %v]", time.Since(start).Seconds(), start)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really recommend printing out the total time taken in addition to this warning, otherwise one can't tell the difference between 31 and 301 seconds...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is the total.

see example here:

"snapshotting is taking more than 30s to finish"
"snapshotting is taking more than 60s to finish"
"snapshotting is taking more than 90s to finish"
...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I missed that it was called in a loop.

  • suggest use time.NewTicker instead of repeated time.After (less garbage) (don't forget to stop it when done)
  • Still suggest printing the total time on completion ("snapshotting took 65.3s") but it's less important than I thought initially.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this with time.NewTicker.

I am going to have another pr for track snapshotting duration as a Prometheus metric, #7878.

So probably no need to printing total duration.

@fanminshi fanminshi changed the title backend: tracks duration of snapshotting and print warning if run > 30s backend: print snapshotting duration warning every 30s May 4, 2017
@fanminshi
Copy link
Member Author

merging when green.

@fanminshi
Copy link
Member Author

all fixed. Used Ticker instead of time.After()

go func() {
defer close(donec)
start := time.Now()
ticker := time.NewTicker(snapshotWarningTimeout)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer ticker.Stop() is more future proof, although there's no bug here at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alrighty. will change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fanminshi
Copy link
Member Author

all fixed. ptal

@fanminshi
Copy link
Member Author

merging when green.

@fanminshi fanminshi merged commit 1f8de1a into etcd-io:master May 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants