core: don't reschedule idle timer if it is already active #4297

Merged
merged 7 commits into from
Apr 4, 2018

carl-mastrangelo
Contributor

@ejona86 @zhangkun83

I haven't quite got the tests to pass yet, but I think this is close enough for a review. The main idea is to avoid rescheduling the IdleModeTimer if it is already active. This gives pretty good results on my machine for the latency benchmarks: with CPU frequency scaling on, median latency goes from 197us to 185us (no TLS, but with census enabled and a direct executor).
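For readers following along, here is a minimal sketch of the idea, under stated assumptions. The class name `LazyRescheduler`, the injected `LongSupplier` clock, and the `scheduleCalls` counter are all hypothetical and are not the code in this PR. It uses `synchronized` only to stay self-contained. The point it illustrates: a later deadline only updates a field, and the already pending wake-up re-arms itself for the remaining time when it fires.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a lazily rescheduled timer; not the class added in this PR. */
final class LazyRescheduler {
  private final Runnable task;
  private final ScheduledExecutorService scheduler;
  private final LongSupplier nanoClock; // assumed injectable, e.g. System::nanoTime
  private ScheduledFuture<?> wakeUp;    // null when nothing is scheduled
  private long runAtNanos;
  private boolean enabled;
  private int scheduleCalls;            // counts real scheduler hits, for illustration only

  LazyRescheduler(Runnable task, ScheduledExecutorService scheduler, LongSupplier nanoClock) {
    this.task = task;
    this.scheduler = scheduler;
    this.nanoClock = nanoClock;
  }

  synchronized void reschedule(long delay, TimeUnit unit) {
    long delayNanos = unit.toNanos(delay);
    long newRunAtNanos = nanoClock.getAsLong() + delayNanos;
    enabled = true;
    // Touch the scheduler only if nothing is pending or the deadline moved earlier.
    if (wakeUp == null || newRunAtNanos - runAtNanos < 0) {
      if (wakeUp != null) {
        wakeUp.cancel(false);
      }
      schedule(delayNanos);
    }
    runAtNanos = newRunAtNanos;
  }

  /** Disables the timer but leaves the pending wake-up in place; it becomes a no-op. */
  synchronized void cancel() {
    enabled = false;
  }

  synchronized int scheduleCalls() {
    return scheduleCalls;
  }

  // Callers must hold the lock.
  private void schedule(long delayNanos) {
    Runnable onWakeUp = this::onWakeUp;
    wakeUp = scheduler.schedule(onWakeUp, delayNanos, TimeUnit.NANOSECONDS);
    scheduleCalls++;
  }

  private void onWakeUp() {
    synchronized (this) {
      wakeUp = null;
      if (!enabled) {
        return;
      }
      long remainingNanos = runAtNanos - nanoClock.getAsLong();
      if (remainingNanos > 0) {
        schedule(remainingNanos); // deadline was pushed back; sleep for the remainder
        return;
      }
      enabled = false;
    }
    task.run(); // run outside the lock
  }
}
```

In this sketch, repeated `reschedule` calls with a later deadline hit the scheduler once, while a call with a shorter delay cancels and reschedules.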

Getting the tests to work is going to be harder, since this change depends on System.nanoTime. The FakeClock we use in our tests doesn't make it easy to mock this out, so I'm working on ideas to fix this. That said, getting the tests to pass is starting to require invasive changes.

Raw numbers from my runs:

Before (w/freq scaling)
50.0%ile Latency (in nanos):		197071
90.0%ile Latency (in nanos):		253383
95.0%ile Latency (in nanos):		272479
99.0%ile Latency (in nanos):		318191
99.9%ile Latency (in nanos):		411519
100.0%ile Latency (in nanos):		16083967
QPS:                           5005


After:
50.0%ile Latency (in nanos):		185023
90.0%ile Latency (in nanos):		240671
95.0%ile Latency (in nanos):		259319
99.0%ile Latency (in nanos):		299887
99.9%ile Latency (in nanos):		379855
100.0%ile Latency (in nanos):		11492351
QPS:                           5320

Before (w/o freq scaling)
50.0%ile Latency (in nanos):		63907
90.0%ile Latency (in nanos):		73055
95.0%ile Latency (in nanos):		79443
99.0%ile Latency (in nanos):		93739
99.9%ile Latency (in nanos):		123583
100.0%ile Latency (in nanos):		14028287
QPS:                           14936

After:
50.0%ile Latency (in nanos):		57355
90.0%ile Latency (in nanos):		69459
95.0%ile Latency (in nanos):		78495
99.0%ile Latency (in nanos):		93575
99.9%ile Latency (in nanos):		126819
100.0%ile Latency (in nanos):		12634111
QPS:                           16435

}

/** Time source representing nanoseconds since fixed but arbitrary point in time. */
interface Ticker {
Contributor Author

FYI: Guava's ticker is an abstract class, and marked @Beta. I can't use it here, so I define an interface.
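As an illustration of what such an interface buys in tests, here is a minimal sketch. The names `NanoTicker`, `SystemNanoTicker`, and `FakeNanoTicker` are hypothetical and are not taken from this PR or from Guava. The assumption is that production code reads time only through the interface, so a test can advance time by hand.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical time source: nanoseconds since a fixed but arbitrary point in time. */
interface NanoTicker {
  long nanoTime();
}

/** Production implementation backed by System.nanoTime(). */
final class SystemNanoTicker implements NanoTicker {
  @Override
  public long nanoTime() {
    return System.nanoTime();
  }
}

/** Test implementation whose time only moves when the test says so. */
final class FakeNanoTicker implements NanoTicker {
  private long nowNanos;

  @Override
  public long nanoTime() {
    return nowNanos;
  }

  void advance(long amount, TimeUnit unit) {
    nowNanos += unit.toNanos(amount);
  }
}
```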

Member

@ejona86 ejona86 Apr 3, 2018

I toyed with this some (just now) using Stopwatch. I think it ends up fine (no unnecessary now() calls in the normal paths), and it is a stable API.

Contributor Author

Switched to using stopwatch.

Contributor

@zhangkun83 zhangkun83 left a comment

Overall looks good with minor comments.

@@ -293,7 +293,7 @@ public void onPingTimeout() {
public abstract long read();
}

- private static class SystemTicker extends Ticker {
+ static class SystemTicker extends Ticker {
Contributor

Why this change?

Contributor Author

Reverted, this was originally going to be reused.

private long runAt;
private boolean enabled;

Rescheduler(Runnable r, ChannelExecutor exec, ScheduledExecutorService scheduler) {
Contributor

There is a way to decouple Rescheduler from ChannelExecutor: pass an Executor here and make it a requirement that it serializes its runnables and reschedule(). In ManagedChannelImpl, you could make an Executor that delegates to ChannelExecutor. I am strongly in favor of splitting this class out of ManagedChannelImpl, which is good for testability and also for the fitness of ManagedChannelImpl.
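A rough sketch of that adapter shape, under assumptions: `QueueingExecutor` below is a hypothetical stand-in for a queue-and-drain executor (it is not grpc-java's ChannelExecutor), and `SerializingAdapter` is an invented name. The sketch only shows how a plain `java.util.concurrent.Executor` can delegate to such a class while keeping runnables serialized, including re-entrant submissions.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

/** Hypothetical stand-in for a queue-and-drain executor; not grpc-java's ChannelExecutor. */
final class QueueingExecutor {
  private final Queue<Runnable> queue = new ArrayDeque<>();
  private boolean draining;

  synchronized QueueingExecutor executeLater(Runnable r) {
    queue.add(r);
    return this;
  }

  /** Runs queued tasks one at a time; a nested or concurrent drain() returns immediately. */
  void drain() {
    synchronized (this) {
      if (draining) {
        return;
      }
      draining = true;
    }
    while (true) {
      Runnable next;
      synchronized (this) {
        next = queue.poll();
        if (next == null) {
          draining = false;
          return;
        }
      }
      next.run(); // run outside the lock
    }
  }
}

/** Plain Executor view, so a timer class can depend on Executor alone. */
final class SerializingAdapter implements Executor {
  private final QueueingExecutor delegate;

  SerializingAdapter(QueueingExecutor delegate) {
    this.delegate = delegate;
  }

  @Override
  public void execute(Runnable command) {
    delegate.executeLater(command).drain();
  }
}
```

With this shape, a task submitted from inside a running task is queued and runs only after the current one finishes.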

Contributor Author

Done.

Contributor Author

@carl-mastrangelo carl-mastrangelo left a comment

PTAL

Member

@ejona86 ejona86 left a comment

Please still wait for @zhangkun83's review.

@@ -317,4 +317,4 @@ public long currentTimeMillis() {
*/
boolean shouldAccept(Runnable runnable);
}
}
}
Member

Revert changes to this file?

Contributor Author

Done.

return ((FutureRunnable) r).rescheduler.enabled;
}

private long nanoTime() {
Member

I see what you did there. 😄

}
}

private static final class FutureRunnable implements Runnable {
Member

nit: Why is this static?

Contributor Author

In ManagedChannelImplIdlenessTest.java there is a test that checks there are no more scheduled tasks in the scheduler. It fails because the tasks are still left over, but are now disabled. I wanted the test to scan the tasks and check that they are Rescheduler Runnables and that they are disabled.

There needs to be a reference from the Runnable back to the Rescheduler, so this became static and took an explicit reference to the outer class. This is so isEnabled below works.

Member

Ah, I see. You could have added another method to FutureRunnable to do something like return Rescheduler.this, but that's close to the same amount of effort. Makes sense.
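To make the two options in this thread concrete, here is a small sketch with hypothetical names (`IdleTimer`, `StaticTask`, `InnerTask`); none of it is the PR's code. Variant 1 is a static nested class that takes an explicit reference to its owner. Variant 2 is an inner class that exposes the implicit `Outer.this` through a method. Either way, a test helper can get from a scheduled Runnable back to the owning timer and read its enabled flag.

```java
/** Hypothetical owner class with two ways for a task to point back at it. */
final class IdleTimer {
  private boolean enabled;

  void enable() {
    enabled = true;
  }

  void disable() {
    enabled = false;
  }

  /** Variant 1: static nested class holding an explicit back-reference. */
  static final class StaticTask implements Runnable {
    private final IdleTimer owner;

    StaticTask(IdleTimer owner) {
      this.owner = owner;
    }

    @Override
    public void run() {}

    IdleTimer owner() {
      return owner;
    }
  }

  /** Variant 2: inner class returning the implicit outer instance. */
  final class InnerTask implements Runnable {
    @Override
    public void run() {}

    IdleTimer owner() {
      return IdleTimer.this;
    }
  }

  /** Test helper: reports whether the timer behind a scheduled task is enabled. */
  static boolean isEnabled(Runnable r) {
    if (r instanceof StaticTask) {
      return ((StaticTask) r).owner().enabled;
    }
    if (r instanceof InnerTask) {
      return ((InnerTask) r).owner().enabled;
    }
    throw new IllegalArgumentException("not an IdleTimer task: " + r);
  }
}
```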

assertFalse(runner.ran);
rescheduler.reschedule(1, TimeUnit.NANOSECONDS);
assertFalse(runner.ran);
rescheduler.cancel(/* permanent= */ false);
Contributor

You probably need a test for cancel(true).

Contributor Author

Done.

rescheduler.reschedule(1, TimeUnit.NANOSECONDS);
assertFalse(runner.ran);
assertFalse(exec.executed);
rescheduler.reschedule(50, TimeUnit.NANOSECONDS);
Contributor

Since you are keeping the flexibility of rescheduling with a shorter delay, you should cover it, or remove the flexibility.

Contributor Author

Done.

@carl-mastrangelo carl-mastrangelo merged commit 9ed8425 into grpc:master Apr 4, 2018
@carl-mastrangelo carl-mastrangelo deleted the resched branch April 4, 2018 23:02
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
@carl-mastrangelo carl-mastrangelo restored the resched branch August 17, 2019 01:12