Go to "permanent error mode" if ChannelExecutor throws exception #3293

dapengzhang0 · 2017-07-31T17:15:51Z

As mentioned in discussion in #3288, it's questionable only catching instead of throwing exceptions in ChannelExecutor. Should we throw regardless or should we only log in production and throw in test evn?

The text was updated successfully, but these errors were encountered:

dapengzhang0 · 2017-07-31T18:07:25Z

@zhangkun83 , @ejona86

ejona86 · 2017-07-31T21:51:38Z

I do think it tends to make sense to catch and log the exception, since propagating the exception will probably only cause additional failures and we know the current stack does not depend on the result of the call (since the execution is not guaranteed to be complete when drain returns.)

I don't think we should throw in a test environment; that's likely to produce confusion. Instead, you can inject a "exception handler" that is passed the exception. Normally, it would simply log. In tests it could save the exception and we can add an @After to assert that no exception was thrown. None of the tests should trigger the exception code path (except maybe a test that is testing the exception code path :) ).

zhangkun83 · 2017-09-11T18:28:28Z

Instead, you can inject a "exception handler" that is passed the exception

Not sure if you meant injecting the exception handler to ChannelExecutor, but I am just thinking about that. ManagedChannelImpl, when capturing a throwable, may switch to an "permanent error mode" which would fail subsequent RPCs, which propagate the error very loudly.

carl-mastrangelo · 2017-09-11T19:01:07Z

FYI, I have been looking at consolidateing our executors (we have many) including the channel exec. If you make a change before I send out mine, please cc me.

ejona86 · 2017-09-14T16:43:23Z

I'm fine with going into a "permanent error mode" if we get an exception with ChannelExecutor. Although the implementation will need to be very simple.

ejona86 · 2017-11-08T16:11:23Z

@zhangkun83, assigning this to you since it seemed you were really interested in getting this done; feel free to unassign/reassign.

It is now discovering legitimate errors in some tests that need to be resolved first. Follow progress at grpc#3293.

ejona86 · 2018-02-27T21:50:07Z

#4152 disabled the added panic mode. It doesn't delete it, but leaves it non-functional.

We discovered pre-existing tests that had failures cause by exitIdleMode scheduling a timer, which can trigger Thread creation. The timer is to schedule IDLE_TIMEOUT, which is enabled by default to 30 minutes. In certain test environments, Thread creation is black-listed and throws an exception. Previously the exception was caught and logged, but otherwise things worked fine for the test; the test would not last long enough such that the broken IDLE_TIMEOUT would matter.

I thus disabled panic mode so that we can fix up those tests before re-enabling. These tests would all be using InProcessTrasport, and since the recent #4139 it is now possible to pass your own ScheduledExecutorService. These tests could be fixed by passing a fake service of some sort, like TestingExecutors.noOpScheduledExecutor() in Guava's testlib.

It is now discovering legitimate errors in some tests that need to be resolved first. Follow progress at #3293.

dapengzhang0 mentioned this issue Jul 31, 2017

core: fix ConnectivityStateManager is already disabled bug #3288

Merged

zhangkun83 mentioned this issue Sep 11, 2017

grpclb: fallback to backend addresses #3439

Merged

zhangkun83 mentioned this issue Sep 19, 2017

FutureStub hangs when wrong netty version is used #2976

Closed

ejona86 changed the title ~~More thoughts on how ChannelExecutor should handle exceptions~~ Go to "permanent error mode" if ChannelExecutor throws exception Nov 8, 2017

ejona86 added this to the Next milestone Nov 8, 2017

ejona86 assigned zhangkun83 Nov 8, 2017

zhangkun83 mentioned this issue Jan 31, 2018

core: add panic mode for ManagedChannelImpl #4023

Merged

zhangkun83 closed this as completed in #4023 Feb 22, 2018

ejona86 modified the milestones: Next, 1.11 Feb 22, 2018

ejona86 added a commit to ejona86/grpc-java that referenced this issue Feb 27, 2018

core: Disable panic mode

4a5b8d6

It is now discovering legitimate errors in some tests that need to be resolved first. Follow progress at grpc#3293.

ejona86 mentioned this issue Feb 27, 2018

core: Disable panic mode #4152

Merged

ejona86 reopened this Feb 27, 2018

ejona86 added a commit that referenced this issue Feb 27, 2018

core: Disable panic mode

1b5aadf

It is now discovering legitimate errors in some tests that need to be resolved first. Follow progress at #3293.

ejona86 modified the milestones: 1.11, 1.12 Mar 13, 2018

zhangkun83 mentioned this issue Mar 20, 2018

core: re-enable panic mode after all affected internal tests have been fixed. #4245

Merged

zhangkun83 closed this as completed in #4245 Mar 20, 2018

lock bot locked as resolved and limited conversation to collaborators Sep 28, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Go to "permanent error mode" if ChannelExecutor throws exception #3293

Go to "permanent error mode" if ChannelExecutor throws exception #3293

dapengzhang0 commented Jul 31, 2017

dapengzhang0 commented Jul 31, 2017

ejona86 commented Jul 31, 2017

zhangkun83 commented Sep 11, 2017

carl-mastrangelo commented Sep 11, 2017

ejona86 commented Sep 14, 2017

ejona86 commented Nov 8, 2017

ejona86 commented Feb 27, 2018

Go to "permanent error mode" if ChannelExecutor throws exception #3293

Go to "permanent error mode" if ChannelExecutor throws exception #3293

Comments

dapengzhang0 commented Jul 31, 2017

dapengzhang0 commented Jul 31, 2017

ejona86 commented Jul 31, 2017

zhangkun83 commented Sep 11, 2017

carl-mastrangelo commented Sep 11, 2017

ejona86 commented Sep 14, 2017

ejona86 commented Nov 8, 2017

ejona86 commented Feb 27, 2018