Fix gRPC Streaming after Cold Start #3239
Comments
After poking at this for a bit and instrumenting things, I think I'm convinced that this is a problem with the retry logic in the activator. On my cluster, it usually takes 4-5 attempts before it actually connects to the […]. When I added a […], I think that in general trying to "rewind" the user's request is creating more problems than it's worth [1]. I think that we should probably pursue a simple probe endpoint that is intercepted and filtered by the […].

[1] An example problem I observed parsing some of this code is that the […]
Here's some commentary from Matt Klein on this topic (thanks to @tcnghia for the pointer): envoyproxy/envoy#3594 (comment). I think this basically confirms that we shouldn't be trying to send & retry, but should instead check network readiness by other means (e.g. the probe).
This adds a header `k-network-probe`, to which the Knative networking elements respond without forwarding the requests. They also identify themselves in their response, so that we know what component is handling the probe. This is related to: knative#2856, knative#2849, knative#3239
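A minimal sketch of the interception described above, assuming a plain `net/http` middleware; the function name and the component strings are illustrative, not the actual Knative code:

```go
package probe

import (
	"fmt"
	"net/http"
)

// probeHeaderName is the header the commit message describes.
const probeHeaderName = "k-network-probe"

// interceptProbes answers probe requests on behalf of a networking
// component (e.g. "activator" or "queue-proxy") without forwarding them
// to the user container; all other traffic passes through untouched.
func interceptProbes(component string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get(probeHeaderName) != "" {
			// Identify ourselves so the prober knows which hop answered.
			w.WriteHeader(http.StatusOK)
			fmt.Fprint(w, component)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```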
When the flag `-enable-network-probing` is passed, the activator will replace its retrying transport logic with a simple network probe based on knative#3256, with a similar number of retries to what the retrying transport was previously configured to use. Enabling this allows the GRPC test with streaming and cold-start to pass fairly reliably on my cluster (and also the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, one for each of the logical things tested, which will hopefully dramatically reduce the amount of time this adds to e2e when we switch to `t.Parallel()`, since it will parallelize the two times this waits for a scale-to-zero.

Still TODO:
- Unit testing for the GET probes
- Disable by default, and `t.Skip()` the streaming GRPC test

These will become `Fixes:` when this is enabled by default.

Related: knative#3239
Related: knative#2856
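To make the probe-then-forward idea concrete, here is a hypothetical sketch of the loop the flag swaps in for the retrying transport; the retry budget, backoff, and the expected `queue-proxy` responder are assumptions for illustration, not the actual configuration:

```go
package probe

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// waitForReady sends cheap GET probes until the expected component
// answers, so the real request can then be forwarded exactly once
// instead of being replayed through a retrying transport.
func waitForReady(client *http.Client, target string, maxRetries int) error {
	for i := 0; i < maxRetries; i++ {
		req, err := http.NewRequest(http.MethodGet, target, nil)
		if err != nil {
			return err
		}
		req.Header.Set("k-network-probe", "probe")

		if resp, err := client.Do(req); err == nil {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			// The component names itself in the response body, so we can
			// tell when the probe has finally reached the queue-proxy
			// rather than some intermediate hop.
			if resp.StatusCode == http.StatusOK && string(body) == "queue-proxy" {
				return nil
			}
		}
		time.Sleep(100 * time.Millisecond) // illustrative backoff
	}
	return fmt.Errorf("%s not ready after %d probes", target, maxRetries)
}
```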
@mattmoor yeah, the retrying in the […]
When the flag `-enable-network-probing` is passed (on by default), the activator will replace its retrying transport logic with a simple network probe based on #3256, with a similar number of retries to what the retrying transport was previously configured to use. Enabling this allows the GRPC test with streaming and cold-start to pass fairly reliably on my cluster (and also the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, one for each of the logical things tested, which will hopefully dramatically reduce the amount of time this adds to e2e when we switch to `t.Parallel()`, since it will parallelize the two times this waits for a scale-to-zero.

Fixes: #3239
Fixes: #2856
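The test split might look roughly like this; `runGRPCTest` and the scenario names are placeholders, the point being that each scenario becomes its own top-level test so `t.Parallel()` can overlap the two scale-to-zero waits:

```go
package e2e

import "testing"

// runGRPCTest stands in for the shared setup each scenario needs:
// deploy the service, optionally wait for scale-to-zero, run the scenario.
func runGRPCTest(t *testing.T, scenario string) {
	t.Helper()
	// ...
}

func TestGRPCUnaryPing(t *testing.T)     { t.Parallel(); runGRPCTest(t, "unary") }
func TestGRPCStreamingPing(t *testing.T) { t.Parallel(); runGRPCTest(t, "streaming") }

func TestGRPCUnaryPingAfterScaleToZero(t *testing.T) {
	t.Parallel()
	runGRPCTest(t, "unary-cold")
}

func TestGRPCStreamingPingAfterScaleToZero(t *testing.T) {
	t.Parallel()
	runGRPCTest(t, "streaming-cold")
}
```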
In what area(s)?
/area autoscale
/area networking
/area test-and-release
What version of Knative?
Expected Behavior
After a service is scaled-to-zero, a streaming gRPC request should return successfully.
Actual Behavior
After a service is scaled-to-zero, a streaming gRPC request times out.
Steps to Reproduce the Problem
See commented out test case in #3205
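For reference, a minimal reproduction sketch using the grpc-go client, assuming the ping sample from knative/docs is deployed; the host is a placeholder for its route, and the generated `pb` client shown in comments is a placeholder for whatever the sample's `.proto` produces:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// Assumes the ping sample has already scaled to zero before this runs.
	conn, err := grpc.Dial("ping.default.example.com:80", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Opening the stream is the step that times out after a cold start,
	// while unary calls go through (placeholder generated client):
	//
	//   client := pb.NewPingServiceClient(conn)
	//   stream, err := client.PingStream(context.Background())
	//   err = stream.Send(&pb.Request{Msg: "ping"})
	//   resp, err := stream.Recv() // <- times out here
}
```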
/assign