Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Activator to perform health checks before forwarding real requests #2856

Closed
tcnghia opened this issue Jan 4, 2019 · 4 comments · Fixed by #3257
Closed

Activator to perform health checks before forwarding real requests #2856

tcnghia opened this issue Jan 4, 2019 · 4 comments · Fixed by #3257
Labels
area/autoscale area/networking kind/feature Well-understood/specified features, ready for coding.
Milestone

Comments

@tcnghia
Copy link
Contributor

tcnghia commented Jan 4, 2019

Expected Behavior

We can handle POST request without needing to retries them. We minimize the unneeded GET retries.

If activator performs health check before forwarding requests we could potentially fix the POST attempt to one, and also reduce a lot of GET retries (of real requests).

Actual Behavior

Currently activator doesn't perform health checks before forwarding requests. This means we have to retry for POST, and we perform unnecessary GETs before the Revision is handling traffic.

Steps to Reproduce the Problem

Additional Info

@knative-prow-robot knative-prow-robot added area/autoscale area/networking kind/feature Well-understood/specified features, ready for coding. labels Jan 4, 2019
@tcnghia
Copy link
Contributor Author

tcnghia commented Jan 5, 2019

This may be done in conjunction of queue-proxy since activator no longer reads from Revision.

@vvraskin
Copy link
Contributor

vvraskin commented Jan 5, 2019

Referring to the Get retries, do you mean this peace of code? https://github.com/knative/serving/blob/master/pkg/activator/revision.go
It will be replaced by Revision informer soonish as part of #1573

@markusthoemmes
Copy link
Contributor

@vvraskin this is about retrying GET HTTP requests to the user application, because we wait for network programming. It's not about retries towards the kubernetes API.

@mattmoor
Copy link
Member

mattmoor commented Feb 7, 2019

This feels like it fits with the class of probing we're doing for positive handoff.

mattmoor added a commit to mattmoor/serving that referenced this issue Feb 17, 2019
This adds a header `k-network-probe`, to which the Knative networking elements
respond without forwarding the requests.  They also identify themselves in their
response, so that we know what component is handling the probe.

This is related to: knative#2856, knative#2849, knative#3239
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 17, 2019
This adds a header `k-network-probe`, to which the Knative networking elements
respond without forwarding the requests.  They also identify themselves in their
response, so that we know what component is handling the probe.

This is related to: knative#2856, knative#2849, knative#3239
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 17, 2019
This adds a header `k-network-probe`, to which the Knative networking elements
respond without forwarding the requests.  They also identify themselves in their
response, so that we know what component is handling the probe.

This is related to: knative#2856, knative#2849, knative#3239
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 17, 2019
This adds a header `k-network-probe`, to which the Knative networking elements
respond without forwarding the requests.  They also identify themselves in their
response, so that we know what component is handling the probe.

This is related to: knative#2856, knative#2849, knative#3239
knative-prow-robot pushed a commit that referenced this issue Feb 17, 2019
This adds a header `k-network-probe`, to which the Knative networking elements
respond without forwarding the requests.  They also identify themselves in their
response, so that we know what component is handling the probe.

This is related to: #2856, #2849, #3239
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 17, 2019
When the flag `-enable-network-probing` is passed the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Still TODO:
 - Unit testing for the get probes
 - Disable by default, and `t.Skip()` the streaming GRPC test

These will be `Fixes:` when this is enabled by default.
Related: knative#3239
Related: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 18, 2019
When the flag `-enable-network-probing` is passed the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Still TODO:
 - Disable by default, and `t.Skip()` the streaming GRPC test

These will be `Fixes:` when this is enabled by default.
Related: knative#3239
Related: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 18, 2019
When the flag `-enable-network-probing` is passed the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Still TODO:
 - Disable by default, and `t.Skip()` the streaming GRPC test

These will be `Fixes:` when this is enabled by default.
Related: knative#3239
Related: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 19, 2019
When the flag `-enable-network-probing` is passed the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Still TODO:
 - Disable by default, and `t.Skip()` the streaming GRPC test

These will be `Fixes:` when this is enabled by default.
Related: knative#3239
Related: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 19, 2019
When the flag `-enable-network-probing` is passed the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Still TODO:
 - Disable by default, and `t.Skip()` the streaming GRPC test

These will be `Fixes:` when this is enabled by default.
Related: knative#3239
Related: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 19, 2019
When the flag `-enable-network-probing` is passed the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Still TODO:
 - Disable by default, and `t.Skip()` the streaming GRPC test

These will be `Fixes:` when this is enabled by default.
Related: knative#3239
Related: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 19, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: knative#3239
Fixes: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 19, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: knative#3239
Fixes: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 19, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: knative#3239
Fixes: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 19, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: knative#3239
Fixes: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 22, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: knative#3239
Fixes: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 22, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: knative#3239
Fixes: knative#2856
mattmoor added a commit to mattmoor/serving that referenced this issue Feb 22, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on knative#3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: knative#3239
Fixes: knative#2856
knative-prow-robot pushed a commit that referenced this issue Feb 22, 2019
When the flag `-enable-network-probing` is passed (on by default) the activator will replace its retring transport logic with a simple network probe based on #3256 with a similar number of retries to what the retrying transport was previously configured to use.  Enabling this allows the GRPC test with streaming and cold-start fairly reliably on my cluster (and also with the GRPC ping sample in knative/docs, with my fixes).

This change also refactors the GRPC test into 4 tests, for each of the logical things tested, which will hopefully reduce the amount of time this adds to e2e dramatically when we switch to use `t.Parallel()` since it will parallelize the two times this waits for a scale-to-zero.

Fixes: #3239
Fixes: #2856
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/autoscale area/networking kind/feature Well-understood/specified features, ready for coding.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants