intermittent UNIMPLEMENTED error from ambassador #473

Closed
ryandawsonuk opened this issue Mar 26, 2019 · 12 comments
Labels: Ambassador, gRPC, lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)

Comments

@ryandawsonuk
Contributor

ryandawsonuk commented Mar 26, 2019

Spotted this in the helm_examples notebook, but have also seen it in the E2E tests. Was seeing:

_Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNIMPLEMENTED
    details = ""
    debug_error_string = "{"created":"@1553618148.143629760","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"","grpc_status":12}"

This might especially affect the A/B test scenario, but I'm not sure. I encountered it while testing #445, but it isn't related to that change, as it has also been seen separately.

@ryandawsonuk
Contributor Author

ryandawsonuk commented Mar 28, 2019

Seems to especially affect the first request, so it could be related to emissary-ingress/emissary#504.
Also related to grpc/grpc#16515, as 503 is "unavailable" and it seems this is the behaviour for a network connectivity failure.

Looking at grpc/grpc#16515 (comment) perhaps the way forward for this is https://pypi.org/project/retrying/
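
A minimal sketch of the retry idea, written with the standard library only rather than the retrying package itself. `FakeRpcError`, `retry_rpc`, and `make_flaky_predict` are hypothetical names for illustration; a real client would catch `grpc.RpcError` and compare `err.code()` against `grpc.StatusCode.UNIMPLEMENTED` / `grpc.StatusCode.UNAVAILABLE`. Treating UNIMPLEMENTED as retryable is an assumption that only makes sense here because the error is known to be transient in this setup.

```python
import time
from functools import wraps

# Assumption: status names stand in for grpc.StatusCode members.
RETRYABLE = {"UNIMPLEMENTED", "UNAVAILABLE"}


class FakeRpcError(Exception):
    """Hypothetical stand-in for grpc.RpcError, exposing code() like the real one."""

    def __init__(self, code):
        super().__init__(code)
        self._code = code

    def code(self):
        return self._code


def retry_rpc(max_attempts=5, initial_delay=0.5, backoff=2.0, sleep=time.sleep):
    """Retry the wrapped call on retryable status codes with exponential backoff."""

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except FakeRpcError as err:
                    # Give up on non-retryable codes or once attempts run out.
                    if err.code() not in RETRYABLE or attempt == max_attempts:
                        raise
                    sleep(delay)
                    delay *= backoff

        return wrapper

    return decorator


def make_flaky_predict(failures):
    """Build a hypothetical predict() that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    @retry_rpc(max_attempts=5, sleep=lambda _: None)  # no real sleeping in the demo
    def predict():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise FakeRpcError("UNIMPLEMENTED")
        return "ok"

    return predict, state
```

With `make_flaky_predict(2)`, the third call succeeds and the caller never sees the first two failures. If every attempt fails, the last error is re-raised so a genuine misconfiguration is not hidden.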

@ryandawsonuk
Contributor Author

Seem to be able to resolve this for tests using 656ac5b#diff-3119fc203a07a67876b8d74d342faceeR44 - have only applied that in a particular branch for now

@ryandawsonuk
Contributor Author

I think upgrading to latest ambassador (through #480) also resolves this for the notebooks. I was seeing this intermittently in notebooks and haven't seen it in that branch with the latest ambassador

@ryandawsonuk
Contributor Author

Turns out the issue is present in 0.53.1 but can be worked around by setting ambassador to run as root.

Will check again in 0.60.0 - emissary-ingress/emissary#504

@ryandawsonuk
Contributor Author

ryandawsonuk commented Apr 10, 2019

Actually, the issue is still present even when running as root. I managed to recreate the Ambassador gRPC problem from helm_examples.ipynb. This time, though, I didn't uninstall and reinstall Ambassador. I just followed the notebook, and in the A/B test scenario (at which point Ambassador had been running for a while) I saw the gRPC calls reporting as failed for a good minute while REST requests worked. It seems the issue actually affects the initial period when a new instance comes online, specifically for A/B tests. Oddly, the Ambassador logs reported that the requests were going through successfully, and there were no errors in other logs, but the response was clearly a failure:

image

Eventually gRPC requests start working, but it can take a whole minute. The failures always seem to occur when using Ambassador with single-namespace scope.

We can work around this by downgrading to 0.50.0, even though that means putting references to the ambassador API back down to v0.

@damitkwr

damitkwr commented Aug 2, 2019

Is this issue still present in Ambassador v0.73.0?

@ukclivecox
Contributor

It's not been confirmed or closed by Ambassador: see emissary-ingress/emissary#1587

@axsaucedo
Contributor

+1

@lennon310
Contributor

lennon310 commented Aug 14, 2019

We are using Ambassador 0.73.0 (quay.io/datawire/ambassador:0.73.0), and we are having this gRPC issue. May I have an update on how to resolve this?

@ukclivecox ukclivecox added this to the 1.0.x milestone Aug 23, 2019
@ukclivecox
Contributor

We need to determine with the Ambassador community whether this is a Seldon issue or an Ambassador issue. If it is an Ambassador issue, we need to evaluate whether we continue to support Ambassador long term.

@seldondev
Collaborator

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@seldondev seldondev added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 17, 2020
@ukclivecox
Contributor

This issue should be fixed for gRPC. There is still the issue of Envoy updates not always happening immediately after an Ambassador update, which means e2e tests can be flaky.
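
One way to reduce that flakiness in tests is to wait until the route looks stable before asserting on it. This is only a sketch of that idea: `wait_until_ready` and its parameters are hypothetical, and `probe` is assumed to be any callable that returns a truthy value when a test request through Ambassador succeeds. The default timeout is an assumption loosely based on the "whole minute" delay reported earlier in this thread.

```python
import time


def wait_until_ready(probe, timeout=90.0, interval=2.0, required_successes=3,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it succeeds `required_successes` times in a row.

    Returns True once the route looks stable, or False if `timeout` seconds
    pass first. An exception raised by probe() counts as a failed attempt.
    """
    deadline = clock() + timeout
    streak = 0
    while True:
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        # A single failure resets the streak, so one lucky response
        # during a config rollout is not mistaken for readiness.
        streak = streak + 1 if ok else 0
        if streak >= required_successes:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would call this with a probe that sends one gRPC request and then run its real assertions only if the function returns True. Requiring several consecutive successes is a design choice aimed at the period where some requests succeed while the new configuration is still propagating.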


6 participants