Differentiate status codes from functions vs OpenFaaS components #1792
Comments
I think that options 1 and 2 should both be included, and that option 2 should be opt-in for the sake of backwards compatibility. Here's a truth table for describing the cases using only option 1:
As you can see, with this approach you don't know whether the response is coming from infrastructure (non-OpenFaaS, non-function: k8s, Istio, etc.) or from the function. Here's another table using only option 2:
This option only lets you know if the response came from a function. All cases other than the function are effectively unknown as to what the source was. Here's a final table using both:
This approach lets you know the most: whether the response came from inside openfaas, the user's function, or some other unknown/infrastructural source. As for whether or not option 1 should be "only when an OF component has an error", I don't think that's specifically necessary to call out, as the source of a response should effectively be invisible to the user unless there's an error. In other words, there's no specific reason why the If it matters, you could have a header stating every OF component that "touched" this request along the way, but that's more of a debug feature in my opinion.
Summarised from Kevin on weekly call:
Description
Differentiate status codes from functions vs OpenFaaS components
Why do you need this?
When invoking functions or attempting to retry them, it's currently hard to differentiate between an error caused by Kubernetes (e.g. node eviction during an invocation) and one caused by a function (e.g. a 429 because the Twitter API that was being used is overloaded).
Who is this for?
@kevin-lindsay-1 for Surge has requested this - but Waylay also wanted this for their integration cc @OcamsRazor
Expected Behaviour
A way to determine whether a 500 error was from the gateway / watchdog or from the function
Current Behaviour
There are some hints in the message body and headers, but they are not consistent right now.
List All Possible Solutions and Workarounds
Which Solution Do You Recommend?
I recommend 1 - because 2 depends on the use of the watchdog, which is not used by all users.
For 1 - the gateway needs a change since it can invoke functions directly. The provider also needs a change for when direct_functions is set to false and invocations flow through it instead. The watchdog should also have a change so that any error it handles can be passed back up the stream.
Headers do support multiple values for a key, e.g.
X-OpenFaaS-Source: [watchdog, gateway]
For 1 - the watchdog does not need to be updated, and even when it's not in use, this header would still propagate and flow.
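As a rough illustration of the multi-value header described above, here is a minimal client-side sketch in Python. It assumes each OpenFaaS component appends its own name to `X-OpenFaaS-Source`; the helper name `sources` and the handling of comma-separated values are assumptions for illustration, not part of any OpenFaaS API. The standard library's `email.message.Message` is used because `http.client` response headers are a subclass of it.

```python
from email.message import Message

# Header name taken from the issue text.
SOURCE_HEADER = "X-OpenFaaS-Source"


def sources(headers: Message) -> list[str]:
    """Return every component listed in X-OpenFaaS-Source, in order.

    Hypothetical helper: handles both repeated header lines and a
    single comma-separated value.
    """
    found: list[str] = []
    for raw in headers.get_all(SOURCE_HEADER, []):
        # One header line may itself carry several comma-separated names.
        found.extend(part.strip() for part in raw.split(",") if part.strip())
    return found


# Assigning the same key twice on a Message appends a second header line
# instead of overwriting the first, mirroring a multi-valued HTTP header.
headers = Message()
headers[SOURCE_HEADER] = "watchdog"
headers[SOURCE_HEADER] = "gateway"
print(sources(headers))
```

An empty result means no OpenFaaS component marked the response, which under option 1 alone could indicate either the function or other infrastructure.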
Then there's a wider conversation about how the queue-worker should retry errors when the "X-OpenFaaS-Source" header is present - assuming that these need to be retried due to an error during scaling - or node eviction.
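One possible shape for that retry decision, sketched in Python rather than as the queue-worker's actual code. The function name `should_retry` and the choice to treat any status of 400 or above as an error are assumptions made for illustration; the only rule taken from the discussion is that an error carrying `X-OpenFaaS-Source` is assumed to be retryable.

```python
from email.message import Message

# Header name taken from the issue text.
SOURCE_HEADER = "X-OpenFaaS-Source"


def should_retry(status: int, headers: Message) -> bool:
    """Decide whether an invocation is worth retrying.

    Hypothetical rule: retry only error responses that an OpenFaaS
    component has marked, since those are assumed to stem from scaling
    or node eviction rather than from the function's own logic.
    """
    if status < 400:
        # Not an error (assumed threshold), so there is nothing to retry.
        return False
    # Header lookup on Message is case-insensitive.
    return headers.get(SOURCE_HEADER) is not None


marked = Message()
marked[SOURCE_HEADER] = "gateway"
print(should_retry(500, marked), should_retry(429, Message()))
```

With this rule, an unmarked 429 returned by the function itself is left alone, while a 500 marked by the gateway is retried.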