Treat ErrPolicyForbidden and ErrPodNotFound as permanent errors during retry backoff #250
Without this change, fetchCredentials keeps retrying GetPodCredentials until it times out, and it's unlikely that a pod will magically appear with the given IP or that the policy will change before the timeout expires.
We can't compare error values by pointer as we tried in the previous commit, because the error value we get from GetCredentials was serialized over the wire and ends up somewhere different in memory. Currently ErrPolicyForbidden and ErrPodNotFound correspond to "Unknown" errors in gRPC, so our only recourse for identifying these errors on the client side is comparing them by error message.
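To illustrate the point about pointer comparison, here is a minimal sketch using only the standard library. The error messages and the `overTheWire` and `sameMessage` helpers are hypothetical stand-ins, not KIAM's actual code; `overTheWire` only simulates serialization by rebuilding the error from its text.

```go
package main

import "errors"

// Hypothetical sentinel errors; the real messages in KIAM may differ.
var (
	ErrPolicyForbidden = errors.New("forbidden by policy")
	ErrPodNotFound     = errors.New("pod not found")
)

// overTheWire simulates an error crossing a process boundary: only its
// text survives, and the receiving side allocates a brand new value.
func overTheWire(err error) error {
	return errors.New(err.Error())
}

// sameMessage reports whether two non-nil errors carry identical text.
// This is the fallback when identity comparison is not possible.
func sameMessage(a, b error) bool {
	return a != nil && b != nil && a.Error() == b.Error()
}
```

With these definitions, `overTheWire(ErrPolicyForbidden) == ErrPolicyForbidden` is false, while `sameMessage` applied to the same pair is true. A real gRPC client may also prefix the message with status information, so the comparison would likely need to run against the extracted description rather than the full error string.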
This way, if someone calls fetchCredentials and wants to respond to the error values on a case-by-case basis, they don't have to go through the whole rigmarole. Also, fix a copy/paste typo I made in the previous commit =(
On a side note, I would have preferred to avoid string comparisons for the error messages, but both of these error types are currently mapped to "Unknown" at the gRPC layer. Changing those would be more involved and potentially disruptive to other users of KIAM, but if you think it would be worth doing, please let me know and I can submit another PR with that change as well.
It's a 👍 from me, @uswitch/cloud?
Actually, having just read... I'd support this change for ErrPolicyForbidden but not ErrPodNotFound. My rationale would be that there's a chance the cache/watcher will be behind when the pod requests credentials, causing ErrPodNotFound to be intermittent. I think it'd be better for the agent to keep retrying up to the SDK disconnection. Any objection @hoelzro?
Feedback from Paul Ingles at #250 (comment):
> Actually, having just read... I'd support this change for ErrPolicyForbidden but not ErrPodNotFound.
> My rationale would be that there's a chance the cache/watcher will be behind when the pod requests
> credentials- causing ErrPodNotFound to be intermittent. I think it'd be better for the agent to keep
> retrying up to the SDK disconnection.
@pingles Yeah, that makes sense to me! I've added a commit removing permanent error status for ErrPodNotFound.
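The behaviour agreed on here (stop immediately on a policy denial, keep retrying while the pod is not yet known) can be sketched as follows. This is a standard-library illustration under stated assumptions, not the PR's actual diff: `permanentError`, `classify`, and `retry` are hypothetical names, the error messages are placeholders, and a fixed pause with an attempt limit stands in for the real backoff and timeout.

```go
package main

import (
	"errors"
	"time"
)

// Hypothetical sentinel errors; the real messages in KIAM may differ.
var (
	ErrPolicyForbidden = errors.New("forbidden by policy")
	ErrPodNotFound     = errors.New("pod not found")
)

// permanentError marks an error that retrying cannot fix.
type permanentError struct{ err error }

func (p *permanentError) Error() string { return p.err.Error() }
func (p *permanentError) Unwrap() error { return p.err }

// classify wraps policy denials as permanent, matching by message because
// the error may have crossed the wire. Everything else, including
// ErrPodNotFound, stays retryable.
func classify(err error) error {
	if err != nil && err.Error() == ErrPolicyForbidden.Error() {
		return &permanentError{err: err}
	}
	return err
}

// retry calls op until it succeeds, returns a permanent error, or
// maxAttempts (which must be at least 1) is exhausted. It returns the
// number of attempts made and the final error.
func retry(op func() error, maxAttempts int, pause time.Duration) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = classify(op())
		if err == nil {
			return attempt, nil
		}
		var perm *permanentError
		if errors.As(err, &perm) {
			return attempt, perm.err // give up right away
		}
		if attempt < maxAttempts {
			time.Sleep(pause)
		}
	}
	return maxAttempts, err
}
```

In this sketch a forbidden role fails on the first attempt, whereas a pod that the watcher has not seen yet gets further attempts and can succeed once the cache catches up.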
Thanks! Could you try to find a way to add a test too, please? It's a little verbose but should be relatively easy to do in https://github.com/uswitch/kiam/blob/master/pkg/aws/metadata/handler_credentials_test.go
fetchCredentials shouldn't need to know that interaction with the server is happening over gRPC
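One way to keep transport details out of the caller is to normalize errors at the client boundary. The sketch below is only an illustration of that idea: `credentialsClient`, `normalizingClient`, and `fakeRemote` are hypothetical names, the method signature is assumed, and the error message is a placeholder.

```go
package main

import "errors"

// Hypothetical sentinel; the real message in KIAM may differ.
var ErrPolicyForbidden = errors.New("forbidden by policy")

// credentialsClient is a hypothetical transport-agnostic interface.
type credentialsClient interface {
	GetCredentials(ip, role string) (string, error)
}

// normalizingClient wraps a transport-specific client and maps errors it
// recognizes by message back onto the local sentinel values, so callers
// can compare against the sentinel directly.
type normalizingClient struct{ inner credentialsClient }

func (c normalizingClient) GetCredentials(ip, role string) (string, error) {
	creds, err := c.inner.GetCredentials(ip, role)
	if err != nil && err.Error() == ErrPolicyForbidden.Error() {
		return "", ErrPolicyForbidden
	}
	return creds, err
}

// fakeRemote stands in for a remote-backed client: it returns a freshly
// allocated error with the same text, as a deserialized error would be.
type fakeRemote struct{}

func (fakeRemote) GetCredentials(ip, role string) (string, error) {
	return "", errors.New("forbidden by policy")
}
```

A caller holding a `normalizingClient{inner: fakeRemote{}}` can then write `err == ErrPolicyForbidden` without knowing how the error was transported.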
Test added! There are a few things that came to mind while writing the test:
Sorry it's taken me ages to take a look; I've been stacked at work. I'll see if I can get a chance to review next week unless one of @uswitch/cloud beat me to it 😄
@pingles No worries - please ping me when you have a chance to review it!
@pingles Have you or @uswitch/cloud had a chance to look at this?
I'm really sorry I haven't. It looks ok to me, @tombooth / @Joseph-Irving could you take a look too please?
lgtm
@pingles @Joseph-Irving Thanks!
If a pod requests credentials it isn't allowed to use, kiam-agent will keep retrying the same request until the five-second timeout is reached. This changeset treats ErrPolicyForbidden (as well as ErrPodNotFound) as permanent errors, so that the AWS client talking to kiam-agent gets a response in a more timely fashion.