Switch the GEP-713 parable to refer to Chihiro. #3443

Open: wants to merge 1 commit into base: main
geps/gep-713/index.md: 42 changes (21 additions, 21 deletions)
@@ -277,10 +277,10 @@ past two and a half weeks, but after successfully deploying version 3.6.0 of
the `baker` service this morning, she's escaped early to try to unwind a bit.

Her shoulders are just starting to unknot when her phone pings with a text
-from Charlie, down in the NOC. Waterproof phones are a blessing, but also a
+from Chihiro, down in the NOC. Waterproof phones are a blessing, but also a
curse.

-**Charlie**: _Hey Ana. Things are still running, more or less, but latencies
+**Chihiro**: _Hey Ana. Things are still running, more or less, but latencies
on everything in the `baker` namespace are crazy high after your last rollout,
and `baker` itself has a weirdly high load. Sorry to interrupt you on the lake
but can you take a look? Thanks!!_
@@ -297,30 +297,30 @@ duplicates? Ana checks her HTTPRoute again, though she's pretty sure you
can't configure retries there, and finds nothing. But it definitely looks like
clients are retrying when they shouldn’t be.

-She pings Charlie.
+She pings Chihiro.

-**Ana**: _Hey Charlie. Something weird is up, looks like requests to `baker`
+**Ana**: _Hey Chihiro. Something weird is up, looks like requests to `baker`
are failing but getting retried??_

A minute later they answer.

-**Charlie**: 🤷 _Did you configure retries?_
+**Chihiro**: 🤷 _Did you configure retries?_

**Ana**: _Dude. I don’t even know how to._ 😂

-**Charlie**: _You just attach a RetryPolicy to your HTTPRoute._
+**Chihiro**: _You just attach a RetryPolicy to your HTTPRoute._

**Ana**: _Nope. Definitely didn’t do that._

She types `kubectl get retrypolicy -n baker` and gets a permission error.

**Ana**: _Huh, I actually don’t have permissions for RetryPolicy._ 🤔

-**Charlie**: 🤷 _Feels like you should but OK, guess that can’t be it._
+**Chihiro**: 🤷 _Feels like you should but OK, guess that can’t be it._

Minutes pass while both look at logs.

-**Charlie**: _I’m an idiot. There’s a RetryPolicy for the whole namespace –
+**Chihiro**: _I’m an idiot. There’s a RetryPolicy for the whole namespace –
sorry, too many policies in the dashboard and I missed it. Deleting that since
you don’t want retries._

@@ -332,17 +332,17 @@ through them: there’s one for every single service in the `baker` namespace.

**Ana**: _PUT IT BACK!!_

-**Charlie**: _Just did. Be glad you couldn't hear all the alarms here._ 😕
+**Chihiro**: _Just did. Be glad you couldn't hear all the alarms here._ 😕

**Ana**: _What the hell just happened??_

-**Charlie**: _At a guess, all the workloads in the `baker` namespace actually
+**Chihiro**: _At a guess, all the workloads in the `baker` namespace actually
fail a lot, but they seem OK because there are retries across the whole
namespace?_ 🤔

Ana's blood runs cold.

-**Charlie**: _Yeah. Looking a little closer, I think your `baker` rollout this
+**Chihiro**: _Yeah. Looking a little closer, I think your `baker` rollout this
morning would have failed without those retries._ 😕

There is a pause while Ana's mind races through increasingly unpleasant
@@ -351,40 +351,40 @@ possibilities.
**Ana**: _I don't even know where to start here. How long did that
RetryPolicy go in? Is it the only thing like it?_

-**Charlie**: _Didn’t look closely before deleting it, but I think it said a few
+**Chihiro**: _Didn’t look closely before deleting it, but I think it said a few
months ago. And there are lots of different kinds of policy and lots of
individual policies, hang on a minute..._

-**Charlie**: _Looks like about 47 for your chunk of the world, a couple hundred
+**Chihiro**: _Looks like about 47 for your chunk of the world, a couple hundred
system-wide._

**Ana**: 😱 _Can you tell me what they’re doing for each of our services? I
can’t even_ look _at these things._ 😕

-**Charlie**: _That's gonna take awhile. Our tooling to show us which policies
+**Chihiro**: _That's gonna take awhile. Our tooling to show us which policies
bind to a given workload doesn't go the other direction._

**Ana**: _...wait. You have to_ build tools _to know if retries are turned on??_

Pause.

-**Charlie**: _Policy attachment is more complex than we’d like, yeah._ 😐
-_Look, how ‘bout roll back your `baker` change for now? We can get together in
+**Chihiro**: _Policy attachment is more complex than we’d like, yeah._ 😐
+_Look, how about roll back your `baker` change for now? We can get together in
the morning and start sorting this out._

Ana shakes her head and rolls back her edits to the `baker` Deployment, then
sits looking out over the lake as the deployment progresses.

**Ana**: _Done. Are things happier now?_

-**Charlie**: _Looks like, thanks. Reckon you can get back to your sailboard._ 🙂
+**Chihiro**: _Looks like, thanks. Reckon you can get back to your sailboard._ 🙂

Ana sighs.

**Ana**: _Wish I could. Wind’s died down, though, and it'll be dark soon.
Just gonna head home._

-**Charlie**: _Ouch. Sorry to hear that._ 😐
+**Chihiro**: _Ouch. Sorry to hear that._ 😐

One more look out at the lake.

@@ -401,13 +401,13 @@ listed in increasing order of desirability:
- _Which_ Policy is (or Policies are) affecting a particular object
- _What_ settings in the Policy are affecting the object.

-In the parable, if Ana and Charlie had known that there were Policies affecting
+In the parable, if Ana and Chihiro had known that there were Policies affecting
the relevant object, then they could have gone looking for the relevant Policies
and things would have played out differently. If they knew which Policies, they
would need to look less hard, and if they knew what the settings being applied
were, then the parable would have been able to be very short indeed.

-(There’s also another use case to consider, in that Charlie should have been able
+(There’s also another use case to consider, in that Chihiro should have been able
to see that the Policy on the namespace was in use in many places before deleting
it.)

@@ -432,7 +432,7 @@ ways at an API level to the Application Developer's concerns.

An important note here is that a key piece of information for Policy Admins and
Cluster Operators is “How many things does this Policy affect?”. In the parable,
-this would have enabled Charlie to know that deleting the Namespace Policy would
+this would have enabled Chihiro to know that deleting the Namespace Policy would
affect many other people than just Ana.

### Problems we need to solve