Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong assumptions and anti-patterns in InstanceController #422

Closed
gkleiman opened this issue Jun 26, 2019 · 5 comments · Fixed by #850
Closed

Wrong assumptions and anti-patterns in InstanceController #422

gkleiman opened this issue Jun 26, 2019 · 5 comments · Fixed by #850
Assignees
Labels
Milestone

Comments

@gkleiman
Copy link
Contributor

gkleiman commented Jun 26, 2019

While trying to address a TODO left in the InstanceController (#380) I found a few issues:

  1. The current implementation of the watch predicate function assumes that it will always receive both the old and the updated version of an instance. Having read the Kubernetes documentation, this assumption seems to be incorrect:

A given Kubernetes server will only preserve a historical list of changes for a limited time. Clusters using etcd3 preserve changes in the last 5 minutes by default. When the requested watch operations fail because the historical version of that resource is not available, clients must handle the case by recognizing the status code 410 Gone, clearing their local cache, performing a list operation, and starting the watch from the resourceVersion returned by that new list operation.

  1. The predicate function reads the instance (controller's root object) status and may have side-effects (i.e., create a PlanExecution). Going through the Kubebuilder tutorial, I saw that they strongly discourage operators from doing that:

Remember, status should be able to be reconstituted from the state of the world, so it's generally not a good idea to read from the status of the root object. Instead, you should reconstruct it every run.

I think that we could address both issues by:

  1. Storing parameters in PlanExecution.
  2. Making the instances watcher in InstanceController use the default predicate.
  3. Moving all the logic from the current predicate to Reconcile. The new implementation of Reconcile would then reconstitute state of the world from the latest PlanExecution, eliminating the need to read the status field or to compare the current version of an instance with an older one.

Note that these changes would most likely break backwards compatibility, so I think we should work on this ASAP.

@gerred
Copy link
Member

gerred commented Jun 26, 2019

Yep. Let's talk about this in community meeting. 1 we have definitely waned to address for a while for reasons unrelated to this, so that's great. I'm thinking this could start this next release cycle for v0.3.1 or v0.4.0.

Is there an e2e/intregration test that could make this case fail today just so we have a bar to work against?

@gkleiman
Copy link
Contributor Author

A test following this scenario would probably lead to incorrect behaviour today:

  1. Start k8s and KUDO
  2. Add an Instance and wait for the PlanExecution to complete
  3. Kill KUDO
  4. Modify the instance updating a parameter with a non-default trigger
  5. Manually trigger an etcd compaction, e.g., via etcdctl compact <resourceVersion>
  6. Restart KUDO, who will probably erroneously act as if no parameter had been changed

gkleiman added a commit that referenced this issue Jul 9, 2019
NOTE: The update predicate function still inspects the status attribute
and has side-effects. This is an anti-pattern that should be solved in
a different patch, see #422.
gerred pushed a commit that referenced this issue Jul 10, 2019
NOTE: The update predicate function still inspects the status attribute
and has side-effects. This is an anti-pattern that should be solved in
a different patch, see #422.
@kensipe kensipe added this to the 0.5.0 milestone Jul 26, 2019
@kensipe
Copy link
Member

kensipe commented Jul 26, 2019

This is a critical issue... scheduling this fix for a 0.4.x bug fix release or for 0.5.0.

@alenkacz
Copy link
Contributor

We started working on this with @kensipe

@alenkacz alenkacz modified the milestones: 0.5.0, 0.6.0 Jul 31, 2019
@alenkacz
Copy link
Contributor

alenkacz commented Aug 6, 2019

From the docs: OnUpdate is called when an object is modified. Note that oldObj is the last known state of the object-- it is possible that several changes were combined together, so you can't use this to see every single change.

@alenkacz alenkacz modified the milestones: 0.6.0, 0.7.0 Sep 10, 2019
alenkacz added a commit that referenced this issue Sep 30, 2019
**What this PR does / why we need it**:
This is an implementation of controller based on KEP-18.

Some highlights:
- start reviewing with the Reconcile function https://github.com/kudobuilder/kudo/pull/850/files#diff-45e4638e2c718ccd5d08635baec770b1R98
- plan execution CRD, controller and everything around that was removed
- with this PR we temporarily don't really support custom plan execution and for the correct plan execution we need the admission webhook which will come also in another PR

Fixes #422
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants