Fred/eng 101 argo rollout support #303
Conversation
linkous8 commented Jul 29, 2021 (edited)
- Update kubernetes_asyncio dependency to custom fork that fixes issue with patching of custom resource
- Add models for parsing rollout custom resource objects from camel case json to snake case to conform with kubernetes python client models
- Add Rollout class to kubernetes connector for working with argo rollouts
- Update CanaryOptimization class to allow it to work with an argo rollout
- Update Deployment class with new methods introduced for working with rollouts
- Refactor opsani_dev checks to be controller agnostic and add rollout support
- Update cli sidecar injection validation to allow argo rollout as a target
- Add tests and manifests for new rollout support
- Reorder params and rename some existing tests to reduce integration test flakiness caused by kubetest using same namespace/cluster role binding names for multiple tests
- Update cli inject_sidecar cmd to support rollout target
- Add pydantic models with alias generator for converting camel case custom resource dict to snake case objects
- Add Rollout KubernetesModel for interacting with rollouts
- Update CanaryOptimization to support rollout target
- Update KubernetesOptimizations to use configured rollouts
- Add KubernetesConfiguration for rollouts
- Update opsani_dev connector and checks to support rollouts
- Add Kubernetes integration tests for rollouts
- Add opsani_dev integration tests for rollouts
- Add rollout manifests for tests
- … for rollouts for ease of use
- Update kubernetes_asyncio dependency to use patched fork
- Add filelock dependency for preventing xdist worker collision
- Fix rollout models to allow population by field name
- Fix opsani_dev resource requirement check to work with rollouts
- Refactor rollout setup fixtures to eliminate redundant code
- Add locking to rollout CRD application to prevent xdist collisions
- Fix types used for rollout sidecar injection test
- Fix opsani_dev test_generate unit test
- Add resource requirements tests for rollouts
- Add rollout manifests for resource requirements tests
- Specify container port protocol explicitly in rollout manifests, as it is not autopopulated by the rollout CR (however, pods do autopopulate values if not contained in the rollout)
- Add missing label selector in test manifest opsani_dev/argo_rollouts/rollout.yaml
- Clean up whitespace

#test:integration
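The xdist locking mentioned above (serializing rollout CRD application across parallel test workers) can be sketched with a plain file lock. This is an illustrative sketch using the stdlib fcntl module rather than the filelock package the PR actually adds; the function and path names are hypothetical:

```python
import fcntl


def apply_crd_once(lock_path: str, apply_fn):
    """Serialize CRD application across pytest-xdist workers.

    Takes an exclusive lock on lock_path before running apply_fn, so only
    one worker at a time can apply the (cluster-scoped) rollout CRD.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other worker holds the lock
        try:
            return apply_fn()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


result = apply_crd_once("/tmp/rollout-crd.lock", lambda: "applied")
```

The real code would pass a function that shells out to kubectl or calls the Kubernetes API to create the CRD; the lock only matters because the CRD is a cluster-wide singleton shared by all worker namespaces.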
- Update opsani_dev generate_kubernetes_config to wrap deployment/rollout configs in a list
- Update opsani_dev_test remedy step 7 for refactored canary code

#test:integration
- Make Rollout model revision_history_limit optional
- Remove kube dependency from test_generate_rollout_config unit test in opsani_dev
- … forward to rollout
- … of config values
- Introduce get_tuning_container, get_tuning_pod_template_spec, and update_tuning_pod methods to deployment and rollout to reduce spaghetti code
- Add FakeKubeResponse class and default_kubernetes_json_serializer to allow use of public api_client.deserialize method
- Add rollout integration test coverage to check service adopts tuning
- Update checks logic to not run required checks when exclusive is true
- Update rollout model optional fields
- Add rollout match_labels property
- Update return type of rollout pod_template_spec property
- Refactor OpsaniDevChecks into controller-agnostic abstract base class BaseOpsaniDevChecks
- Define OpsaniDevChecks class for checks against a Deployment controller
- Define OpsaniDevRolloutChecks for checks against a Rollout controller
- Add new Rollout check for opsani_role selector and label
- Update opsani_dev_test for refactored checks
- Add opsani_dev_test coverage for opsani_role check
- Expand opsani_dev_test change_to_resource context manager to support rollout
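The checks refactor described above (an abstract base class with Deployment- and Rollout-specific subclasses) follows a standard template-method shape. A minimal, hypothetical sketch of that structure; the method names here are illustrative, not the connector's actual API:

```python
from abc import ABC, abstractmethod


class BaseOpsaniDevChecks(ABC):
    """Controller-agnostic checks; subclasses supply the controller-specific pieces."""

    @abstractmethod
    def controller_kind(self) -> str:
        """Return the kind of controller these checks target."""
        ...

    def check_controller_exists(self) -> str:
        # Shared check logic, parameterized only by the controller kind
        return f"checking {self.controller_kind()} exists"


class OpsaniDevChecks(BaseOpsaniDevChecks):
    def controller_kind(self) -> str:
        return "Deployment"


class OpsaniDevRolloutChecks(BaseOpsaniDevChecks):
    def controller_kind(self) -> str:
        return "Rollout"
```

This keeps the shared check bodies in one place while letting the Rollout subclass add checks (like the opsani_role selector check) that have no Deployment counterpart.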
ENG-101 Argo rollout support
Per @fred from Slack: Rollouts are like standard k8s deployments with extra features that we don't really interact with. The main concern is extracting the pod template spec for creation of the tuning pod, at which point it becomes just a standard pod. Take a look at the pydantic models I define here (you will have to click "Files Changed" -> "load diff" before it will highlight the right line): https://github.com/opsani/servox/compare/feature/argo-rollouts-rebase#diff-2a0a64b2e5a9a27d1e5c45b7e657cd286a87a13cfe73cfe381aab7a15a77dd94R2080

Here is where the models get converted into the standard k8s tuning pod: https://github.com/opsani/servox/compare/feature/argo-rollouts-rebase#diff-2a0a64b2e5a9a27d1e5c45b7e657cd286a87a13cfe73cfe381aab7a15a77dd94R2536

Quite a bit has changed with the servox code, but the gist is to take the models defined there and build out the needed Rollout class so that servox can use it in the same way as the standard deployment. Why don't you take some time to review the models and the code diff, and then we can hop on a call to answer any questions you might have? Sound good?

@peter provided additional clarification: I think the minimal amount of work that we need is to support a tuning instance for argo rollouts. This means we just need to be able to read the pod spec from argo rollouts instead of a deployment. At least in the urgent track, we don't need to be able to actually adjust the rollout itself (this is promotion or adjustment in saturation mode). This literally should be:
There is some urgency to have the above (i.e. support Argo Rollouts as a target, but without the ability to adjust it: just read from it and have the tuning instance created, where the tuning instance is a standalone pod, just like when we do deployments). Similarly, there may be some tweaks in the metric labels (hopefully not).
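The minimal read-only path Peter describes boils down to this: a Rollout exposes its pod template at the same spec.template path as a Deployment, so reading it for tuning-pod creation is a one-line lookup. A sketch working on plain resource dicts (the real connector uses typed pydantic models; the helper name and sample manifest here are illustrative):

```python
def get_pod_template_spec(rollout: dict) -> dict:
    """Read the pod template spec from a Rollout custom resource.

    The path is the same as for a Deployment, which is what makes the
    read-only tuning-instance support cheap.
    """
    return rollout["spec"]["template"]


# A pared-down Rollout manifest, just enough to show the lookup
rollout = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Rollout",
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "fiber-http"}},
            "spec": {
                "containers": [
                    {"name": "fiber-http", "image": "opsani/fiber-http:latest"}
                ]
            },
        }
    },
}

tuning_pod_spec = get_pod_template_spec(rollout)
```

From here the extracted template is used to build a standalone tuning pod, exactly as in the Deployment path.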
- Reorder params of injection tests so they generate different kubetest namespaces
- Rename test_inject_by_source_port_name_with_symbolic_target_port test to prevent kubetest namespace collision
- Update rollout selector check to remove automated remedy and add warning for orphaned replicasets
- Rename rollout kubernetes integration tests to reduce kubetest crb name collisions
- Update opsani_dev rollout selector check test to remove validation of dropped remedy
- Reduce kubetest cluster role binding name collisions
- Remove unused import of tests.helpers in opsani_dev_test
Comments inline but additionally the code change to test ratio seems out of alignment. There are touches all over the place and a bunch of new classes but it is all backed with a few integration tests. I didn't go and enumerate the cases but I am happy to try and map it out with you.
@@ -37,6 +36,7 @@ toml = "^0.10.2"
colorama = "^0.4.4"
pyfiglet = "^0.8.post1"
curlify2 = "^1.0.0"
kubernetes-asyncio = {git = "https://github.com/opsani/kubernetes_asyncio", rev = "v12.1.2-custom-resource-patch-fix"}
Have we sent this back upstream as a PR? Starting to carry around our own custom fork of the Kubernetes library is sorta heavy. Knowing whether we can get a patch landed and get back on a release is significant, given how foundational this library is.
The patch I'm using in the fork is not upstreamable, as it was applied to the generated code rather than updating the code generation models. The upstream library maintainer has PRed a fix for this issue into the standard kubernetes python client, but it's unclear when that will land; the fork is mainly a temporary workaround.
The library does not release very frequently so keeping up with patching the fork should not be too burdensome and I've also thrown together some automation on the fork to let us know when we need to rebase it: https://github.com/opsani/kubernetes_asyncio/blob/master/.github/workflows/check-upstream-release.yaml
See ENG-148 for discussion pertaining to this workaround
class FakeKubeResponse:
    """Mocks the RESTResponse object as a workaround for kubernetes python api_client deserialization"""

    def __init__(self, obj):
        self.data = json.dumps(obj, default=default_kubernetes_json_serializer)
This stuff should live in the tests module
This is part of a workaround to allow the use of the public api_client.deserialize() method instead of the private method api_client._ApiClient__deserialize(). See kubernetes-client/python#977 (comment) for more context, and LMK if we should drop the response faking and just go with the private method instead.
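The workaround can be shown end to end in a few lines. This is a self-contained sketch of the two pieces involved; it stubs out the serializer rather than importing the kubernetes client, and the sample payload is illustrative:

```python
import datetime
import json


def default_kubernetes_json_serializer(obj):
    """Fallback for values json.dumps can't handle natively, e.g. the
    datetimes that appear in Kubernetes object metadata."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class FakeKubeResponse:
    """Mimics the .data attribute of the RESTResponse object so the object
    can be handed to the public api_client.deserialize() method."""

    def __init__(self, obj):
        self.data = json.dumps(obj, default=default_kubernetes_json_serializer)


resp = FakeKubeResponse(
    {"metadata": {"creationTimestamp": datetime.datetime(2021, 7, 29)}}
)
```

In the connector, the resulting object would then be passed to api_client.deserialize(resp, response_type) to get a typed model back, which is exactly the step the private __deserialize method would otherwise perform.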
servo/connectors/kubernetes.py
Outdated
# Use alias generator so that lower camel case can be parsed to snake case properties to match k8s python client behaviour
def to_lower_camel(string: str) -> str:
    split = string.split('_')
    return split[0] + ''.join(word.capitalize() for word in split[1:])
Prolly should live up in the types module and needs some unit tests. The naming seems unintuitive, as it assumes you have a string delimited by an underscore. The comment is perplexing as well: how can you have a lower camel case string? Isn't camel case defined entirely by having a mix of upper and lower case characters delimiting the boundary of the components?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- WRT the method name, the input for this method isn't the field name of the kubernetes object; rather, it's the fields of the pydantic model, e.g. deletion_grace_period_seconds -> deletionGracePeriodSeconds
- "Lower camel case" seemed to be the industry-accepted term for having the first word be lower case followed by standard camel case, but I will make an update to clarify this with the more formal naming of "dromedary case"
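The renamed converter, with the unit-test assertions the reviewer asked for, could look like this (a sketch; the rename to "dromedary" follows the comment above, and the test cases are illustrative):

```python
def to_dromedary_case(string: str) -> str:
    """Convert a snake_case pydantic field name to dromedary (lower camel) case,
    matching the JSON field names of the Kubernetes API, e.g.
    deletion_grace_period_seconds -> deletionGracePeriodSeconds."""
    first, *rest = string.split("_")
    return first + "".join(word.capitalize() for word in rest)
```

Wired into a pydantic model as its alias_generator, this lets parse_obj consume the camel-case custom resource dict while the model keeps idiomatic snake_case attribute names.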
# NOTE/TODO: fields typed with Any should maintain the same form when dumped as when they are parsed. Should the need
# arise to interact with such fields, they will need to have an explicit type defined so the alias_generator is applied
class RolloutV1LabelSelector(RolloutBaseModel):  # must type out k8s models as well to allow parse_obj to work
    match_expressions: Any
We can do better than this on the type identifier, no?
env_from: Any
image: str
image_pull_policy: Optional[str]
lifecycle: Any
Why bother to even have these fields if you aren't going to type them? We can just use the Config to ignore unspecified fields if you don't want to build them out, but specifying fields as Any isn't helpful and effectively backslides into turning the data model into a glorified dictionary.
We have no need to interact with these fields in the context of the connector but can't exclude them from the model because they would not be included in copying the rollout template spec into a standalone pod. By leaving the type as Any, we can effectively say we don't care about the typing of these fields because they came from an already validated Kubernetes object and will only ever be used to populate a new Kubernetes object.
I don't see a problem with the model being a glorified dictionary in this context since we are only interested in passing through the data from reads to any write operations and these models are meant to be temporary anyway until the generated models can be implemented.
LMK if you still disagree with the justification above, and I can expand the models to include the fields we currently specify as Any.
"port, service", | ||
[ | ||
('fiber-http', None), | ||
('fiber-http', 80), | ||
('fiber-http', 'http'), | ||
(None, 'fiber-http'), | ||
(80, 'fiber-http'), | ||
('http', 'fiber-http'), |
Why'd you flip this? It seems more intuitive to me as service then port?
This resolves some of the flakiness in integration tests due to kubetest namespace collisions. What happens is that the parameter is appended to the test name, then truncated to 43 characters, prepended with kubetest, and appended with a timestamp. The resulting namespace name is kubetest-test-inject-single-port-deployment-fiber-ht-, which is the same for all 3 parameterizations and causes conflicts when each of the tests goes to create the namespace. You can see an example of this failure in this test run: https://github.com/opsani/servox/runs/3204520239

By reordering the parameters, each test parameterization produces a unique namespace name, which prevents issues from tests trying to use the same namespace and/or tearing down the namespace set up by another test, e.g.
kubetest-test-inject-single-port-deployment-none-fib-
kubetest-test-inject-single-port-deployment-80-fiber-
kubetest-test-inject-single-port-deployment-http-fib-
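The truncation behavior described above can be reproduced in a few lines. A sketch approximating kubetest's naming (the slug format and timestamp are illustrative, but the 43-character truncation matches the description):

```python
def kubetest_namespace(test_slug: str, timestamp: str = "1627500000") -> str:
    """Approximate kubetest naming: truncate the slug to 43 characters,
    then prepend the kubetest prefix and append a timestamp."""
    return f"kubetest-{test_slug[:43]}-{timestamp}"


params = [("fiber-http", None), ("fiber-http", 80), ("fiber-http", "http")]

# Original ordering: the service name comes first and eats the 43-char budget
service_first = [
    f"test-inject-single-port-deployment-{s}-{p}".lower() for s, p in params
]
# Reordered: the (shorter, varying) port comes first, so each slug differs
port_first = [
    f"test-inject-single-port-deployment-{p}-{s}".lower() for s, p in params
]

service_first_namespaces = {kubetest_namespace(slug) for slug in service_first}
port_first_namespaces = {kubetest_namespace(slug) for slug in port_first}
```

With the service first, all three parameterizations truncate to the same kubetest-test-inject-single-port-deployment-fiber-ht- prefix; with the port first, the namespaces stay distinct.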
- Drop "tuning" from methods with no direct tie to the concept of tuning - Rename alias generator to reduce ambiguity by indicating dromedary case - Update hpa_replicas field to be idiomatic - Move rollout const args to private static class field - Update rollout api_clients dict to use rollout constants - Drop rollout match_labels prop to maintain consistency with deployment - Refactor CanaryOptimization to remove overloading of deployment with rollout, add target_controller* getters for DRYness - Add validator to CanaryOptimziation so rollout and deployment are mutually exclusive - Rename CanaryOptimziation create var/arg: deployment_config -> deployment_or_rollout_config deployment -> deployment_or_rollout - Rename KubernetesOptimizations create var deployment_config -> deployment_or_rollout_config - Update KubernetesOptimizations create to use nested/cascaded timeout config - Update KubernetesConfiguration to make deployments field optional (also updated KubernetesOptimizations create) - Update cascade_common_settings to ignore None fields - Update opsani_dev generate_kubernetes_config to remove empty deployments collection from rollout config - Add checks helper _get_controller_service_selector to maintain abstraction between deployments and rollouts - Add rollout checks validation that pod-template-hash is set on status - Update k8s rollout integration test: set config deployments to None, whitespace nit
- Refactor KubernetesChecks to support argo rollouts
- Fix k8s integration tests failing from timeouts not being cascaded
- Fix opsani dev tests CanaryOptimization.create using wrong arg name
- Update config cascade logging to indicate field is set, not unset
- Update config cascade logging to log value of set field
- Specify overwrite True in test config cascade so default timeout is overridden
test_adjust_deployment_never_ready
LGTM