Fred/eng 101 argo rollout support #303
Conversation
linkous8 commented Jul 29, 2021 (edited)
- Update kubernetes_asyncio dependency to custom fork that fixes issue with patching of custom resource
- Add models for parsing rollout custom resource objects from camel case json to snake case to conform with kubernetes python client models
- Add Rollout class to kubernetes connector for working with argo rollouts
- Update CanaryOptimization class to allow it to work with an argo rollout
- Update Deployment class with new methods introduced for working with rollouts
- Refactor opsani_dev checks to be controller agnostic and add rollout support
- Update cli sidecar injection validation to allow argo rollout as a target
- Add tests and manifests for new rollout support
- Reorder params and rename some existing tests to reduce integration test flakiness caused by kubetest using same namespace/cluster role binding names for multiple tests
- Update cli inject_sidecar cmd to support rollout target
- Add pydantic models with alias generator for converting camel case custom resource dict to snake case objects
- Add Rollout KubernetesModel for interacting with rollouts
- Update CanaryOptimization to support rollout target
- Update KubernetesOptimizations to use configured rollouts
- Add KubernetesConfiguration for rollouts
- Update opsani_dev connector and checks to support rollouts
- Add Kubernetes integration tests for rollouts
- Add opsani_dev integration tests for rollouts
- Add rollout manifests for tests
- … for rollouts for ease of use
- Update kubernetes_asyncio dependency to use patched fork
- Add filelock dependency for preventing xdist worker collision
- Fix rollout models to allow population by field name
- Fix opsani_dev resource requirement check to work with rollouts
- Refactor rollout setup fixtures to eliminate redundant code
- Add locking to rollout CRD application to prevent xdist collisions
- Fix types used for rollout sidecar injection test
- Fix opsani_dev test_generate unit test
- Add resource requirements tests for rollouts
- Add rollout manifests for resource requirements tests
- Specify container port protocol explicitly in rollout manifests, as it is not autopopulated by the rollout CR (however, pods do autopopulate values if not contained in the rollout)
- Add missing label selector in test manifest opsani_dev/argo_rollouts/rollout.yaml
- Clean up whitespace

#test:integration
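The xdist locking mentioned above (serializing rollout CRD application across parallel test workers) can be sketched with a plain file lock. This is an illustrative sketch using the stdlib fcntl module rather than the filelock package the PR actually adds; the function and path names are hypothetical:

```python
import fcntl


def apply_crd_once(lock_path: str, apply_fn):
    """Serialize CRD application across pytest-xdist workers.

    Takes an exclusive lock on lock_path before running apply_fn, so only
    one worker at a time can apply the (cluster-scoped) rollout CRD.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other worker holds the lock
        try:
            return apply_fn()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


result = apply_crd_once("/tmp/rollout-crd.lock", lambda: "applied")
```

The real code would pass a function that shells out to kubectl or calls the Kubernetes API to create the CRD; the lock only matters because the CRD is a cluster-wide singleton shared by all worker namespaces.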
- Update opsani_dev generate_kubernetes_config to wrap deployment/rollout configs in a list
- Update opsani_dev_test remedy step 7 for refactored canary code

#test:integration
- Make Rollout model revision_history_limit optional
- Remove kube dependency from test_generate_rollout_config unit test in opsani_dev
- … forward to rollout
- … of config values
- Introduce get_tuning_container, get_tuning_pod_template_spec, and update_tuning_pod methods to deployment and rollout to reduce spaghetti code
- Add FakeKubeResponse class and default_kubernetes_json_serializer to allow use of public api_client.deserialize method
- Add rollout integration test coverage to check service adopts tuning
- Update checks logic to not run required checks when exclusive is true
- Update rollout model optional fields
- Add rollout match_labels property
- Update return type of rollout pod_template_spec property
- Refactor OpsaniDevChecks into controller-agnostic abstract base class BaseOpsaniDevChecks
- Define OpsaniDevChecks class for checks against a Deployment controller
- Define OpsaniDevRolloutChecks for checks against a Rollout controller
- Add new Rollout check for opsani_role selector and label
- Update opsani_dev_test for refactored checks
- Add opsani_dev_test coverage for opsani_role check
- Expand opsani_dev_test change_to_resource context manager to support rollout
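The checks refactor described above (an abstract base class with Deployment- and Rollout-specific subclasses) follows a standard template-method shape. A minimal, hypothetical sketch of that structure; the method names here are illustrative, not the connector's actual API:

```python
from abc import ABC, abstractmethod


class BaseOpsaniDevChecks(ABC):
    """Controller-agnostic checks; subclasses supply the controller-specific pieces."""

    @abstractmethod
    def controller_kind(self) -> str:
        """Return the kind of controller these checks target."""
        ...

    def check_controller_exists(self) -> str:
        # Shared check logic, parameterized only by the controller kind
        return f"checking {self.controller_kind()} exists"


class OpsaniDevChecks(BaseOpsaniDevChecks):
    def controller_kind(self) -> str:
        return "Deployment"


class OpsaniDevRolloutChecks(BaseOpsaniDevChecks):
    def controller_kind(self) -> str:
        return "Rollout"
```

This keeps the shared check bodies in one place while letting the Rollout subclass add checks (like the opsani_role selector check) that have no Deployment counterpart.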
ENG-101 Argo rollout support
Per @fred from Slack: Rollouts are like standard k8s deployments with extra features that we don't really interact with. The main concern is extracting the pod template spec for creation of the tuning pod, at which point it becomes just a standard pod. Take a look at the pydantic models I define here (you will have to click "Files Changed" -> "load diff" before it will highlight the right line): https://github.com/opsani/servox/compare/feature/argo-rollouts-rebase#diff-2a0a64b2e5a9a27d1e5c45b7e657cd286a87a13cfe73cfe381aab7a15a77dd94R2080

Here is where the models get converted into the standard k8s tuning pod: https://github.com/opsani/servox/compare/feature/argo-rollouts-rebase#diff-2a0a64b2e5a9a27d1e5c45b7e657cd286a87a13cfe73cfe381aab7a15a77dd94R2536

Quite a bit has changed with the servox code, but the gist is to take the models defined there and build out the needed Rollout class so that servox can use it in the same way as the standard deployment. Why don't you take some time to review the models and the code diff, and then we can hop on a call to answer any questions you might have? Sound good?

@peter provided additional clarification: I think the minimal amount of work that we need is to support a tuning instance for argo rollouts. This means we just need to be able to read the pod spec from argo rollouts instead of a deployment. At least in the urgent track, we don't need to be able to actually adjust the rollout itself (this is promotion or adjustment in saturation mode). This literally should be:
There is some urgency to have the above (i.e. support Argo Rollouts as a target, but without the ability to adjust it: just read from it and have the tuning instance created, where the tuning instance is a standalone pod, just like when we do deployments). Similarly, there may be some tweaks in the metric labels (hopefully not).
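The minimal read-only path Peter describes boils down to this: a Rollout exposes its pod template at the same spec.template path as a Deployment, so reading it for tuning-pod creation is a one-line lookup. A sketch working on plain resource dicts (the real connector uses typed pydantic models; the helper name and sample manifest here are illustrative):

```python
def get_pod_template_spec(rollout: dict) -> dict:
    """Read the pod template spec from a Rollout custom resource.

    The path is the same as for a Deployment, which is what makes the
    read-only tuning-instance support cheap.
    """
    return rollout["spec"]["template"]


# A pared-down Rollout manifest, just enough to show the lookup
rollout = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Rollout",
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "fiber-http"}},
            "spec": {
                "containers": [
                    {"name": "fiber-http", "image": "opsani/fiber-http:latest"}
                ]
            },
        }
    },
}

tuning_pod_spec = get_pod_template_spec(rollout)
```

From here the extracted template is used to build a standalone tuning pod, exactly as in the Deployment path.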
- Reorder params of injection tests so they generate different kubetest namespaces
- Rename test_inject_by_source_port_name_with_symbolic_target_port test to prevent kubetest namespace collision
- Update rollout selector check to remove automated remedy and add warning for orphaned replicasets
- Rename rollout kubernetes integration tests to reduce kubetest crb name collisions
- Update opsani_dev rollout selector check test to remove validation of dropped remedy
- Reduce kubetest cluster role binding name collisions
- Remove unused import of tests.helpers in opsani_dev_test
Comments inline but additionally the code change to test ratio seems out of alignment. There are touches all over the place and a bunch of new classes but it is all backed with a few integration tests. I didn't go and enumerate the cases but I am happy to try and map it out with you.
@@ -37,6 +36,7 @@ toml = "^0.10.2"
colorama = "^0.4.4"
pyfiglet = "^0.8.post1"
curlify2 = "^1.0.0"
kubernetes-asyncio = {git = "https://github.com/opsani/kubernetes_asyncio", rev = "v12.1.2-custom-resource-patch-fix"}
Have we sent this back upstream as a PR? Starting to carry around our own custom fork of the Kubernetes library is sorta heavy. Knowing whether we can get a patch landed and get back on a release is significant, given how foundational this library is.
The patch I'm using in the fork is not upstreamable, as it was applied to the generated code rather than updating the code generation models. The upstream library maintainer has PRed a fix for this issue into the standard kubernetes python client, but it's unclear when that will land; the fork is mainly a temporary workaround.
The library does not release very frequently so keeping up with patching the fork should not be too burdensome and I've also thrown together some automation on the fork to let us know when we need to rebase it: https://github.com/opsani/kubernetes_asyncio/blob/master/.github/workflows/check-upstream-release.yaml
See ENG-148 for discussion pertaining to this workaround
class FakeKubeResponse:
    """Mocks the RESTResponse object as a workaround for kubernetes python api_client deserialization"""

    def __init__(self, obj):
        self.data = json.dumps(obj, default=default_kubernetes_json_serializer)
This stuff should live in the tests module
This is part of a workaround to allow the use of the public api_client.deserialize() method instead of the private method api_client._ApiClient__deserialize(). See kubernetes-client/python#977 (comment) for more context, and LMK if we should drop the response faking and just go with the private method instead.
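The workaround can be shown end to end in a few lines. This is a self-contained sketch of the two pieces involved; it stubs out the serializer rather than importing the kubernetes client, and the sample payload is illustrative:

```python
import datetime
import json


def default_kubernetes_json_serializer(obj):
    """Fallback for values json.dumps can't handle natively, e.g. the
    datetimes that appear in Kubernetes object metadata."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class FakeKubeResponse:
    """Mimics the .data attribute of the RESTResponse object so the object
    can be handed to the public api_client.deserialize() method."""

    def __init__(self, obj):
        self.data = json.dumps(obj, default=default_kubernetes_json_serializer)


resp = FakeKubeResponse(
    {"metadata": {"creationTimestamp": datetime.datetime(2021, 7, 29)}}
)
```

In the connector, the resulting object would then be passed to api_client.deserialize(resp, response_type) to get a typed model back, which is exactly the step the private __deserialize method would otherwise perform.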
servo/connectors/kubernetes.py
Outdated
# Use alias generator so that lower camel case can be parsed to snake case properties to match k8s python client behaviour
def to_lower_camel(string: str) -> str:
    split = string.split('_')
    return split[0] + ''.join(word.capitalize() for word in split[1:])
Prolly should live up in the types module and needs some unit tests. The naming seems unintuitive, as it assumes you have a string delimited by an underscore. The comment is perplexing as well: how can you have a lower camel case string? Isn't camel case defined entirely by having a mix of upper and lower case characters delimiting the boundary of the components?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- WRT the method name, the input for this method isn't the field name of the kubernetes object; rather, it's the fields of the pydantic model, e.g. deletion_grace_period_seconds -> deletionGracePeriodSeconds
- "Lower camel case" seemed to be the industry-accepted term for having the first word be lower case followed by standard camel case, but I will make an update to clarify this with the more formal naming of "dromedary case"
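The renamed converter, with the unit-test assertions the reviewer asked for, could look like this (a sketch; the rename to "dromedary" follows the comment above, and the test cases are illustrative):

```python
def to_dromedary_case(string: str) -> str:
    """Convert a snake_case pydantic field name to dromedary (lower camel) case,
    matching the JSON field names of the Kubernetes API, e.g.
    deletion_grace_period_seconds -> deletionGracePeriodSeconds."""
    first, *rest = string.split("_")
    return first + "".join(word.capitalize() for word in rest)
```

Wired into a pydantic model as its alias_generator, this lets parse_obj consume the camel-case custom resource dict while the model keeps idiomatic snake_case attribute names.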
# NOTE/TODO: fields typed with Any should maintain the same form when dumped as when they are parsed. Should the need
# arise to interact with such fields, they will need to have an explicit type defined so the alias_generator is applied
class RolloutV1LabelSelector(RolloutBaseModel):  # must type out k8s models as well to allow parse_obj to work
    match_expressions: Any
We can do better than this on the type identifier, no?
env_from: Any
image: str
image_pull_policy: Optional[str]
lifecycle: Any
Why bother to even have these fields if you aren't going to type them? We can just use the Config to ignore unspecified fields if you don't want to build them out, but specifying fields as Any isn't helpful and effectively backslides into turning the data model into a glorified dictionary.
We have no need to interact with these fields in the context of the connector but can't exclude them from the model because they would not be included in copying the rollout template spec into a standalone pod. By leaving the type as Any, we can effectively say we don't care about the typing of these fields because they came from an already validated Kubernetes object and will only ever be used to populate a new Kubernetes object.
I don't see a problem with the model being a glorified dictionary in this context since we are only interested in passing through the data from reads to any write operations and these models are meant to be temporary anyway until the generated models can be implemented.
LMK if you still disagree with the justification above, and I can expand the models to include the fields we currently specify as Any.
"port, service", | ||
[ | ||
('fiber-http', None), | ||
('fiber-http', 80), | ||
('fiber-http', 'http'), | ||
(None, 'fiber-http'), | ||
(80, 'fiber-http'), | ||
('http', 'fiber-http'), |
Why'd you flip this? It seems more intuitive to me as service then port?
This resolves some of the flakiness in integration tests due to kubetest namespace collisions. What happens is that the parameter is appended to the test name, then truncated to 43 characters, prepended with kubetest, and appended with a timestamp. The resulting namespace name is kubetest-test-inject-single-port-deployment-fiber-ht-, which is the same for all 3 parameterizations and causes conflicts when each of the tests goes to create the namespace. You can see an example of this failure in this test run: https://github.com/opsani/servox/runs/3204520239

By reordering the parameters, each test parameterization produces a unique namespace name, which prevents issues from tests trying to use the same namespace and/or tearing down the namespace set up by another test, e.g.
kubetest-test-inject-single-port-deployment-none-fib-
kubetest-test-inject-single-port-deployment-80-fiber-
kubetest-test-inject-single-port-deployment-http-fib-
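The truncation behavior described above can be reproduced in a few lines. A sketch approximating kubetest's naming (the slug format and timestamp are illustrative, but the 43-character truncation matches the description):

```python
def kubetest_namespace(test_slug: str, timestamp: str = "1627500000") -> str:
    """Approximate kubetest naming: truncate the slug to 43 characters,
    then prepend the kubetest prefix and append a timestamp."""
    return f"kubetest-{test_slug[:43]}-{timestamp}"


params = [("fiber-http", None), ("fiber-http", 80), ("fiber-http", "http")]

# Original ordering: the service name comes first and eats the 43-char budget
service_first = [
    f"test-inject-single-port-deployment-{s}-{p}".lower() for s, p in params
]
# Reordered: the (shorter, varying) port comes first, so each slug differs
port_first = [
    f"test-inject-single-port-deployment-{p}-{s}".lower() for s, p in params
]

service_first_namespaces = {kubetest_namespace(slug) for slug in service_first}
port_first_namespaces = {kubetest_namespace(slug) for slug in port_first}
```

With the service first, all three parameterizations truncate to the same kubetest-test-inject-single-port-deployment-fiber-ht- prefix; with the port first, the namespaces stay distinct.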
- Drop "tuning" from methods with no direct tie to the concept of tuning - Rename alias generator to reduce ambiguity by indicating dromedary case - Update hpa_replicas field to be idiomatic - Move rollout const args to private static class field - Update rollout api_clients dict to use rollout constants - Drop rollout match_labels prop to maintain consistency with deployment - Refactor CanaryOptimization to remove overloading of deployment with rollout, add target_controller* getters for DRYness - Add validator to CanaryOptimziation so rollout and deployment are mutually exclusive - Rename CanaryOptimziation create var/arg: deployment_config -> deployment_or_rollout_config deployment -> deployment_or_rollout - Rename KubernetesOptimizations create var deployment_config -> deployment_or_rollout_config - Update KubernetesOptimizations create to use nested/cascaded timeout config - Update KubernetesConfiguration to make deployments field optional (also updated KubernetesOptimizations create) - Update cascade_common_settings to ignore None fields - Update opsani_dev generate_kubernetes_config to remove empty deployments collection from rollout config - Add checks helper _get_controller_service_selector to maintain abstraction between deployments and rollouts - Add rollout checks validation that pod-template-hash is set on status - Update k8s rollout integration test: set config deployments to None, whitespace nit
- Refactor KubernetesChecks to support argo rollouts
- Fix k8s integration tests failing from timeouts not being cascaded
- Fix opsani dev tests CanaryOptimization.create using wrong arg name
- Update config cascade logging to indicate field is set, not unset
- Update config cascade logging to log value of set field
- Specify overwrite True in test config cascade so default timeout is overridden
test_adjust_deployment_never_ready
LGTM