Fred/eng 101 argo rollout support #303

Merged

linkous8 merged 30 commits into main from fred/eng-101-argo-rollout-support on Aug 13, 2021
Conversation

linkous8
Collaborator

@linkous8 commented Jul 29, 2021

  • Update the kubernetes_asyncio dependency to a custom fork that fixes an issue with patching custom resources
  • Add models for parsing rollout custom resource objects from camel-case JSON into snake case to conform to the kubernetes python client models
  • Add a Rollout class to the kubernetes connector for working with Argo Rollouts
  • Update the CanaryOptimization class to allow it to work with an Argo Rollout
    • Update the Deployment class with the new methods introduced for working with rollouts
  • Refactor the opsani_dev checks to be controller agnostic and add rollout support
  • Update the CLI sidecar injection validation to allow an Argo Rollout as a target
  • Add tests and manifests for the new rollout support
  • Reorder params and rename some existing tests to reduce integration test flakiness caused by kubetest using the same namespace/cluster role binding names for multiple tests

- Update cli inject_sidecar cmd to support rollout target
- Add pydantic models with alias generator for converting camel case
 custom resource dict to snake case objects
- Add Rollout KubernetesModel for interacting with rollouts
- Update CanaryOptimization to support rollout target
- Update KubernetesOptimizations to use configured rollouts
- Add KubernetesConfiguration for rollouts
- Update opsani_dev connector and checks to support rollouts
- Add Kubernetes integration tests for rollouts
- Add opsani_dev integration tests for rollouts
- Add rollout manifests for tests for ease of use
- update kubernetes_asyncio dependency to use patched fork
- add filelock dependency for preventing xdist worker collision
- Fix rollout models to allow population by field name
- Fix opsani_dev resource requirement check to work with rollouts
- Refactor rollout setup fixtures to eliminate redundant code
- Add locking to rollout CRD application to prevent xdist collisions (see the filelock sketch after this list)
- Fix types used for rollout sidecar injection test
- Fix opsani_dev test_generate unit test
- Add resource requirements tests for rollouts
- Add rollout manifests for resource requirements tests
- Specify container port protocol explicitly in rollout manifests as
 they are not autopopulated by the rollout CR (however,
 pods do autopopulate values if not contained in the rollout)
- add missing label selector in test manifest
 opsani_dev/argo_rollouts/rollout.yaml
- cleanup whitespace
#test:integration
- opsani_dev generate_kubernetes_config update to wrap
 deployment/rollout configs in a list
- update opsani_dev_test remedy step 7 for refactored canary code
#test:integration
- Rollout model make revision_history_limit optional
- remove kube dependency from test_generate_rollout_config unit test
- Introduce get_tuning_container, get_tuning_pod_template_spec,
 and update_tuning_pod methods to deployment and rollout to reduce
 spaghetti code
- Add FakeKubeResponse class and default_kubernetes_json_serializer to
 allow use of public api_client.deserialize method
- Add rollout integration test coverage to check service adopts tuning
- Update checks logic to not run required checks when exclusive is true
- Update rollout model optional fields
- Add rollout match_labels property
- update return type of rollout pod_template_spec property
- refactor OpsaniDevChecks into controller agnostic abstract base class
 BaseOpsaniDevChecks
- Define OpsaniDevChecks class for checks against a Deployment controller
- Define OpsaniDevRolloutChecks for checks against a Rollout controller
- Add new Rollout check for opsani_role selector and label
- Update opsani_dev_test for refactored checks
- Add opsani_dev_test coverage for opsani_role check
- Expand opsani_dev_test change_to_resource context manager to support
 rollout
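A minimal sketch of the CRD locking mentioned in the commits above, assuming the filelock library named there; the manifest path, lock path, and helper name are illustrative, not the PR's actual code:

import subprocess

from filelock import FileLock

# Hypothetical path to the Argo Rollouts CRD manifest used by the tests.
ROLLOUT_CRD_MANIFEST = "tests/manifests/argo_rollouts/crd.yaml"

def apply_rollout_crd() -> None:
    # All xdist workers contend on the same lock file; the winner applies the
    # CRD while the others block here instead of racing kubectl.
    with FileLock("/tmp/rollout-crd.lock"):
        subprocess.run(["kubectl", "apply", "-f", ROLLOUT_CRD_MANIFEST], check=True)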
@linear

linear bot commented Jul 29, 2021

ENG-101 Argo rollout support

Per @fred from Slack:

Rollouts are like standard k8s deployments with extra features that we don't really interact with. The main concern is extracting the pod template spec for creation of the tuning pod, at which point it becomes just a standard pod. Take a look at the pydantic models I define here (you will have to click "Files Changed" -> "load diff" before it will highlight the right line): https://github.com/opsani/servox/compare/feature/argo-rollouts-rebase#diff-2a0a64b2e5a9a27d1e5c45b7e657cd286a87a13cfe73cfe381aab7a15a77dd94R2080

Here is where the models get converted into the standard k8s tuning pod: https://github.com/opsani/servox/compare/feature/argo-rollouts-rebase#diff-2a0a64b2e5a9a27d1e5c45b7e657cd286a87a13cfe73cfe381aab7a15a77dd94R2536

Quite a bit has changed with the servox code, but the gist is to take the models defined there and build out the needed Rollout class so that servox can use it in the same way as the standard deployment. Why don't you take some time to review the models and code diff, and then we can hop on a call to answer any questions you might have. Sound good?

@peter provided additional clarification:

I think the minimal amount of work that we need is to support tuning instance for argo rollouts. This means we just need to be able to read the pod spec from argo rollouts instead of deployment. At least in the urgent track, we don't need to be able to actually adjust the rollout itself (this is promotion or adjustment in saturation mode). This literally should be:

  • Allow specifying a 'rollout' attribute and name as a target (instead of 'deployment'); support that in both the opsani_dev and the kubernetes connectors
  • In the kubernetes connector, handle rollouts as a target of getting info (pod spec), which just happens to be in the same place where it is in a Deployment object (top-level attribute, same exact structure)
  • There may be other places in the opsani_dev connector that need to work with rollout vs deployment (during checks/remedies)

There is some urgency to have the above (i.e. support Argo Rollouts as a target, but without the ability to adjust it: just read from it and have the tuning instance created, where the tuning instance is a standalone pod, just like when we do deployments). Similarly, there may be some tweaks in the metric labels (hopefully not).
Hence I propose to split the implementation into two parts: (a) support just the tuning instance for rollouts (as described above), test, and release or keep in a custom branch; and (b) full support for rollouts where we can even make adjustments. (b) is the ultimate goal, but (a) can help us if it can be done earlier without disrupting the plan too much.


(Lifemiles is working with Argo Rollouts)
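To make the read path described above concrete, here is a minimal sketch (not the PR's code; the namespace and rollout name are placeholders) of pulling the pod template spec out of an Argo Rollout with kubernetes_asyncio. Rollouts are custom resources in the argoproj.io group, and the template sits at spec.template, the same top-level location as in a Deployment:

import asyncio

from kubernetes_asyncio import client, config

async def get_rollout_pod_template_spec(namespace: str, name: str) -> dict:
    await config.load_kube_config()
    async with client.ApiClient() as api:
        custom_objects = client.CustomObjectsApi(api)
        rollout = await custom_objects.get_namespaced_custom_object(
            group="argoproj.io",
            version="v1alpha1",
            namespace=namespace,
            plural="rollouts",
            name=name,
        )
        # Same place as in a Deployment: the template used for the tuning pod.
        return rollout["spec"]["template"]

template = asyncio.run(get_rollout_pod_template_spec("default", "fiber-http"))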

- reorder params of injection tests so they generate different kubetest
 namespaces
- rename test_inject_by_source_port_name_with_symbolic_target_port test
 to prevent kubetest namespace collision
- Update rollout selector check to remove automated remedy and add
 warning for orphaned replicasets
- Rename rollout kubernetes integration tests to reduce kubetest crb
 name collisions
- Update opsani_dev rollout selector check test to remove validation of
 dropped remedy
- Reduce kubetest cluster role binding name collisions
- Remove unused import of tests.helpers in opsani_dev_test
@linkous8 marked this pull request as ready for review July 30, 2021 21:40
Contributor

@blakewatters left a comment

Comments inline, but additionally the code-change-to-test ratio seems out of alignment. There are touches all over the place and a bunch of new classes, but it is all backed by only a few integration tests. I didn't go and enumerate the cases, but I am happy to try and map it out with you.

@@ -37,6 +36,7 @@ toml = "^0.10.2"
colorama = "^0.4.4"
pyfiglet = "^0.8.post1"
curlify2 = "^1.0.0"
kubernetes-asyncio = {git = "https://github.com/opsani/kubernetes_asyncio", rev = "v12.1.2-custom-resource-patch-fix"}
Contributor

Have we sent this back upstream as a PR? Starting to carry around our own custom fork of the Kubernetes library is sorta heavy. Knowing whether we can get a patch landed and get back on a release is significant, given how foundational this library is.

Collaborator Author

The patch I'm using in the fork is not upstreamable, as it was applied to the generated code rather than updating the code generation models. The upstream library maintainer has PRed a fix for this issue into the standard kubernetes python client, but it's unclear when that will land, and the fork is mainly a temporary workaround.

The library does not release very frequently, so keeping up with patching the fork should not be too burdensome, and I've also thrown together some automation on the fork to let us know when we need to rebase it: https://github.com/opsani/kubernetes_asyncio/blob/master/.github/workflows/check-upstream-release.yaml

See ENG-148 for discussion pertaining to this workaround

Comment on lines +2029 to +2032
class FakeKubeResponse:
    """Mocks the RESTResponse object as a workaround for kubernetes python api_client deserialization"""

    def __init__(self, obj):
        self.data = json.dumps(obj, default=default_kubernetes_json_serializer)
Contributor

This stuff should live in the tests module

Collaborator Author

This is part of a workaround to allow the use of the api_client.deserialize() public method instead of the private method api_client._ApiClient__deserialize(). See kubernetes-client/python#977 (comment) for more context, and LMK if we should drop the Response faking and just go with the private method instead.
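For context, a sketch of how the workaround is wired up; the serializer body is an assumption (only FakeKubeResponse itself is quoted above). Wrapping a plain dict in an object with a .data attribute is enough because the public deserialize() method only reads .data off the response it is given:

import datetime
import json

from kubernetes_asyncio import client

def default_kubernetes_json_serializer(obj):
    # Assumed implementation: handle the datetimes json.dumps cannot serialize.
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError(f"{type(obj)} is not JSON serializable")

class FakeKubeResponse:
    """Mocks the RESTResponse object as a workaround for kubernetes python api_client deserialization"""

    def __init__(self, obj):
        self.data = json.dumps(obj, default=default_kubernetes_json_serializer)

api_client = client.ApiClient()
rollout_template = {"metadata": {"labels": {"app": "fiber-http"}}, "spec": {}}
# deserialize() only touches response.data, so the fake response suffices.
pod_template_spec = api_client.deserialize(FakeKubeResponse(rollout_template), "V1PodTemplateSpec")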

Comment on lines 2034 to 2037
# Use alias generator so that lower camel case can be parsed to snake case properties to match k8s python client behaviour
def to_lower_camel(string: str) -> str:
    split = string.split('_')
    return split[0] + ''.join(word.capitalize() for word in split[1:])
Contributor

Prolly should live up in the types module and needs some unit tests. The naming seems unintuitive, as it assumes you have a string delimited by underscores. The comment is perplexing as well: how can you have a lower camel case string? Isn't camel case defined entirely by having a mix of upper and lower case characters delimiting the boundaries of the components?

Collaborator Author

  1. WRT the method name, the input for this method isn't the field name of the kubernetes object; rather, it's the field name of the pydantic model, e.g. deletion_grace_period_seconds -> deletionGracePeriodSeconds.
  2. "Lower camel case" seemed to be the industry-accepted term for having the first word lower case followed by standard camel case, but I will make an update to clarify this with the more formal name "dromedary case".
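To illustrate where the generator plugs in, a sketch using the pydantic v1 API of the era; the class names and the to_dromedary_case rename are assumptions based on the commit messages. Setting allow_population_by_field_name alongside the alias generator is the "population by field name" fix noted in the commit list:

from typing import Optional

import pydantic

def to_dromedary_case(string: str) -> str:
    # deletion_grace_period_seconds -> deletionGracePeriodSeconds
    head, *tail = string.split('_')
    return head + ''.join(word.capitalize() for word in tail)

class RolloutBaseModel(pydantic.BaseModel):
    class Config:
        alias_generator = to_dromedary_case
        allow_population_by_field_name = True

class RolloutV1ObjectMeta(RolloutBaseModel):
    deletion_grace_period_seconds: Optional[int]

# Parses the camel-case key coming back from the cluster...
RolloutV1ObjectMeta.parse_obj({"deletionGracePeriodSeconds": 30})
# ...while code can still construct the model by snake-case field name.
RolloutV1ObjectMeta(deletion_grace_period_seconds=30)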

# NOTE/TODO: fields typed with Any should maintain the same form when dumped as when they are parsed. Should the need
# arise to interact with such fields, they will need to have an explicit type defined so the alias_generator is applied
class RolloutV1LabelSelector(RolloutBaseModel): # must type out k8s models as well to allow parse_obj to work
    match_expressions: Any
Contributor

We can do better than this on the type identifier, no?


Comment on lines +2091 to +2094
    env_from: Any
    image: str
    image_pull_policy: Optional[str]
    lifecycle: Any
Contributor

Why bother to even have these fields if you aren't going to type them? We can just use the Config to ignore unspecified fields if you don't want to build them out, but specifying fields as Any isn't helpful and effectively backslides into turning the data model into a glorified dictionary.

Collaborator Author
The reason will be displayed to describe this comment to others. Learn more.

We have no need to interact with these fields in the context of the connector, but we can't exclude them from the model because then they would not be included when copying the rollout template spec into a standalone pod. By leaving the type as Any, we can effectively say we don't care about the typing of these fields: they came from an already validated Kubernetes object and will only ever be used to populate a new Kubernetes object.

I don't see a problem with the model being a glorified dictionary in this context, since we are only interested in passing the data through from reads to any write operations, and these models are meant to be temporary anyway until the generated models can be implemented.

LMK if you still disagree with the justification above, and I can expand the models to include the fields we currently specify as Any.
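A small sketch of the pass-through behavior being argued for (field names mirror the quoted model; this is not the PR's code): an Any-typed field keeps whatever structure it was parsed with, so dumping by alias hands the same fragment back to the write path untouched:

from typing import Any, Optional

import pydantic

def to_dromedary_case(string: str) -> str:
    head, *tail = string.split('_')
    return head + ''.join(word.capitalize() for word in tail)

class RolloutV1Container(pydantic.BaseModel):
    class Config:
        alias_generator = to_dromedary_case
        allow_population_by_field_name = True

    image: str
    image_pull_policy: Optional[str]
    lifecycle: Any  # untyped: read from a validated object, written back as-is

container = RolloutV1Container.parse_obj({
    "image": "fiber-http:latest",
    "imagePullPolicy": "IfNotPresent",
    "lifecycle": {"preStop": {"exec": {"command": ["sleep", "5"]}}},
})
# The untyped lifecycle dict survives the round trip unchanged.
assert container.dict(by_alias=True)["lifecycle"] == {"preStop": {"exec": {"command": ["sleep", "5"]}}}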

servo/connectors/kubernetes.py (outdated review thread, resolved)
servo/connectors/opsani_dev.py (outdated review thread, resolved)
servo/connectors/opsani_dev.py (outdated review thread, resolved)
Comment on lines +2018 to +2022
    "port, service",
    [
        ('fiber-http', None),
        ('fiber-http', 80),
        ('fiber-http', 'http'),
        (None, 'fiber-http'),
        (80, 'fiber-http'),
        ('http', 'fiber-http'),
Contributor

Why'd you flip this? It seems more intuitive to me as service then port?

Collaborator Author

@linkous8 Aug 9, 2021

This resolves some of the flakiness in integration tests due to kubetest namespace collisions. What happens is that the parameter is appended to the test name, which is then truncated to 43 characters, prepended with kubetest, and appended with a timestamp. The resulting namespace name is kubetest-test-inject-single-port-deployment-fiber-ht-, which is the same for all 3 parameterizations, and that causes conflicts when each of the tests goes to create the namespace. You can see an example of this failure in this test run: https://github.com/opsani/servox/runs/3204520239

By reordering the parameters, each test parameterization produces a unique namespace name, which prevents tests from using the same namespace and/or tearing down a namespace set up by another test, e.g.:

kubetest-test-inject-single-port-deployment-none-fib-
kubetest-test-inject-single-port-deployment-80-fiber-
kubetest-test-inject-single-port-deployment-http-fib-
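An illustrative reconstruction of the collision (this is not kubetest's actual implementation, just the naming scheme described above): the parametrized test name is slugified and truncated to 43 characters before the prefix and timestamp are attached, so whatever distinguishes the cases has to survive the truncation:

import time

def kubetest_namespace(test_name: str) -> str:
    # Slugify the parametrized test name, truncate to 43 characters, then
    # wrap with the kubetest prefix and a timestamp suffix.
    slug = test_name.lower().replace('_', '-').replace('[', '-').rstrip(']')
    return f"kubetest-{slug[:43]}-{int(time.time())}"

# service-then-port ordering: all three truncate to the same prefix,
# kubetest-test-inject-single-port-deployment-fiber-ht- (collision).
for params in ("fiber-http-none", "fiber-http-80", "fiber-http-http"):
    print(kubetest_namespace(f"test_inject_single_port_deployment[{params}]"))

# port-then-service ordering: the differing value lands inside the 43 chars,
# yielding the unique -none-fib-, -80-fiber-, and -http-fib- prefixes above.
for params in ("none-fiber-http", "80-fiber-http", "http-fiber-http"):
    print(kubetest_namespace(f"test_inject_single_port_deployment[{params}]"))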

tests/connectors/kubernetes_test.py (outdated review thread, resolved)
- Drop "tuning" from methods with no direct tie to the concept of tuning
- Rename alias generator to reduce ambiguity by indicating dromedary
 case
- Update hpa_replicas field to be idiomatic
- Move rollout const args to private static class field
- Update rollout api_clients dict to use rollout constants
- Drop rollout match_labels prop to maintain consistency with deployment
- Refactor CanaryOptimization to remove overloading of deployment with
 rollout, add target_controller* getters for DRYness
- Add validator to CanaryOptimization so rollout and deployment are
 mutually exclusive (see the sketch after this list)
- Rename CanaryOptimization create var/arg:
 deployment_config -> deployment_or_rollout_config
 deployment -> deployment_or_rollout
- Rename KubernetesOptimizations create var
 deployment_config -> deployment_or_rollout_config
- Update KubernetesOptimizations create to use nested/cascaded timeout
 config
- Update KubernetesConfiguration to make deployments field optional
 (also updated KubernetesOptimizations create)
- Update cascade_common_settings to ignore None fields
- Update opsani_dev generate_kubernetes_config to remove empty
 deployments collection from rollout config
- Add checks helper _get_controller_service_selector to maintain
 abstraction between deployments and rollouts
- Add rollout checks validation that pod-template-hash is set on status
- Update k8s rollout integration test:
 set config deployments to None, whitespace nit
- Refactor KubernetesChecks to support argo rollouts
- Fix k8s integration tests failing from timeouts not being cascaded
- Fix opsani dev tests CanaryOptimization.create using wrong arg name
- Update config cascade logging to indicate field is set, not unset
- Update config cascade logging to log value of set field
- Specify overwrite True in test config cascade so default timeout is
 overridden in test_adjust_deployment_never_ready
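A minimal sketch of the mutual-exclusivity validator mentioned in this list (class shape and field types are assumptions from the commit messages, using the pydantic v1 API): exactly one of deployment or rollout may be configured:

from typing import Optional

import pydantic

class CanaryOptimizationConfig(pydantic.BaseModel):
    # Hypothetical shape; the real config references richer config objects.
    deployment: Optional[str] = None
    rollout: Optional[str] = None

    @pydantic.root_validator
    def _deployment_and_rollout_mutually_exclusive(cls, values):
        if bool(values.get("deployment")) == bool(values.get("rollout")):
            raise ValueError("exactly one of 'deployment' or 'rollout' must be configured")
        return values

CanaryOptimizationConfig(rollout="fiber-http")  # ok
# CanaryOptimizationConfig(deployment="a", rollout="b") raises a ValidationError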
Contributor

@rstarmer left a comment

LGTM

@linkous8 merged commit efd4eb4 into main Aug 13, 2021
@linkous8 deleted the fred/eng-101-argo-rollout-support branch August 13, 2021 20:21