Add test plan from design doc to KEP for CSI storage migration #1499

Merged
merged 1 commit into from
Jan 28, 2020
55 changes: 55 additions & 0 deletions keps/sig-storage/20190129-csi-migration.md
@@ -37,6 +37,9 @@ see-also:
- [Graduation Criteria](#graduation-criteria)
- [Alpha -> Beta](#alpha---beta)
- [Beta -> GA](#beta---ga)
- [Test Plan](#test-plan)
- [Per-driver migration testing](#per-driver-migration-testing)
- [Upgrade/Downgrade/Skew Testing](#upgradedowngradeskew-testing)
- [Implementation History](#implementation-history)
<!-- /toc -->

@@ -123,6 +126,58 @@ The detailed design was originally implemented as a [design proposal](https://gi

* All volume operation paths covered by Migration Shim in Beta for >= 1 quarter without significant issues

## Test Plan

### Per-driver migration testing

We will require *each* plugin/driver provider to set up public CI that runs all
existing in-tree plugin driver tests against their migrated driver. The CI should
include all tests for the in-tree driver, with a focus on tests labeled
`In-tree Volumes [driver: {inTreePluginName}]`, on a cluster that has CSI
migration enabled through feature flags. The driver authors will be expected to
prove (using the tests) that the driver can handle anything the in-tree plugin
can, including but not limited to: dynamic provisioning, pre-provisioned
volumes, inline volumes, resizing, multivolumes, subpath, and volume
reconstruction. The onus is on the storage provider to use appropriate
infrastructure to run these tests.
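As a rough illustration of how a CI job might select those tests, the sketch below builds a focus pattern from the label format quoted above. The helper `focusPattern` and the plugin name `example-plugin` are hypothetical and are not part of the KEP or the e2e framework.

```go
package main

import (
	"fmt"
	"regexp"
)

// focusPattern returns a regular expression matching test names that carry
// the label "In-tree Volumes [driver: <name>]". The label format comes from
// the test plan text; this helper is a hypothetical illustration.
func focusPattern(inTreePluginName string) *regexp.Regexp {
	label := fmt.Sprintf("In-tree Volumes [driver: %s]", inTreePluginName)
	// QuoteMeta escapes the square brackets so they match literally.
	return regexp.MustCompile(regexp.QuoteMeta(label))
}

func main() {
	re := focusPattern("example-plugin") // hypothetical plugin name
	names := []string{
		"In-tree Volumes [driver: example-plugin] dynamic provisioning",
		"In-tree Volumes [driver: other-plugin] subpath",
	}
	for _, n := range names {
		fmt.Printf("%q selected=%v\n", n, re.MatchString(n))
	}
}
```

Escaping the label matters because an unescaped `[driver: ...]` would be read as a character class and would match far more test names than intended.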

If migration is on for that plugin, the test framework will inspect
kube-controller-manager and kubelet metrics to make sure that the CSI driver is
servicing the operations. This enables the test suite to programmatically
confirm migration status. The framework must also observe through metrics that
none of the in-tree code is being called.

The above is done by checking that no in-tree plugin code is emitting metrics
when migration is on. We will also confirm that metrics are being emitted in
general by confirming the existence of an indicator metric.
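The two checks described above (no in-tree metrics, plus a present indicator metric) could be sketched roughly as follows. This assumes metrics are scraped in the Prometheus text format and that in-tree operations carry a `volume_plugin` label naming the in-tree plugin; the metric names, plugin names, and the function `checkMigrationMetrics` are all hypothetical.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// checkMigrationMetrics scans Prometheus-style text metrics and returns an
// error if any sample is labeled with the in-tree plugin, or if the indicator
// metric is missing. The label name "volume_plugin" is an assumption.
func checkMigrationMetrics(metricsText, inTreePlugin, indicatorMetric string) error {
	inTreeLabel := fmt.Sprintf("volume_plugin=%q", inTreePlugin)
	indicatorSeen := false

	sc := bufio.NewScanner(strings.NewReader(metricsText))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The metric name ends at the first '{' (labels) or space (value).
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if name == indicatorMetric {
			indicatorSeen = true
		}
		if strings.Contains(line, inTreeLabel) {
			return fmt.Errorf("in-tree plugin %q emitted a metric: %s", inTreePlugin, line)
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	if !indicatorSeen {
		// Without the indicator, an absence of in-tree metrics proves nothing.
		return errors.New("indicator metric not found; metrics may not be emitted at all")
	}
	return nil
}

func main() {
	// Hypothetical scrape output from a component with migration on.
	sample := `# HELP example_indicator_total Hypothetical indicator metric.
example_indicator_total 1
example_storage_operation_count{volume_plugin="example.io/csi-driver"} 3
`
	err := checkMigrationMetrics(sample, "example.io/in-tree-plugin", "example_indicator_total")
	fmt.Println("migration check error:", err)
}
```

The indicator check guards against a false pass: if the component emitted no metrics at all, the absence of in-tree metrics would otherwise look like a successful migration.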

Passing these tests in public CI is the main graduation criterion for promoting
the `CSIMigration{provider}` flag to Beta.

### Upgrade/Downgrade/Skew Testing

The Kubernetes community will bring up test clusters that have different
feature flags enabled on different components (the attach/detach controller,
ADC, and the kubelet). Once a cluster with one of these feature flag skew
configurations is up, the test itself has to know which configuration it is
running in and validate the expected result for that configuration.

Configurations to test:

| ADC               | Kubelet                                            | Expected Result                                                                |
|-------------------|----------------------------------------------------|--------------------------------------------------------------------------------|
| ADC Migration On  | Kubelet Migration On                               | Fully migrated - result should be same as “Per-driver migration testing” above |
| ADC Migration On  | Kubelet Migration Off (or Kubelet version too low) | No calls made to driver. All operations serviced by in-tree plugin             |
| ADC Migration Off | Kubelet Migration On                               | Not supported config - Undefined behavior                                      |
| ADC Migration Off | Kubelet Migration Off                              | No calls made to driver. All operations serviced by in-tree plugin             |

Review comment on the "ADC Migration Off / Kubelet Migration On" row:

> is this possible in a downgrade scenario? or do nodes downgrade before control plane?

Reply from the contributor author:

> nodes must be downgraded before the controller
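One way a skew test could "know what configuration it is running in" is to encode the configuration table as a lookup, as in the minimal sketch below. The type `Expectation` and the function `expectedResult` are hypothetical names introduced for illustration only.

```go
package main

import "fmt"

// Expectation is the outcome a skew test should validate for a configuration.
type Expectation int

const (
	FullyMigrated Expectation = iota // all operations serviced by the CSI driver
	InTreeOnly                       // no calls made to the driver
	Unsupported                      // not a supported config, undefined behavior
)

// String returns a human-readable name for the expectation.
func (e Expectation) String() string {
	switch e {
	case FullyMigrated:
		return "fully migrated"
	case InTreeOnly:
		return "in-tree only"
	default:
		return "unsupported"
	}
}

// expectedResult mirrors the four rows of the configuration table.
func expectedResult(adcMigrationOn, kubeletMigrationOn bool) Expectation {
	switch {
	case adcMigrationOn && kubeletMigrationOn:
		return FullyMigrated
	case !adcMigrationOn && kubeletMigrationOn:
		return Unsupported
	default:
		// Kubelet migration off (or kubelet too old), regardless of ADC.
		return InTreeOnly
	}
}

func main() {
	for _, adc := range []bool{true, false} {
		for _, kubelet := range []bool{true, false} {
			fmt.Printf("ADC=%v Kubelet=%v -> %s\n", adc, kubelet, expectedResult(adc, kubelet))
		}
	}
}
```

A test running against a skewed cluster would read the two flag states from its environment, call the lookup, and skip or fail fast on the unsupported combination instead of asserting on undefined behavior.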

Additionally, the community will craft a test where a cluster should be able to
run through all plugin tests, do a complete upgrade to a version with CSI
Migration turned on, then run through all the plugin tests again and verify that
there is no issue.
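The upgrade scenario above amounts to three ordered steps, which could be sketched as below. The function `runUpgradeScenario` and its callback parameters are hypothetical placeholders for whatever the upgrade framework actually provides.

```go
package main

import "fmt"

// runUpgradeScenario runs the plugin tests, performs the upgrade to a version
// with CSI migration turned on, and runs the plugin tests again. It stops at
// the first failing step. Both callbacks are hypothetical hooks.
func runUpgradeScenario(runPluginTests func(phase string) error, upgrade func() error) error {
	if err := runPluginTests("pre-upgrade"); err != nil {
		return fmt.Errorf("plugin tests failed before upgrade: %w", err)
	}
	if err := upgrade(); err != nil {
		return fmt.Errorf("upgrade failed: %w", err)
	}
	if err := runPluginTests("post-upgrade"); err != nil {
		return fmt.Errorf("plugin tests failed after upgrade: %w", err)
	}
	return nil
}

func main() {
	// Stub callbacks that only log; a real job would invoke the test suite
	// and the cluster upgrade tooling here.
	tests := func(phase string) error {
		fmt.Println("running plugin tests:", phase)
		return nil
	}
	upgrade := func() error {
		fmt.Println("upgrading cluster with CSI migration on")
		return nil
	}
	fmt.Println("scenario error:", runUpgradeScenario(tests, upgrade))
}
```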

Running this set of tests is optional on a per-provider basis. We recommend it
for extra confidence, but the framework for upgrade/downgrade testing is
provider agnostic.

## Implementation History

Major milestones in the life cycle of a KEP should be tracked in `Implementation History`.