This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
A GitHub Codespaces development flow is described below; you can use it to test Karpenter functionality on your own cluster and to aid rapid development of this project.
- Install VSCode: Go here to download VSCode for your platform. After installation, install the "GitHub Codespaces" extension in your VSCode app. See here for more information about this extension.
- Create Codespace (~2min): In the browser, click Code / "Create a codespace on main" (for a better experience, customize it to use 4 cores / 8 GB) and wait for the codespace to be created. It is created with everything needed for development (Go, Azure CLI, kubectl, skaffold, useful plugins, etc.). Now you can open the codespace in VSCode: click "Codespaces" in the lower-left corner of the browser status bar region and choose "Open in VSCode Desktop". (Pretty much everything except for `az login` and some `az role assignment` commands works in the browser, but VSCode is a better experience anyway.)
- Provision cluster, build and deploy Karpenter (~5min): Set `AZURE_SUBSCRIPTION_ID` to your subscription (and customize the region in `Makefile-az.mk` if desired). Then, at the VSCode command line, run `make az-all`. This logs into Azure (follow the prompts), provisions AKS and ACR (using resource group `$CODESPACE_NAME`, so everything is unique / scoped to the codespace), builds and deploys Karpenter, and deploys a sample `default` Provisioner and an `inflate` Deployment workload.
- Manually scale the `inflate` Deployment workload, and watch the Karpenter controller log and the Nodes in the cluster. Example of manually scaling up to 3 pods: `kubectl scale deployments/inflate --replicas=3`
- To debug Karpenter in-cluster, use `make az-debug`, wait for it to deploy, and attach from VSCode using Start Debugging (F5). After that you should be able to set breakpoints, examine variables, single-step, etc. (Behind the scenes, besides building and deploying Karpenter, `skaffold debug` automatically and transparently applies the necessary flags during build, instruments the deployment with Delve, adjusts health probe timeouts to allow for delays introduced by breakpoints, sets up port-forwarding, etc.; more on how this works is here.)
- Once done, you can delete all the infrastructure with `make az-rmrg` (it deletes the resource group), and delete the codespace (though it will be automatically suspended when not used, and deleted after 30 days).
- If you see a platform architecture error during `skaffold debug`, adjust (or comment out) the `--platform` argument.
- If you are not able to set/hit breakpoints, it could be an issue with source path mapping; see the comments in the debug launch configuration (`launch.json`).
Q: I was able to trigger Karpenter to scale up nodes as expected, using my own customized deployment of pods. However, scaling down was not handled automatically when I removed the deployment; the two new nodes created by Karpenter were left around. What is going on?
A: Additional system workloads (such as the metrics server) can get scheduled on the new nodes, preventing Karpenter from removing them. Note that you can always use `kubectl delete node <node>`, which will have Karpenter drain the node and terminate the instance at the cloud provider.
Q: When running some of the tests locally, the environment failed to start. How can I resolve this?
A: Oftentimes, especially for pre-existing tests, running `make toolchain` will fix this. This target ensures that you have the correct versions of binaries installed.
We have three types of testing:
- Unit Tests
- Acceptance Tests
- End-to-end Tests
Unit Tests
- When to use: for fine-grained testing of functions, classes, etc.
- File location(s): under `pkg/*`, next to the related components.
- Testing framework: Go standard tests.
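For instance, a table-driven test for a small helper might look like the sketch below. The `maxPods` helper and its package are made up for illustration; they are not part of the codebase:

```go
package utils // hypothetical package; real tests sit next to their components under pkg/

import "testing"

// maxPods is a stand-in for the kind of small helper these unit tests cover.
func maxPods(vmCores int) int {
	if vmCores <= 0 {
		return 0
	}
	return vmCores * 30
}

// TestMaxPods uses the idiomatic table-driven style with subtests.
func TestMaxPods(t *testing.T) {
	cases := []struct {
		name  string
		cores int
		want  int
	}{
		{name: "zero cores", cores: 0, want: 0},
		{name: "four cores", cores: 4, want: 120},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := maxPods(tc.cores); got != tc.want {
				t.Errorf("maxPods(%d) = %d, want %d", tc.cores, got, tc.want)
			}
		})
	}
}
```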
Acceptance Tests
- When to use: acceptance tests are coarse-grained tests that integrate with the upstream karpenter library and only fake the API calls to Azure clients. These are behavior-driven and should start from pending pod pressure whenever possible.
- File location(s): under `pkg/`, next to the related components.
- Testing framework: Ginkgo.
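A skeleton of what such a suite looks like with Ginkgo is sketched below. The suite name, spec, and assertion are illustrative only; the real tests additionally wire up faked Azure clients:

```go
package acceptance // hypothetical; real suites live under pkg/ next to their components

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// TestAcceptance hooks the Ginkgo suite into `go test`.
func TestAcceptance(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Acceptance")
}

var _ = Describe("Provisioning", func() {
	It("should provision capacity for pending pod pressure", func() {
		// In the real tests: create pending pods, run the controllers against
		// faked Azure clients, then assert that nodes were launched.
		pendingPods := 3
		Expect(pendingPods).To(BeNumerically(">", 0))
	})
})
```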
End-to-end Tests
- When to use: E2E tests aim to be as close to production as possible. An actual cluster is spun up, scale-ups and scale-downs occur, and actual Azure clients are invoked rather than fakes/mocks.
- File location(s): under `test/`, with specific suites under `test/pkg/suites`.
- Testing framework: Ginkgo.
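An E2E suite differs mainly in setup: rather than faking Azure clients, it talks to a live cluster. A minimal illustrative skeleton, with made-up names and placeholder assertions:

```go
package suites // hypothetical; the real suites live under test/pkg/suites

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestE2E(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "E2E")
}

var _ = BeforeSuite(func() {
	// The real suites construct a shared environment for the live cluster here
	// (see test/pkg/environment/common/environment.go) instead of using fakes.
})

var _ = Describe("Scaling", func() {
	It("should add nodes under load and remove them afterwards", func() {
		// The real tests scale workloads and poll live cluster state,
		// e.g. with Gomega's Eventually(...), rather than asserting a constant.
		Expect(true).To(BeTrue())
	})
})
```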
- `config.go` is in the auth package and provides the configuration needed to authenticate with Azure clients.
- `settings.go` is in the apis package and provides the settings Karpenter needs to access a particular cluster.
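For intuition, settings of this kind are commonly carried on a `context.Context` and retrieved where needed. The sketch below shows only that generic pattern with made-up types; the project's actual types and helpers may differ:

```go
package main

import (
	"context"
	"fmt"
)

// Settings is a stand-in for the kind of cluster settings settings.go provides.
type Settings struct {
	ClusterName     string
	ClusterEndpoint string
}

type settingsKey struct{}

// ToContext attaches settings to the context, typically once at startup.
func ToContext(ctx context.Context, s *Settings) context.Context {
	return context.WithValue(ctx, settingsKey{}, s)
}

// FromContext retrieves the settings wherever they are needed.
func FromContext(ctx context.Context) *Settings {
	return ctx.Value(settingsKey{}).(*Settings)
}

func main() {
	ctx := ToContext(context.Background(),
		&Settings{ClusterName: "demo", ClusterEndpoint: "https://example"})
	fmt.Println(FromContext(ctx).ClusterName)
}
```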
- `klog` is only invoked when creating clients or authorizers.
- `zapr` is only invoked in our debug package.
- `knative.dev/pkg/logging` should be used everywhere else.
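For example, code following the third convention pulls the logger from the context. A minimal sketch (the `scaleUp` function and its message are invented for illustration):

```go
package main

import (
	"context"

	"knative.dev/pkg/logging"
)

// scaleUp is a made-up function illustrating the logging convention:
// take the sugared logger from the context rather than calling klog or zapr directly.
func scaleUp(ctx context.Context, nodes int) {
	logger := logging.FromContext(ctx)
	logger.Infof("scaling up, launching %d nodes", nodes)
}

func main() {
	// In the real controllers the context already carries a configured logger;
	// here FromContext falls back to a default logger since none is attached.
	scaleUp(context.Background(), 3)
}
```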
`skaffold.yaml` is the configuration file for deploying Karpenter locally via skaffold.
Why does it show up as locally modified even though I didn't change it?
- To deploy/test locally, we run `make az-all`. This make target is composed of many different steps needed to deploy. One step patches your local `skaffold.yaml` file, updating certain variables based on env vars defined in `Makefile-az.mk`.
- `pkg/test/environment.go` is used for our acceptance tests.
- `test/pkg/environment/common/environment.go` is used for our end-to-end tests.