Kubespray 3.0 discussion #6400
Comments
I'm all in for that. A PR was raised a long time ago to set containerd as the default runtime (but was dropped as too much work and too many breaking changes). That would allow us to get rid of a lot of default docker commands and at the same time move toward something more CRI oriented.
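For readers following along, switching the default would amount to flipping the runtime selection in the cluster group_vars; a minimal sketch (variable name taken from the kubespray sample inventory, value shown only for illustration):

```yaml
# inventory group_vars (sketch)
container_manager: containerd   # instead of the historical default "docker"
```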
RELEASE.md says:
AFAIK we already did non-backwards-compatible changes in the v2.x line of Kubespray (when moving to kubeadm, for instance). The "production ready" part is a lot about providing a path for people to move from v2.X to v2.(X+1). What I'm saying is that we can make breaking changes (like changing the default container engine) as long as they are accepted by the community and well documented.
@EppO I thought non-kubeadm was removed in #3811; are there other things that need cleaning up? kubeadm is the only supported deployment method since v2.9.
For the GitLab CI
For conformance tests, there is
@MarkusTeufelberger has some very valuable input on role design and molecule, and raised a couple of issues around it. Examples: #4622 #3961
Good to know. I was more worried about end users who may not know this and end up breaking production clusters while trying to upgrade, hence a 3.0 proposal that is more explicit about that kind of breaking change.
I missed it because I hadn't changed my inventory for a while and some deprecated options were still there. I think it would be beneficial for end users as well to list deprecated inventory options for each release. I guess I'm not the only one with some old settings :)
I hear you. We can't use pipelines for merge requests because we don't create the merge request in GitLab, so that's a dead end. But I'm convinced we should architect the CI around better change detection to get a quicker feedback loop; if Prow is an option, we should look at it.
I guess we have some work to do in that area then :)
Ideally we should run conformance tests regularly to test various setup combinations, and not wait until release time to run the full conformance suite. That's why I was suggesting separating them from the install/upgrade use cases.
What about etcd? Should we change that default to true? It makes etcd upgrades impossible outside of Kubernetes upgrades; kubeadm still doesn't support upgrading etcd without the Kubernetes components, AFAIK.
I also thought about that; the scale and remove use cases need some love from CI.
Flip the default of the kubeadm_control_plane var to true and remove "experimental" from the code?
etcd_kubeadm_enabled: true
That's actually what I was referring to with "Drop non-kubeadm deployment", but I mixed up two different use cases: since 2.9 kubespray always uses kubeadm to provision the cluster, but by default it doesn't use kubeadm join on the non-first control plane nodes (just another run of kubeadm init).
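The variable being discussed, as it would look if the default were flipped in an inventory (a sketch for illustration, not a change already made):

```yaml
# inventory group_vars (sketch)
kubeadm_control_plane: true   # secondary control plane nodes run "kubeadm join --control-plane"
                              # instead of each running their own "kubeadm init"
```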
Personally I'd like to drop a few features that are relatively exotic or easy to work around/implement yourself, such as downloading binaries and rsync'ing them around instead of just fetching them on each node. This could really simplify the download role.
Another bigger architectural change could be to turn kubespray into a collection (maybe even adding some roles to https://github.com/ansible-collections/community.kubernetes eventually and/or using them here?) and in general switching to Ansible 2.10.
I'd prefer to rely on the distro package manager when applicable instead of downloading all the stuff, but if you have a better design for the download role, feel free to submit a PR.
Ansible 2.10 is not released yet, and we need to be careful about which Ansible version is available on each supported distro.
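If kubespray were repackaged as a collection, consuming it could look roughly like the sketch below (this assumes the repository would ship a galaxy.yml; the git source syntax is standard Ansible 2.10+ requirements format, but the collection itself is hypothetical at this point):

```yaml
# requirements.yml (sketch) - pull kubespray as an Ansible collection straight from git
collections:
  - name: https://github.com/kubernetes-sigs/kubespray.git
    type: git
    version: master
```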
Reducing scope and configurability of Kubespray would be nice.
The more I think about it, the more I'm convinced kubespray should only provision kubernetes clusters on top of kubeadm, so we should only support the following 2 use cases on the etcd front:
That means removing the
Along the same lines, we could formulate some kind of design statement on how Kubespray embraces, uses, and extends kubeadm, rather than working around it.
Helm 3.x was released since Kubespray 2.x. It no longer requires a tiller pod and it integrates with Kubernetes RBAC. I think it would be better for Kubespray to refocus on its core competency: deploying production Kubernetes. That can include the most widely used plugins (CNI/CSI). But apps that have a decent Helm chart should now be deployed using that chart.
Helm vs Ansible for deploying apps to Kubernetes is a no-brainer. Thanks to its state, Helm is truly declarative; Ansible is not. For example, uninstall a Helm release and your app is removed from k8s; undefine an addon in Kubespray (e.g. cert_manager_enabled=false), and it remains. Most Helm charts are better maintained than the addons in this project.
I get the desire for Kubespray to be a one-stop shop, so we could either replace the addons with simple README guidance explaining how to install the former addons using Helm, or, if workable, install the Helm client and version-pinned Helm charts using Kubespray. This would significantly simplify the project and the maintenance burden.
I think we are very close to being able to use kubeadm-managed etcd as the default.
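For reference, the flag in question as it would appear in group_vars if it became the default (sketch only):

```yaml
# inventory group_vars (sketch)
etcd_kubeadm_enabled: true   # let kubeadm run etcd as static pods on the control plane nodes
```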
Maybe we could deal with Helm apps in a separate GitHub project? This project would only focus on:
Some attached CI would not require a kubespray deployment: only an inventory plus any Kubernetes cluster should be enough. This would save people from rewriting their own Helm addons playbooks and roles.
EDIT: I first mentioned the dashboard as a Helm chart; bad example, it is plain YAML, so I removed it. Btw we may think about moving the dashboard out of kubespray's scope in favor of Helm :)
It's been released for a few months, and it'd be wonderful if we could have kubespray as an Ansible collection.
I'm wondering if everyone has the same view on what a "production Kubernetes" is. The way I see it, people expect to get the following (ordered from "minimum" to "maximum" expectations):
I am split on where to put the demarcation of "production Kubernetes". On one hand, it would be nice to have kubespray be the one-stop shop for all of the above. Maintainability can be sustained either by regularly rendering Helm Charts inside an Ansible
On the other hand, I do agree that kubespray needs to stay focused and reduce CI/CD pipeline response time and maintenance burden. Is there a shared view within the kubespray community on what a "production Kubernetes" is?
Hopefully a "helper" question: what is a Kubernetes addon that should be managed by the Kubernetes control plane, vs. what is an app on top of Kubernetes?
I would also add monitoring (Prometheus) and tracing (Jaeger) to number 6 in your list, by the way, as well as some log viewing/analyzing stack (Loki or Kibana). Probably also some CD mechanism like Flux (https://toolkit.fluxcd.io/) so we don't mess with deploying Kubernetes state via Ansible.
Another feature of "production" is likely updating/upgrading/adding/removing each of these components in a way that keeps the actual workload of the cluster as unaffected as possible. A lot of these things are a mix of programs that run in the cluster itself and stuff that wants to be installed on the host (often without proper packages or repositories upstream, only statically compiled golang binaries). This might also need some design/solution on the Kubespray side.
You could make the case to increase scope infinitely. Perhaps in the v3 timeframe, the goal should be to support the status quo using Helm 3: convert the existing addons to Helm releases. In terms of code, yes, install Helm on the controller machine like @cristiklein suggests, and then for each addon either store a values.yaml or create one at runtime from a template, and run "helm upgrade --install" in an Ansible task to deploy it.
We can't currently remove an addon @MarkusTeufelberger, so again I'd suggest this should be out of scope for v3 (although we can document how to uninstall Helm charts). That in itself would be a massive reduction of complexity. The extra apps would become declarative: our content for each one would be little more than a values file. The Helm chart maintainers would do the heavy lifting.
I think the scope discussion can be had at a later stage. In any case, it's irrelevant for companies like us who will continue to use Helm, not Kubespray, for anything that has a Helm chart. For us, Kubespray is one stage in a pipeline, so we'd prefer if all the "extras", including installing the helm binary on the controller machine, were kept optional.
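To make the "values file plus helm upgrade --install in an Ansible task" idea concrete, here is a minimal sketch. It assumes the community.kubernetes collection (later kubernetes.core) is available on the control host; the chart, release name and template path are placeholders, not the actual kubespray implementation:

```yaml
# Hypothetical addon deployment task: upstream chart + kubespray-rendered values file.
- name: Deploy cert-manager from its upstream Helm chart
  community.kubernetes.helm:                # kubernetes.core.helm in newer collection versions
    name: cert-manager                      # release name
    chart_ref: jetstack/cert-manager        # assumes the jetstack repo was added beforehand
    chart_version: "{{ cert_manager_chart_version | default('v1.0.4') }}"
    release_namespace: cert-manager
    create_namespace: true
    values: "{{ lookup('template', 'cert-manager-values.yml.j2') | from_yaml }}"
  when: cert_manager_enabled | default(false)
```

Removing the addon would then be a matter of setting release_state: absent on the same task, which is what makes the Helm route declarative compared to the current addons.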
Anyone who has a bit of time: kubespray Helm integration can start today ;)
@EppO my preferred way to consume Kubespray is to use its Docker image. It makes it easy to reproduce and manage dependencies. I simply mount the inventory into the container.
@cristiklein the thing is, different orgs or teams will have different requirements for each component (CNI, CRI, ingress, logs, ...). To take ingress as an example, you could use nginx or traefik or ambassador, or all of them at once. I think the approach taken so far has been to give a starting point for each component and allow users to take over when their needs go beyond the defaults. Finally, "production" means different things for different orgs. My opinion is that Kubespray should focus on the core Kubernetes components, things that are used by most people, and drop the settings that drive complexity but are used only by a small portion of the community. If those less popular settings are critical to some people, then they should probably get involved (either themselves or by sponsoring somebody to represent their interests).
@jseguillon yeah, there is definitely an opportunity for an "install all the addons in k8s" type project separate from Kubespray. Regardless of how you install Kubernetes (Kubespray, kops, AKS, EKS, GKE), you will want to install stuff afterwards (log management, monitoring, security hardening, operators, ...). Like @MarkusTeufelberger mentioned, Flux is a popular option, but I've also seen simple shell scripts work well (think …)
@MarkusTeufelberger I agree that Prometheus and Flux are nice addons. However, I feel that log and metrics viewing/analyzing (e.g., Loki, Kibana, Elasticsearch, OpenDistro, Thanos and Grafana) should be treated as applications, since they are often stateful, require careful/tedious maintenance, and need to be scaled carefully with the incoming log/metrics workload.
@Miouge1 @holmesb @champtar To steer discussions, I created an initial draft of a Helm-based addons deployment for kubespray. PTAL: master...cristiklein:helm-addons
The umbrella Chart could become a separate sub-project that is consumed by kubespray (via git submodule). Either way, the user is free to use only that part of kubespray. I think it achieves "batteries included but removable, and feel free to choose between NiMH, Li-ion or AC adapter". Let me know what you think.
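For illustration, an umbrella chart along those lines could look roughly like this (the chart names, versions and repositories below are placeholders picked for the example, not the contents of the linked branch):

```yaml
# Chart.yaml of a hypothetical "kubespray-addons" umbrella chart
apiVersion: v2
name: kubespray-addons
description: Optional addons deployed on top of a Kubespray-provisioned cluster
version: 0.1.0
dependencies:
  - name: ingress-nginx
    version: "3.x.x"
    repository: https://kubernetes.github.io/ingress-nginx
    condition: ingress-nginx.enabled      # each addon toggled from a single values file
  - name: cert-manager
    version: "v1.x.x"
    repository: https://charts.jetstack.io
    condition: cert-manager.enabled
```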
Production ready to me means (among other things) that things have been tested in CI. Therefore I find it very hard to understand, for example, why multiple k8s versions are supported by a specific version of kubespray. This in turn adds complexity to the Ansible roles. As said before, if you need an older version of k8s, use the corresponding release branch!
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What would you like to be added:
Kubeadm control plane mode
`kubeadm join` is the recommended way for non-first control plane nodes and worker nodes. We should set `kubeadm_control_plane` to true by default. Not sure if it makes sense to keep the legacy "kubeadm init everywhere" use case around. Are there any edge cases with the control plane mode?
Use etcdadm to manage external etcd cluster
There is a valid use case for an "external" etcd cluster not managed by kubeadm, especially when etcd is not deployed on the control plane nodes. Currently, etcd setup is fairly manual, fragile (for example during upgrades), and hard to debug. https://github.com/kubernetes-sigs/etcdadm is supposed to make etcd management easier. In the long run, kubeadm will eventually use etcdadm under the hood. It would be a good idea to implement it for the "external" etcd use case as well. Moreover, adding support for a BYO etcd cluster (#6398) should be fairly easy if we go down that path.
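As a very rough sketch of what driving etcdadm from the playbooks could look like (etcdadm init/join are the commands documented in the etcdadm README; the task layout, binary path and the ip hostvar are assumptions made for this example):

```yaml
# Hypothetical tasks for bootstrapping an external etcd cluster with etcdadm.
- name: Initialize the first etcd member
  command: /usr/local/bin/etcdadm init
  when: inventory_hostname == groups['etcd'][0]

- name: Join the remaining members to the first one
  command: "/usr/local/bin/etcdadm join https://{{ hostvars[groups['etcd'][0]]['ip'] }}:2379"
  when: inventory_hostname != groups['etcd'][0]
```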
Review CI matrix
Switch cgroup driver default to systemd
kubespray officially supports only systemd-based linux distros. We should not have two cgroup managers (see kubernetes/kubeadm#1394 (comment) for technical details).
This is a backward incompatible change, so maybe default to it for new installs but keep the current setting for upgrades?
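For reference, the systemd cgroup driver at the kubeadm/kubelet level is just this upstream KubeletConfiguration field (shown here as a sketch of the rendered config, not a kubespray-specific file):

```yaml
# KubeletConfiguration section of the kubeadm config, selecting the systemd cgroup driver
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```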
Remove docker requirements
There are still some hardcoded docker commands in the code (network plugins, etcd, node role, ...). One of kubespray's goals is to "Deploy a Production Ready Kubernetes Cluster", so for security purposes it should NOT ship a container engine capable of building new container images by default. Containerd would be a more secure default setting. In order to make that transition, we need to use `crictl` where `docker`
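As a hedged illustration of that kind of change in the playbooks (crictl pull and the bin_dir variable are real, but the task itself is made up for this example and is not a snippet from the repo):

```yaml
# Hypothetical replacement of a hardcoded "docker pull" with a CRI-agnostic call
- name: Pre-pull the pause image through the CRI instead of the docker CLI
  command: "{{ bin_dir }}/crictl pull k8s.gcr.io/pause:3.3"
  changed_when: false
```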
is used today.
Why is this needed:
We need to address technical debt. The code base is large, and some areas are old and unmaintained. I'd like to take the opportunity of the next major release to slim down the code base as much as possible and make the CI more agile so we get quicker feedback.
/cc @floryut, @Miouge1, @mattymo, @LuckySB