How can I ensure that containerd cannot create a container if the NRI plugin is unavailable? #85

Open
t33m opened this issue May 22, 2024 · 5 comments

Comments

@t33m

t33m commented May 22, 2024

Is there a way to configure containerd so that starting containers is impossible if the NRI plugin is not working or hasn't registered yet?

@kad

kad commented May 22, 2024

This is not possible right now, and I'm not sure it would be a good idea. Some NRI plugins are going to be deployed as containers, and infrastructure components also run in containers (e.g. the kubelet's static pod manifests), so refusing to start containers until an NRI plugin registers would leave the node non-functional. Another issue is plugin crashes and reconnects: we shouldn't have scenarios where a crashing plugin renders the node unusable, even temporarily.

For all of these cases, the sync-on-connect calls were implemented: when a plugin starts, it can inspect and adjust running containers that were started before it registered. And if something can't be modified via Update calls, the NRI plugin can always trigger a stop of the existing container, leaving its eventual re-creation to the orchestration layer above.
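
A minimal sketch of the sync-on-connect idea, in Go with only the standard library. The `Container` and `Update` types and the `syncOnConnect` function are hypothetical stand-ins, not the NRI API types. On (re)connect the plugin sees containers that started without it: settings it can adjust in place become updates, and anything it cannot change (here, assumed to be privileged mode) becomes a stop request so that the orchestrator above can re-create the container.

```go
package main

import "fmt"

// Container is a hypothetical, simplified view of a container that the
// runtime reports to a plugin during sync-on-connect (NOT NRI's api.Container).
type Container struct {
	ID         string
	CPUShares  uint64
	Privileged bool
}

// Update is a hypothetical in-place adjustment the plugin asks for.
type Update struct {
	ContainerID string
	CPUShares   uint64
}

// syncOnConnect walks the containers that were started before the plugin
// registered. Adjustable settings turn into updates; settings that cannot be
// changed in place turn into stop requests, leaving re-creation to the
// orchestration layer above.
func syncOnConnect(running []Container, wantShares uint64) (updates []Update, stops []string) {
	for _, c := range running {
		switch {
		case c.Privileged:
			// Assumption for illustration: privileged mode cannot be changed
			// through an update, so the only option is to stop the container.
			stops = append(stops, c.ID)
		case c.CPUShares != wantShares:
			updates = append(updates, Update{ContainerID: c.ID, CPUShares: wantShares})
		}
	}
	return updates, stops
}

func main() {
	running := []Container{
		{ID: "a", CPUShares: 1024},
		{ID: "b", CPUShares: 512},
		{ID: "c", CPUShares: 512, Privileged: true},
	}
	updates, stops := syncOnConnect(running, 1024)
	fmt.Println(updates, stops) // prints [{b 1024}] [c]
}
```

Container "a" already matches and is left alone, "b" is fixed in place, and "c" is stopped.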

@mikebrow
Member

Some related discussion here (WIP): #43 (comment)

@mikebrow
Member

@kad good insight

We should probably develop some use cases, with an integration bucket and an e2e bucket of tests that exercise the supported use cases.

@klihub
Member

klihub commented May 24, 2024

I agree with @kad that it's probably not a good idea to bake such logic into the runtime itself.

If I had to do something to this effect, my first idea would probably be to roll some extra tooling for it. Run the critical plugin(s) as DaemonSets and monitor whether they are ready/live (or have them periodically refresh a CRD or a label on their node and monitor that), then have the extra tooling taint the nodes and, if necessary, drain/cordon/uncordon them. If useful, maybe also label the workloads that need the critical plugins as such, and have a mutating webhook inject tolerations into unlabeled workloads so that they can run on nodes where the critical plugins are not up and running.

@zhaodiaoer
Contributor

In the scenarios where I work, I have also encountered similar problems... I'm not sure whether it's a good idea, but we could provide the ability to categorize plugins by role and treat them differently during plugin registration/deregistration. For example, a plugin could pass a flag indicating its criticality level to NRI when it initiates registration, and NRI could apply a different deregistration policy per level, such as completing the deregistration of a critical plugin only when the plugin explicitly sends an RPC request for it. (This is just an example; I haven't considered it carefully.)

5 participants