[Feature Request] Support Podman on HashiCorp Nomad #3387

Closed
yishan-lin opened this issue Jun 20, 2019 · 19 comments
Labels
kind/feature, locked - please file new issue/PR, stale-issue

Comments

@yishan-lin

/kind feature

Description
HashiCorp Nomad is an orchestrator that supports a variety of container runtimes via task driver plugins. Nomad currently supports Docker, rkt, QEMU, and Java task drivers.

Nomad 0.9 introduced a plugin framework that enables users to write task drivers to support any container runtime (e.g., Singularity, LXC). There has been significant interest in having a Podman task driver plugin for Nomad, especially given the prevalence of RHEL users.

Podman Feature Request filed on Nomad:

Overview + Task Driver Plugin Framework:

Examples of community-built Nomad task driver plugins:

openshift-ci-robot added the kind/feature label Jun 20, 2019
@baude
Member

baude commented Jun 20, 2019

@yishan-lin what is the expectation here of Podman upstream?

@yishan-lin
Author

It would be hard to say without knowing the velocity of Podman upstream. Are breaking changes introduced often? How often are new features released in base Podman that would need to be brought into its plugin?

Conversely, in terms of the effort to maintain this plugin and keep it up to date with Nomad's upstream driver APIs, we see it as pretty minimal: we don't have any features planned for the immediate future that would result in changes to Nomad's upstream driver API.

@baude
Member

baude commented Jun 24, 2019

We try not to introduce breaking changes, but then again, I'm not sure which part of Podman you are referring to. I don't know enough about the plugins to say otherwise.

@towe75
Contributor

towe75 commented Jun 27, 2019

Hi. I am playing with the Nomad plugin API right now,
though I am a bit unsure about the best approach with regard to the architecture.

My choices so far are:

1. nomad-plugin-podman links directly against libpod go api.
Advantages: everything is nicely encapsulated, no magic, full control, and all features are available even if they are not exposed over varlink.
Disadvantages: getting the Go dependencies right is relatively hard because the Nomad and libpod ecosystems share some common libraries, e.g., Nomad is pinned to an old version of ugorji/go; see hashicorp/nomad#5676.
We would also depend directly on internal libpod API changes.

2. nomad-plugin-podman uses varlink and starts podman as sub-process.
Advantages: building should be straightforward, and the podman varlink API is sufficient. No systemd integration needed.
Disadvantages: process management; podman can crash and needs to be restarted, etc.

3. nomad-plugin-podman uses varlink on socket activated podman.
Advantages: process management is simple, setup is straightforward, and it can be better from a security perspective as well (no need to run the Nomad agent as root).
Disadvantage: more impact on the system setup.

So what are your opinions? What should the integration look like?
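
For illustration, the second option (the plugin owns a podman varlink sub-process and restarts it after a crash) could be sketched roughly as follows. This is a language-agnostic sketch written in Python, not code from any actual plugin; a real Nomad driver would be written in Go. The `supervise` helper, its parameters, and the socket path are hypothetical, and the `podman varlink` invocation assumes the Podman 1.x CLI.

```python
import subprocess
import sys
import time
from typing import Callable, List

# Assumption: Podman 1.x CLI syntax and an illustrative socket path.
PODMAN_VARLINK_CMD = ["podman", "varlink", "--timeout", "0",
                      "unix:/run/podman/io.podman"]


def supervise(cmd: List[str],
              max_restarts: int = 3,
              backoff_s: float = 0.5,
              sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Run cmd and restart it whenever it exits with a non-zero status.

    Gives up after max_restarts restarts. Returns the exit codes observed,
    in order, so the caller can log or inspect them.
    """
    exit_codes: List[int] = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()  # block until the child exits
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown: do not restart
        if attempt < max_restarts:
            # Exponential backoff between restarts: 0.5 s, 1 s, 2 s, ...
            sleep(backoff_s * (2 ** attempt))
    return exit_codes
```

A plugin following this approach would call something like `supervise(PODMAN_VARLINK_CMD)` from a background thread and connect to the socket once it appears. The restart bookkeeping is exactly the "process management" cost listed under the disadvantages.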

@rhatdan
Member

rhatdan commented Jun 27, 2019

Podman varlink bridge mode supports running podman varlink even if it is not configured, i.e., no socket activation is needed. Basically, podman varlink is launched via the CLI and then runs for the length of the connection. This can be run in root or rootless mode.

@towe75
Contributor

towe75 commented Jun 27, 2019

@rhatdan, yes, I understood this already. That's what I meant by "nomad-plugin-podman uses varlink and starts podman as sub-process."

This approach would lead us to this process hierarchy:

 nomad
   └── nomad-plugin-podman
               └── podman

So the plugin would control the lifecycle of a single podman (with varlink bridge mode) "slave".
Nomad's plugin API, in turn, also starts the plugin as a sub-process.

To ask again: would this be your preferred architecture?

@rhatdan
Member

rhatdan commented Jun 27, 2019

I believe this is what we are doing with the next generation of cockpit-podman.

@haraldh @baude @jwhonce WDYT?

@mheon
Member

mheon commented Jun 27, 2019 via email

@towe75
Contributor

towe75 commented Jul 7, 2019

I published a systemd/varlink-based proof of concept at https://github.com/pascomnet/nomad-driver-podman. There is of course no release yet, but you can download the binary from the linked CircleCI build or just compile it yourself. The feature set is very limited and it lacks tests so far, but it's a start.

@jwhonce
Member

jwhonce commented Jul 9, 2019

I think using varlink would be the best.

@towe75
Contributor

towe75 commented Jul 9, 2019

Thank you for your opinions. Varlink seems to be a good fit so far. But I am sorry to say: it almost feels like having a daemon :-)

Sometimes I run into strange deadlocks when accessing a container immediately after creating and starting it (all done in the same varlink session).
I will try to get a reproducible test so I can file a bug. I am pretty sure it happens when GetContainer is used, less often while inspecting a container, and almost never with a simple PS. The deadlock can only be resolved by killing/restarting the systemd-managed podman. Any other podman being used interactively also locks up in this situation.
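
While hunting for a reproducer, one way to turn such a hang into a visible failure is to run each API call under a deadline. Below is a minimal sketch in Python, assuming some blocking client call is passed in as a plain callable; `call_with_deadline` is a hypothetical helper and not part of any Podman or varlink library.

```python
import concurrent.futures
import time
from typing import Any, Callable


def call_with_deadline(fn: Callable[[], Any], timeout_s: float) -> Any:
    """Run fn in a worker thread and wait at most timeout_s for its result.

    Raises concurrent.futures.TimeoutError if fn has not returned in time.
    The worker thread is NOT killed; a truly deadlocked call stays stuck in
    the background, but the caller regains control and can log the event.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Do not wait for a possibly hung worker when tearing down the pool.
        pool.shutdown(wait=False)
```

Wrapping the create, start, and get-container steps one by one this way would show which call stops returning, which is the information a bug report needs.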

@mheon
Member

mheon commented Jul 9, 2019

Would be very interested to look at that if you can get us a reproducer - deadlocks are high priority to fix

@rhatdan
Member

rhatdan commented Aug 5, 2019

@yishan-lin @mheon @towe75 What is the latest on this issue?

@mheon
Member

mheon commented Aug 5, 2019

We've tracked the mentioned deadlock into c/storage. I believe @baude is still debugging.

@towe75
Contributor

towe75 commented Aug 6, 2019

Well, coming back to the actual topic of this issue: as stated, I built a varlink-based prototype as a POC.

Recently I spent a few hours doing the same thing without varlink, linking libpod directly (using Go 1.12, go.mod).
Although it works, the development experience was rather bad. I had to dig through a lot of libpod's source code to learn how things fit together. The lack of a "clickable" godoc.org reference also felt strange. I understand that using podman as a library is not yet your first priority, so no offense here. Possibly a new facade layer with a simpler interface could improve the situation in a later version.

Overall, your varlink interface seems ATM to be the better fit in terms of effort and maintenance.
I might invest a bit more time and try to spawn a varlink podman directly from the plugin instead of poking the systemd-managed socket, as mentioned above.

@github-actions

github-actions bot commented Nov 3, 2019

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

@rhatdan
Member

rhatdan commented Nov 3, 2019

@mheon @baude What should we do with this one?

@towe75
Contributor

towe75 commented Nov 3, 2019

@rhatdan it is surely not your primary goal to become fully Nomad compatible. People will find this issue/thread even if it's closed, and perhaps they will stumble upon my POC. I also plan to improve this plugin in my spare time, although I have not gotten a lot of feedback yet. An interesting experiment would be to map Nomad groups to podman pods, for example.
To summarize: I would close this issue.
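
As a rough idea of what mapping a Nomad group to a podman pod could look like, here is a small sketch in Python that only builds the podman command lines (one pod per group, one container per task). The function name, the `nomad-` prefix, and the naming scheme are hypothetical; it uses only `podman pod create --name` and `podman run --detach --pod --name`.

```python
from typing import Dict, List


def pod_commands(group: str, tasks: Dict[str, str]) -> List[List[str]]:
    """Build podman invocations for one task group.

    group: name of the Nomad task group.
    tasks: mapping of task name -> container image.
    Returns one 'pod create' command followed by one 'run' command per task.
    """
    pod = f"nomad-{group}"  # hypothetical naming scheme
    cmds: List[List[str]] = [["podman", "pod", "create", "--name", pod]]
    for task, image in tasks.items():
        cmds.append([
            "podman", "run", "--detach",
            "--pod", pod,                 # join the group's pod
            "--name", f"{pod}-{task}",    # one container per task
            image,
        ])
    return cmds
```

Containers in the same pod share a network namespace, which is why a pod looks like a natural counterpart to a task group whose tasks are scheduled together.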

@afbjorklund
Contributor

Overall, your varlink interface seems ATM to be the better fit in terms of effort and maintenance.

This is rather ironic, and it was the same conclusion that I came to with podman-machine as well...

github-actions bot added the locked - please file new issue/PR label Sep 22, 2023
github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023

8 participants