Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpod: make use of new pasta option from c/common #23791

Merged
merged 2 commits into from
Sep 6, 2024

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Aug 28, 2024

pasta added a new --map-guest-addr to option that maps a to the actual
host ip. This is exactly what we need for host.containers.internal
entry. So we now make use of this option by default but still have to
keep the exclude fallback because the option is very new and some
users/distros will not have it yet.

This also fixes an issue where the --dns-forward ip were not used when
using the bridge network mode, only useful when not using aardvark-dns
as this used the proper ips there already from the rootless netns
resolv.conf file.

Fixes #19213

Podman now uses the pasta --map-guest-addr option by default which is used for the host.containers.internal entry to allow the container to reach the host by default. 

@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 28, 2024
Copy link
Contributor

openshift-ci bot commented Aug 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2024
@Luap99 Luap99 added the bloat_approved Approve a PR in which binary file size grows by over 50k label Aug 28, 2024
@Luap99 Luap99 force-pushed the pasta-map-guest-addr branch 3 times, most recently from e99eb0d to c1e3950 Compare August 29, 2024 11:27
Copy link
Collaborator

@dgibson dgibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, at least with my limited familiarity with the podman codebase. A new nits noted in comments.

@@ -2306,8 +2308,13 @@ func (c *Container) addHosts() error {
}

var exclude []net.IP
var hostContainersInernalIP string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you meant Internal instead of Inernal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes


// mapGuestAddrIpv4 static ip used as forwarder address inside the netns to reach the host,
// given this is a "link local" ip it should be very unlikely that it causes conflicts
mapGuestAddrIpv4 = "169.254.0.2"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is a pre-existing issue with the --dns-forward address, but RFC 3927 says the first and last 256 addresses of 169.25.0.0/16 are "reserved for future use". So it might be wiser to use 169.254.1.XXX.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: vendor code is just a copy of the actual upstream code, so it is best to add your comments on containers/common#2136.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that there are "reserved":

The first 256 and last 256 addresses in the 169.254/16
prefix are reserved for future use and MUST NOT be selected by a host
using this dynamic configuration mechanism.

I am mean sure it is reserved so it might get a special meaning which can break us but using another ip from the proper range is also risking conflicts with real world users
As an example currently all our GCE CI VM's have nameserver 169.254.169.254 set so it means this ip cannot be reached directly from the container anymore which currently works fine. We do not really know how link local addresses are used in the wild and we should have a default that is unlikely to cause conflicts.

Of course we also might get in trouble if this range will be assigned for some new use. If you have other suggestions of unused ip ranges I am happy to hear them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: vendor code is just a copy of the actual upstream code,

I'm aware, but I didn't know how to find the relevant upstream PR..

so it is best to add your comments on containers/common#2136.

.. until now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that there are "reserved":

The first 256 and last 256 addresses in the 169.254/16
prefix are reserved for future use and MUST NOT be selected by a host
using this dynamic configuration mechanism.

I am mean sure it is reserved so it might get a special meaning which can break us but using another ip from the proper range is also risking conflicts with real world users As an example currently all our GCE CI VM's have nameserver 169.254.169.254 set so it means this ip cannot be reached directly from the container anymore which currently works fine. We do not really know how link local addresses are used in the wild and we should have a default that is unlikely to cause conflicts.

I think it should be ok to use something in the non-reserved range. AIUI, if guests want to allocate a link-local themselves they should do duplicate address detection via ARP. Since we know we're the first thing on the link, we should be ok. Actually, right now pasta responds to all ARPs, so I'm pretty sure a guest can't allocate a link local addr (everything will appear to be a duplicate). Which is something I want to fix, but largely independent of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It it not that a guest allocates this, if some service on the network is using a link local addresses and we use the same address for dns or map guest option this ip will no longer be reachable from within the container because pasta maps this to a different ip on the host.

To be clear here the GCE instance has 10.128.15.199/32 ip assigned to its interface with

default via 10.128.0.1 dev eth0 proto dhcp src 10.128.15.199 metric 100 
10.128.0.1 dev eth0 proto dhcp scope link src 10.128.15.199 metric 100 

Yet it uses 169.254.169.254 as nameserver which routes fine in this network. I do not know if other networks do something similar where the link local range is used. So there is very well the chance to break connectivity to that host if we use the same link local ip for the option as that host.


And yes I agree this very much "might break" in super weird networks and very likely not a problem for basically everyone. So if others think using 169.254.1.x is better to not use reserved ranges I am fine to change it
@sbrivio-rh @mheon WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing all this in a bit, and then I'll come back to this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, let me answer here first, as @dgibson already reviewed (I might try to review as well but don't count on it).

if some service on the network is using a link local addresses and we use the same address for dns or map guest option this ip will no longer be reachable from within the container because pasta maps this to a different ip on the host

I don't see that as a real concern because those addresses are link-local, and anything a container might reasonably expect to access on the same network segment (that's the scope of link-local addresses) is pretty much just DNS nowadays, which we already handle in a special way.

Strictly speaking, I guess Podman should perform duplicate address detection (for IPv4, it's done via ARP, RFC 3927 2.5). It's an optimistic form of detection so it doesn't cause additional delays in a general case, but it's not exceedingly simple either, and the risk looks so low to me that I'm not sure it makes sense to implement this.

So, yeah, I would use 168.254.1.x/24.

The risk of using reserved ranges is not just that they might be assigned to something one day, but you might fail sanity checks in applications, or in a future version of the kernel, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of all comes back to what we consider "the link". Is it just the virtual link from the container to pasta, or does it include things that are local on one of the host's links. At the moment we're not really consistently either, which is something I have slowly progressing plans to sanitize.

Allowing access to things on the host link is arguably "more transparent", and does let you reach those sites. However it makes it much less clear what the semantics should be in the case of multiple host links, or a disconnected host, or a host moving from one link to another. Plus, I'm not sure it's even possible to adequately maintain the illusion of being on the host link in enough cases.

So, I'm generally trying to push us in the direction of considering the pasta<->container connection "the link", at least in default configurations. This has the extremely convenient additional effect that it means that link-local addresses all become free to allocate for internal purposes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -10,6 +10,9 @@ type SetupResult struct {
// DNSForwardIP is the ip used in --dns-forward, it should be added as first
// entry to resolv.conf in the container.
DNSForwardIPs []string
// MapGuestIps are the ips used for the --map-guest-addr option which
// we can use for the host.containers.internal entry.
MapGuestAddrIPs []string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this can have at most one IPv4 and one IPv6 address (for now, anyway).

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2024
Luap99 added a commit to Luap99/common that referenced this pull request Sep 2, 2024
Per feedback[1] the 169.254.0.0/24 range is reserved for future use in
RFC 3927. As such we should not use it here as it might break in the
future if the range gets assigned a new meaning. Switch to 169.254.1.1.

[1] containers/podman#23791 (comment)

Signed-off-by: Paul Holzinger <[email protected]>
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 2, 2024
Includes my pasta changes.

Signed-off-by: Paul Holzinger <[email protected]>
pasta added a new --map-guest-addr to option that maps a to the actual
host ip. This is exactly what we need for host.containers.internal
entry. So we now make use of this option by default but still have to
keep the exclude fallback because the option is very new and some
users/distros will not have it yet.

This also fixes an issue where the --dns-forward ip were not used when
using the bridge network mode, only useful when not using aardvark-dns
as this used the proper ips there already from the rootless netns
resolv.conf file.

Fixes containers#19213

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 Luap99 removed the bloat_approved Approve a PR in which binary file size grows by over 50k label Sep 6, 2024
@Luap99 Luap99 changed the title WIP: libpod: make use of new pasta option from c/common libpod: make use of new pasta option from c/common Sep 6, 2024
@Luap99 Luap99 marked this pull request as ready for review September 6, 2024 12:44
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2024
@mheon
Copy link
Member

mheon commented Sep 6, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 6, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit f22f4cf into containers:main Sep 6, 2024
87 of 88 checks passed
@Luap99 Luap99 deleted the pasta-map-guest-addr branch September 6, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pasta: figure out how to deal with /etc/{hosts,resolv.conf} entries
5 participants