port already allocated #1790

saiwl · 2017-06-01T07:53:45Z

We hit a "port already allocated" problem in our Docker environment. It always happens after the Docker daemon or the machine restarts abnormally.
I read the related code and found a possible bug.
In the source code, creating a container that uses port mapping proceeds as follows:

1. create container
2. create sandbox
3. create endpoint
4. allocate ports
5. update driver endpoint store
6. join sandbox
7. update sandbox store

And the restore process after the daemon restarts is as follows:

1. restore port mapping based on driver endpoint restore
2. clean up sandbox based on sandbox store
3. clean up endpoint based on endpoint store

During creation, if the Docker daemon or the machine restarts abnormally between step 5 and step 7, which actually happened in our environment, then after the daemon restarts the port mapping is restored in step 1 of the restore process but is not released in step 2, because the sandbox store was never updated. This causes a port leak.
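The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not libnetwork code: the class `ToyDaemon`, its stores, and the `crash_before_sandbox_update` flag are hypothetical names invented for the illustration.

```python
class ToyDaemon:
    """Toy model of the create/restore sequence described above (hypothetical, not libnetwork)."""

    def __init__(self):
        self.allocated_ports = set()  # in-memory port allocator, lost on restart
        self.driver_store = {}        # persisted: endpoint id -> host port
        self.sandbox_store = {}       # persisted: sandbox id -> list of endpoint ids

    def create(self, sandbox_id, endpoint_id, port, crash_before_sandbox_update=False):
        if port in self.allocated_ports:
            raise RuntimeError(f"Bind for 0.0.0.0:{port} failed: port is already allocated")
        self.allocated_ports.add(port)         # step 4: allocate ports
        self.driver_store[endpoint_id] = port  # step 5: update driver endpoint store
        if crash_before_sandbox_update:
            return                             # simulated abnormal restart before step 7
        self.sandbox_store[sandbox_id] = [endpoint_id]  # step 7: update sandbox store

    def restart(self):
        # Restore step 1: rebuild port mappings from the driver endpoint store.
        self.allocated_ports = set(self.driver_store.values())
        # Restore step 2: clean up only the sandboxes the sandbox store knows about.
        for sandbox_id, endpoint_ids in list(self.sandbox_store.items()):
            for endpoint_id in endpoint_ids:
                port = self.driver_store.pop(endpoint_id, None)
                self.allocated_ports.discard(port)
            del self.sandbox_store[sandbox_id]


daemon = ToyDaemon()
daemon.create("sb1", "ep1", 80, crash_before_sandbox_update=True)
daemon.restart()
leaked = sorted(daemon.allocated_ports)  # the port stays allocated with no sandbox owning it
```

In this model, a later `daemon.create("sb2", "ep2", 80)` raises the same "port is already allocated" error, while a create that completes step 7 is cleaned up normally on restart.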

I have made a simple fix which has been tested in our environment. I will make a PR later. Looking forward to your suggestions!

Close moby#1790 Signed-off-by: saiwl <[email protected]>

saiwl · 2017-06-09T06:32:38Z

ping @mavenugo

fcrisciani · 2017-06-12T20:48:27Z

@saiwl is this a case with live-restore enabled?

fcrisciani · 2017-06-12T22:51:49Z

@saiwl the problem you describe makes sense to me. I actually created PR #1805 with the goal of simplifying the logic in sandboxCleanup a bit.

As you correctly state in the description, the driver and sandbox stores can go out of sync. The idea of my patch is to make the network the source of truth, so that we can reconstruct the sandbox endpoints directly from there. That simplifies the logic and should ensure that we do not miss endpoints.
Would you be able to give it a try?

saiwl · 2017-06-13T06:20:55Z

@fcrisciani Thanks.
'live-restore' is disabled.
I read your patch; I actually did almost the same thing in my PR #1794 a few days ago. I agree that the network endpoints should be deleted from the store after the driver endpoints, which I overlooked in my last PR.
I have done some testing on our machines with my PR. Everything seems OK, except that I need to fix the endpoint deletion problem the way your PR does.

bdeluca · 2017-06-13T09:01:05Z

I am hitting this issue also, but I can't use live-restore as I am in swarm mode.

I am testing both of your PRs and will let you know how it goes.

fcrisciani · 2017-06-13T15:55:07Z

Thanks @saiwl. Yes, I took the part where you fetch all the endpoints from the network. I also aggregated them by sandbox ID so that we do not have to iterate through them again later, and I removed the logic that looped over the endpoints from the sandbox.
I made sure that the network is actually the only source of truth for the endpoints, so that we can avoid multiple loops over different lists. I guess the delete is needed in case the driver has the endpoint saved in its store but the sandbox did not have the chance to receive it.

If @saiwl and @bdeluca can give it a try, given that you were both seeing this issue, that would be a great validation of the patch and would let us proceed further. This code path is kind of tricky, so we want to avoid introducing new issues. Thanks!
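As a rough illustration of the aggregation step, here is a minimal sketch, not the code from #1805. The record shape (`id`, `sandbox_id`) and the function name are assumptions made up for the example; endpoints with no sandbox ID are returned separately as candidates for deletion.

```python
from collections import defaultdict


def group_endpoints_by_sandbox(network_endpoints):
    """Group endpoint ids by sandbox id in one pass.

    network_endpoints is a hypothetical list of dicts with the keys
    'id' and 'sandbox_id'; an empty or missing sandbox_id marks an
    endpoint that never reached a sandbox.
    """
    by_sandbox = defaultdict(list)
    orphans = []
    for endpoint in network_endpoints:
        sandbox_id = endpoint.get("sandbox_id")
        if sandbox_id:
            by_sandbox[sandbox_id].append(endpoint["id"])
        else:
            orphans.append(endpoint["id"])
    return dict(by_sandbox), orphans


endpoints = [
    {"id": "ep1", "sandbox_id": "sb1"},
    {"id": "ep2", "sandbox_id": "sb1"},
    {"id": "ep3", "sandbox_id": ""},  # saved by the driver, never joined a sandbox
]
grouped, orphans = group_endpoints_by_sandbox(endpoints)
```

With the network as the single input list, cleanup can then walk `grouped` once per sandbox and handle `orphans` separately.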

bdeluca · 2017-06-13T17:13:02Z

Hi @fcrisciani, your patch doesn't appear to fix my issue.

My issue is:
Start a lot of containers in swarm mode. Reboot. Wait.
After startup, the containers that start no longer have ports mapped.

I followed a trail of similar issues to @saiwl's last PR and I think this might be related.

But if you say this seems like a different issue I will disappear (my issue is trivial to replicate).

fcrisciani · 2017-06-13T18:25:34Z

@bdeluca
couple of questions:

are you able to reproduce on single node or only multi node?
"a lot of containers" means? can you give an indication about your test?
is there a specific error message in the logs that can explain why the port is not exposed?

bdeluca · 2017-06-13T19:25:18Z

single node.
the more containers I have the more likely it is to happen.

example
docker service create --name registry0 --constraint node.role==manager --publish 5000:5000 registry:2
docker service create --name registry1 --constraint node.role==manager --publish 5001:5000 registry:2
docker service create --name registry2 --constraint node.role==manager --publish 5002:5000 registry:2
......
docker service create --name registry10 --constraint node.role==manager --publish 5010:5000 registry:2

After the services are created for the first time, you will be able to access the ports on localhost 5000-5010.
Reboot the machine.
Ports will randomly not be available.

Note: I am just using the registry image as an example.

Mostly the swarm cluster isn't down, but I was testing failure modes and discovered this one.

I see
address already in use
port already in use.
or just nothing, everything looks like it should be open but it is not.
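To see which of the published ports from the example above stop responding after a reboot, a small TCP probe can help. This is a sketch only: the function name `unreachable_ports` is hypothetical, and it checks only that a TCP connection is accepted, not that the service behind the port is healthy.

```python
import socket


def unreachable_ports(host, ports, timeout=1.0):
    """Return the ports on host that do not accept a TCP connection."""
    closed = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise.
            if sock.connect_ex((host, port)) != 0:
                closed.append(port)
    return closed


if __name__ == "__main__":
    # Port range taken from the docker service create example above.
    print(unreachable_ports("127.0.0.1", range(5000, 5011)))
```

Running it once before the reboot and once after gives a quick list of the ports that went missing.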

bdeluca · 2017-06-14T07:50:01Z

Something somewhere is very confused: swarm thinks things have different IP addresses than they actually do.

docker service inspect zz_grafana
          "VirtualIPs": [
                {
                    "NetworkID": "xg5ilscw3zfb1ul4pgm3apv9l",
                    "Addr": "10.255.0.7/16"
                }
            ]

docker service inspect zz-registry-browser

            "VirtualIPs": [
                {
                    "NetworkID": "xg5ilscw3zfb1ul4pgm3apv9l",
                    "Addr": "10.255.0.12/16"
                }
            ]


docker network inspect ingress

"Containers": {
    "14b62c140721ba7222467b3ee0112a366ab921b3cf5382093a03c45040ec0a7b": {
        "Name": "zz-registry-browser.1.vqd41eqpe0b6y9zwajf19p2e9",
        "EndpointID": "64cc796198e0f48634a30243ff64051dc3ba638089ededc33794bac9c55624eb",
        "MacAddress": "02:42:0a:ff:00:07",
        "IPv4Address": "10.255.0.7/16",
        "IPv6Address": ""
    },
    "cc70321aba5e2b5c9a9e7e21bd88990d4a57828da8a87413379f02c50b76055a": {
        "Name": "zz_grafana.1.qydyeoebramw6al5xmqtosp78",
        "EndpointID": "8b47b3438023a80d01b793190109422ac14fb84657efc79aebbcb6322c350c4f",
        "MacAddress": "02:42:0a:ff:00:04",
        "IPv4Address": "10.255.0.4/16",
        "IPv6Address": ""
    },
    "7580111e2e3b225d33c1748cf76752265896997bb07bc44930e81ac928305638": {
        "Name": "test_ssh.1.roy8f55ht9w3207fu7eel6c0v",
        "EndpointID": "a8f2374a188bb1be47d74d0a7d4902aaabb571aa131d4cdbcc9f1d93cc2c3b2f",
        "MacAddress": "02:42:0a:ff:00:0c",
        "IPv4Address": "10.255.0.12/16",
        "IPv6Address": ""
    },
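The overlap in the output above can be checked mechanically. The sketch below takes the addresses already shown in this comment, typed in by hand as plain dicts (the function name and input shape are assumptions, not a Docker API), and lists every service VIP that is also assigned to a container on the ingress network. It only reports the overlap; it does not establish the cause.

```python
def find_address_conflicts(service_vips, container_ips):
    """Return (address, service, container) for each VIP also held by a container.

    service_vips: service name -> VIP in CIDR form
    container_ips: container name -> IPv4 address in CIDR form
    """
    owners = {}
    for container, address in container_ips.items():
        owners.setdefault(address, []).append(container)
    conflicts = []
    for service, vip in service_vips.items():
        for container in owners.get(vip, []):
            conflicts.append((vip, service, container))
    return sorted(conflicts)


# Values copied from the inspect output above.
service_vips = {
    "zz_grafana": "10.255.0.7/16",
    "zz-registry-browser": "10.255.0.12/16",
}
container_ips = {
    "zz-registry-browser.1.vqd41eqpe0b6y9zwajf19p2e9": "10.255.0.7/16",
    "zz_grafana.1.qydyeoebramw6al5xmqtosp78": "10.255.0.4/16",
    "test_ssh.1.roy8f55ht9w3207fu7eel6c0v": "10.255.0.12/16",
}
conflicts = find_address_conflicts(service_vips, container_ips)
```

Here both VIPs collide: 10.255.0.7/16 is the zz_grafana VIP and also the zz-registry-browser container address, and 10.255.0.12/16 is the zz-registry-browser VIP and also the test_ssh container address.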

bdeluca · 2017-06-14T08:36:53Z

My simple example with the registry doesn't work, because port 5000 is open on every container.

neelaruban · 2017-08-30T05:15:01Z

@fcrisciani I am also hitting this issue, even with the live-restore option, and I am using the always restart option for all the containers. This only happens when the system is rebooted.

The logs I receive are below.

2017-08-30T15:05:39.455087+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.454162035+10:00" level=warning msg="Failed to allocate and map port 80-80: Bind for 0.0.0.0:80 failed: port is already allocated"
2017-08-30T15:05:39.548964+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.547939639+10:00" level=warning msg="Failed to allocate and map port 8080-8080: Bind for 0.0.0.0:8080 failed: port is already allocated"
2017-08-30T15:05:39.551442+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.550968169+10:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /opt/app/docker/containers/09d5818abf3d42153ada33af97faf255a48d3dac10afd00127f1bad04dac1656/shm: invalid argument"
2017-08-30T15:05:39.551701+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.551068223+10:00" level=error msg="Failed to start container 09d5818abf3d42153ada33af97faf255a48d3dac10afd00127f1bad04dac1656: driver failed programming external connectivity on endpoint tyk-gateway (1a41682df361402e2839e605b474195ad6e0d30b42c1a3b573600d63c32442b0): Bind for 0.0.0.0:80 failed: port is already allocated"
2017-08-30T15:05:39.577807+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.576520164+10:00" level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /opt/app/docker/containers/8df7c5dd9381d22a5fe7b13f0080a60fb967d0bf18a5dd4b39e556e18bc8101e/shm: invalid argument"
2017-08-30T15:05:39.577954+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.576615265+10:00" level=error msg="Failed to start container 8df7c5dd9381d22a5fe7b13f0080a60fb967d0bf18a5dd4b39e556e18bc8101e: driver failed programming external connectivity on endpoint tyk_dashboard (49c544e72a5bb9dde135450610309aa8a381219350ec623d70f5121a94cfa18d): Bind for 0.0.0.0:8080 failed: port is already allocated"
2017-08-30T15:05:39.578100+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.576801111+10:00" level=info msg="Loading containers: done."
2017-08-30T15:05:39.631765+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.629799087+10:00" level=info msg="Daemon has completed initialization"
2017-08-30T15:05:39.632066+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.629865544+10:00" level=info msg="Docker daemon" commit=f5ec1e2 graphdriver=overlay version=17.03.2-ce
2017-08-30T15:05:39.642335+10:00 [localhost] dockerd: time="2017-08-30T15:05:39.641918337+10:00" level=info msg="API listen on /var/run/docker.sock"
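For anyone collecting similar logs, the affected ports can be pulled out with a small script. This is a sketch that assumes the message wording shown in the log lines above ("Bind for <ip>:<port> failed: port is already allocated"); the function name is hypothetical, and other Docker versions may word the message differently.

```python
import re

# Matches the bind failure text as it appears in the dockerd log lines above.
BIND_FAILURE = re.compile(
    r"Bind for (?P<ip>[0-9.]+):(?P<port>\d+) failed: port is already allocated"
)


def ports_already_allocated(log_text):
    """Return the sorted, de-duplicated host ports reported as already allocated."""
    return sorted({int(match.group("port")) for match in BIND_FAILURE.finditer(log_text)})


sample = (
    'level=warning msg="Failed to allocate and map port 80-80: '
    'Bind for 0.0.0.0:80 failed: port is already allocated"\n'
    'level=warning msg="Failed to allocate and map port 8080-8080: '
    'Bind for 0.0.0.0:8080 failed: port is already allocated"\n'
    'level=info msg="Loading containers: done."\n'
)
affected = ports_already_allocated(sample)
```

Each port is reported once even though the daemon logs the failure twice per container (a warning and an error).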

BSWANG · 2017-12-01T07:35:31Z

@fcrisciani @saiwl I get the same issue in my environment. My reproduction script is as follows:

~# docker run -d --restart always --net host --name etcd elcolio/etcd
ad1d4571a9da839aec622c7f25185e5cee8b4a293ce3ee44641416de6d17ba3e
~# echo '{"cluster-store":"etcd://127.0.0.1:2379"}' > /etc/docker/daemon.json
~# service docker restart
~# docker network create -d overlay testov
6e7703dac16997d82c8012269c1f2ed080bb63aee50a160cadf3a58240accc6b
~# docker run -itd --name test_1 -p 80:80 --network testov busybox
7c45e0c4d1b0e8e7413a35d098ae964ed88cd42023abddcd757cf6978b64db68
~# docker stop etcd # deleting the endpoint from the kv-store will now fail
etcd
~# docker rm -f test_1 # remove container when kv-store unavailable
test_1
~# docker start etcd 
etcd
~# docker run -itd --name test_2 -p 80:80 --network testov busybox
42186e3bdf2dcdbba467a0fc56e8acbb8656a7e001ed13fc4854641552df52d1
docker: Error response from daemon: container 42186e3bdf2dcdbba467a0fc56e8acbb8656a7e001ed13fc4854641552df52d1: endpoint join on GW Network failed: driver failed programming external connectivity on endpoint gateway_42186e3bdf2d (b935a883b3977e1ffef17c37e2f65e8ef453c5db2d5c725acec0913922991d72): Bind for 0.0.0.0:80 failed: port is already allocated.

#1805 does not resolve this scenario.

louisburton · 2023-04-12T12:14:56Z

Any chance of merging #1805 ? Would be great to address the original issue here, even if other problem scenarios remain.

This was referenced Jun 1, 2017

Fix possible ports leak after abnormal restarts. #1791

Closed

Fix the possible ports leak. #1793

Closed

saiwl pushed a commit to saiwl/libnetwork that referenced this issue Jun 6, 2017

Fix possible ports leak after abnormal restarts.

3f22960

Close moby#1790 Signed-off-by: saiwl <[email protected]>

saiwl mentioned this issue Jun 6, 2017

Fix possible ports leak after abnormal restarts. #1794

Open

wenjianhn mentioned this issue Jul 21, 2017

Cannot start containers: port is already allocated moby/moby#20486

Open

fcrisciani assigned balrajsingh Nov 28, 2017

balrajsingh mentioned this issue Dec 6, 2017

Added required call to allocate VIPs when endpoints are restored moby/swarmkit#2468

Merged

jose-bigio mentioned this issue Dec 18, 2017

[17.12] bump vndr of swarmkit to 2e6f892 docker-archive/docker-ce#364

Closed

mk270 mentioned this issue Dec 26, 2018

container fails on startup ahdinosaur/ssb-pub#17

Closed

justincormack unassigned balrajsingh Mar 5, 2020

robertgzr mentioned this issue Sep 10, 2021

Port already in use, because proxy keeps binding to the wrong container IP balena-os/balena-engine#272

Closed

lmbarros mentioned this issue Apr 20, 2023

Update libnetwork to fix port binding issue balena-os/balena-engine#428

Merged

lmbarros mentioned this issue May 29, 2023

Fix sandbox cleanup #1805

Open
