ESP LWIP network stack cannot handle 3 binds correctly. #8363

Closed
bill88t opened this issue Sep 4, 2023 · 11 comments
Labels
bug, espressif (applies to multiple Espressif chips), network

@bill88t

bill88t commented Sep 4, 2023

CircuitPython version

Adafruit CircuitPython 9.0.0-alpha.1-25-g000d22f25 on 2023-08-30; VCC-GND YD-ESP32-S3 (N16R8) with ESP32S3

Code/REPL

import wifi
from socketpool import SocketPool
from adafruit_requests import Session

pool = SocketPool(wifi.radio)
_socket = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
_socket2 = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

_socket.bind(("0.0.0.0", 20))
_socket.listen(1)
_socket2.bind(("0.0.0.0", 21))
_socket2.listen(1)
a = _socket.accept()
b = _socket2.accept()
a.close()
b.close()
print("ok")

Behavior

If we telnet into both ports, the program should print "ok" and close both connections.
However, only the first connection succeeds.
The second is stuck at SYN_WAIT.

Description

No response

Additional information

This is needed to properly implement PASV for FTP.
Active mode, with a bind and a connection, works just fine.

Reproducible on S2 too.

@bill88t bill88t added the bug label Sep 4, 2023
@anecdata
Member

anecdata commented Sep 5, 2023

Had to tweak the close(). Then it works on raspberrypi, but espressif gets stuck in _socket2.accept() (client times out on connect).

server code.py
import wifi
from socketpool import SocketPool

pool = SocketPool(wifi.radio)
_socket = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
_socket2 = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

_socket.bind(("0.0.0.0", 20))
_socket.listen(1)
_socket2.bind(("0.0.0.0", 21))
_socket2.listen(1)
a = _socket.accept()
b = _socket2.accept()
a[0].close()
b[0].close()
print("ok")

@anecdata
Member

anecdata commented Sep 5, 2023

works in asyncio on both platforms:

server code.py
import asyncio
import wifi
import socketpool

PORT1 = 20
PORT2 = 21

async def tcpserver(PORT):
    s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
    s.bind(("", PORT))
    s.listen(1)
    s.settimeout(0)
    while True:
        try:
            conn, addr = s.accept()
            print(f"{PORT} OK {addr}")
            conn.close()
        except OSError:  # EAGAIN
            pass
        await asyncio.sleep(0)

pool = socketpool.SocketPool(wifi.radio)

async def main():
    t1 = asyncio.create_task(tcpserver(PORT1))
    t2 = asyncio.create_task(tcpserver(PORT2))
    await asyncio.gather(t1, t2)

asyncio.run(main())
CPython client code for both cases above
#!/usr/bin/env python3
import socket
import time
import random

# edit host and port to match server
HOST = "192.168.6.198"
PORTS = (20, 21)
TIMEOUT = 5
INTERVAL = 1

while True:
    PORT = random.choice(PORTS)  # just for fun
    print("Create TCP Client Socket")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(TIMEOUT)
        print("Connecting")
        s.connect((HOST, PORT))
        size = s.send(b'Hello, world')
        print(f"Sent {size} bytes to {HOST}:{PORT}")
    time.sleep(INTERVAL)

Edit: I guess this works because only one connection is open at a time; it may be a useful workaround.

@bill88t
Author

bill88t commented Sep 5, 2023

Use watch -n0.2 'netstat | grep "board-ip-here"' to monitor all connections and their status.
The second one, as stated above, is stuck at SYN_WAIT.
For all I know, it could just be an oopsie, such as sending the SYN response to the wrong connection.

Regarding asyncio, I drafted this:

import asyncio
import wifi
import socketpool
from sys import stdout

conn1 = None
conn2 = None

async def tcpserver1():
    s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
    s.bind(("0.0.0.0", 20))
    s.listen(1)
    s.settimeout(10)
    conn1, addr = s.accept()
    print(f"{conn1} OK {addr}")

async def tcpserver2():
    s = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
    s.bind(("0.0.0.0", 21))
    s.listen(1)
    s.settimeout(10)
    conn2, addr = s.accept()
    print(f"{conn2} OK {addr}")

async def tcpserver1close():
    conn1.close()
    print("Closed 1")

async def tcpserver2close():
    conn2.close()
    print("Closed 2")

pool = socketpool.SocketPool(wifi.radio)

async def main():
    t1 = asyncio.create_task(tcpserver1())
    t2 = asyncio.create_task(tcpserver2())
    t3 = asyncio.create_task(tcpserver1close())
    t4 = asyncio.create_task(tcpserver2close())
    await asyncio.gather(t1, t2)
    await asyncio.gather(t3, t4)

asyncio.run(main())

image

Which, as you can see, also doesn't work.

Something to note:

import wifi
from socketpool import SocketPool
from adafruit_requests import Session

pool = SocketPool(wifi.radio)
_socket = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
_socket2 = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

_socket.bind(("0.0.0.0", 20))
_socket.listen(1)
_socket2.bind(("0.0.0.0", 20))
_socket2.listen(1)
print("Accepting 1")
a = _socket.accept()
print("Accepting 2")
b = _socket2.accept()
print("Accepted 2")
_socket.close()
print("ok")

is also a no-no.
But:

import wifi
from socketpool import SocketPool
from adafruit_requests import Session

pool = SocketPool(wifi.radio)
_socket = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
_socket2 = pool.socket(pool.AF_INET, pool.SOCK_STREAM)

_socket.bind(("0.0.0.0", 20))
_socket.listen(2)
print("Accepting 1")
a = _socket.accept()
print("Accepting 2")
b = _socket.accept()
print("Accepted 2")
_socket.close()
print("ok")

works just fine.

Auto-reload is off.
code.py output:
Accepting 1
Accepting 2
Accepted 2
ok

Code done running.

However, Dolphin with PASV doesn't work with it, even if the CLI client does.
So it can't be used as a workaround.

PASSIVE (PASV)

            This command requests the server-DTP to "listen" on a data
            port (which is not its default data port) and to wait for a
            connection rather than initiate one upon receipt of a
            transfer command.  The response to this command includes the
            host and port address this server is listening on.

Passing everything over the control port isn't something FTP is designed for, it seems.
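For reference, the reply RFC 959 describes carries the listening address as six comma-separated decimal bytes, h1,h2,h3,h4,p1,p2, where the port is p1 * 256 + p2. A minimal sketch of building and parsing such a reply in plain Python follows. The function names are made up for illustration and are not part of any library discussed here.

```python
import re


def format_pasv_reply(host, port):
    """Build a 227 reply for an IPv4 address string and a TCP port."""
    octets = host.split(".")
    if len(octets) != 4:
        raise ValueError("PASV only carries IPv4 addresses")
    p1, p2 = divmod(port, 256)  # high byte, low byte of the port
    return "227 Entering Passive Mode ({},{},{},{},{},{})\r\n".format(
        *octets, p1, p2
    )


def parse_pasv_reply(line):
    """Extract (host, port) from a 227 reply line."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", line)
    if match is None:
        raise ValueError("no h1,h2,h3,h4,p1,p2 group in reply")
    nums = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]
    return host, port
```

The client has to open a second TCP connection to the parsed host and port while the control connection stays up, which is exactly the situation that fails in this issue.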

@anecdata
Member

anecdata commented Sep 5, 2023

There does seem to be an espressif bug when attempting simultaneous TCP connections (2 accepts in a row on different ports, without closing either) - your 1st example above and the original example.

I fully expect sequential (close before next connect) connections (to the same or different ports) to work (your 3rd example, and my asyncio example). I wouldn't expect 2 servers on the same port to work (your 2nd example).

But PASV doesn't work with sequential connection to the control port, then the data port?
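To separate the simultaneous case from the sequential one, a CPython client can hold every connection open until all ports have been tried. The sketch below is only an illustration: probe_simultaneous is a hypothetical helper, not something from this thread's test scripts, and the address in the usage comment is the same placeholder used in the client above.

```python
import socket


def probe_simultaneous(host, ports, timeout=5.0):
    """Connect to every port and keep all connections open at once.

    Returns a dict mapping each port to True (connected) or False (failed).
    """
    results = {}
    held = []  # sockets kept open so the connections overlap in time
    try:
        for port in ports:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.settimeout(timeout)
            try:
                s.connect((host, port))
                held.append(s)
                results[port] = True
            except OSError:  # includes timeouts and refused connections
                s.close()
                results[port] = False
    finally:
        for s in held:
            s.close()
    return results


# Example call against the board (edit the address to match the server):
# print(probe_simultaneous("192.168.6.198", (20, 21)))
```

If the behavior described above holds, the first port would report True and the second would time out and report False on the affected Espressif builds.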

@dhalbert
Collaborator

dhalbert commented Sep 5, 2023

There does seem to be an espressif bug when attempting simultaneous TCP connections (2 accepts in a row on different ports, without closing either)

I think it would be good to test this with, say, MicroPython on some ESP32xx board to see whether it is our issue or an ESP-IDF one. Also, maybe do a quick search of the ESP-IDF issues.

@bill88t
Author

bill88t commented Sep 5, 2023

But PASV doesn't work with sequential connection to the control port, then the data port?

PASV tl;dr workflow explanation:

  1. Open port 21. This is the control connection.
  2. User connects on 21.
  3. User authenticates.
  4. User sends the PASV command to start the data socket.
  5. Server decides on a port to use, and opens it up (.listen()).
  6. Server sends the RFC-defined reply to the client over the control connection.
  7. Client immediately connects to the ip:port the server sent in step 6. This is the data connection.
  8. Client sends (over control connection) a command that needs a data connection, for example LIST.
  9. Server sends ok over control connection and then sends all the data over the data connection.
  10. Server closes the data connection and deinits the socket, signaling the transaction is done.
  11. Client continues sending other commands, or PASV again.

This whole time, the control connection remains open and in use.
If we close it, the client aborts. I did try.
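The steps above can be sketched as a loopback demo in CPython. This is a toy, not an FTP implementation: authentication (step 3) is skipped, both ports are picked by the OS instead of using 21, and the names toy_server, toy_client and run_demo are invented for illustration. What it does show is that the data listener is opened and accepted while the control connection is still open.

```python
import socket
import threading


def pasv_reply(host, port):
    """Encode host/port as the h1,h2,h3,h4,p1,p2 form of a 227 reply."""
    p1, p2 = divmod(port, 256)
    text = "227 Entering Passive Mode ({},{},{})\r\n".format(
        host.replace(".", ","), p1, p2
    )
    return text.encode()


def toy_server(control_listener, listing):
    ctrl, _ = control_listener.accept()                  # step 2
    with ctrl, ctrl.makefile("r") as lines:
        lines.readline()                                 # step 4: "PASV"
        data_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        data_listener.bind(("127.0.0.1", 0))             # step 5: OS picks a port
        data_listener.listen(1)
        host, port = data_listener.getsockname()
        ctrl.sendall(pasv_reply(host, port))             # step 6
        data, _ = data_listener.accept()                 # step 7, control still open
        lines.readline()                                 # step 8: "LIST"
        ctrl.sendall(b"150 Here comes the listing\r\n")  # step 9
        data.sendall(listing)
        data.close()                                     # step 10
        data_listener.close()
        ctrl.sendall(b"226 Transfer complete\r\n")


def toy_client(control_addr):
    with socket.create_connection(control_addr, timeout=5) as ctrl, \
            ctrl.makefile("r") as lines:
        ctrl.sendall(b"PASV\r\n")                        # step 4
        reply = lines.readline()
        inner = reply[reply.index("(") + 1:reply.index(")")]
        nums = [int(n) for n in inner.split(",")]
        data_addr = (".".join(map(str, nums[:4])), nums[4] * 256 + nums[5])
        chunks = []
        with socket.create_connection(data_addr, timeout=5) as data:  # step 7
            ctrl.sendall(b"LIST\r\n")                    # step 8
            lines.readline()                             # the 150 reply
            while True:
                chunk = data.recv(1024)
                if not chunk:                            # server closed: step 10
                    break
                chunks.append(chunk)
        lines.readline()                                 # the 226 reply
        return b"".join(chunks)


def run_demo(listing=b"file1.txt\r\nfile2.txt\r\n"):
    control_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    control_listener.bind(("127.0.0.1", 0))              # step 1 (21 on a real server)
    control_listener.listen(1)
    server = threading.Thread(
        target=toy_server, args=(control_listener, listing), daemon=True
    )
    server.start()
    try:
        return toy_client(control_listener.getsockname())
    finally:
        server.join(timeout=5)
        control_listener.close()
```

Calling run_demo() returns the bytes that travelled over the data connection. On a stack where the second accept never completes, the client would time out at step 7 instead.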

@bill88t
Author

bill88t commented Sep 5, 2023

New discovery!

If the web workflow is disabled, two binds work.
So in reality it is the third bind that breaks things.

import wifi
from socketpool import SocketPool
from sys import exit

try:
    wifi.radio.connect("Thinkpood", "REDACTED")
except:
    pass
if not wifi.radio.connected:
    print("No wifi")
    exit(0)

pool = SocketPool(wifi.radio)
_socket = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
_socket.bind(("", 20))
_socket.listen(1)
_socket2 = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
_socket2.bind(("", 21))
_socket2.listen(1)

print("Accepting 1")
a = _socket.accept()
print("Accepting 2")
b = _socket2.accept()
print("Accepted 2")

_socket.close()
_socket2.close()
print("ok")

image

@bill88t bill88t changed the title from "ESP LWIP network stack cannot handle 2 binds correctly" to "ESP LWIP network stack cannot handle 3 binds correctly." Sep 5, 2023
@bill88t
Author

bill88t commented Sep 5, 2023

Taking the same code and adding:

_socket3 = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
_socket3.bind(("", 22))

to it is all that is required to trigger the bug.
Calling .listen() on it makes no difference.

This gives me some good clues as to where to look.

@bill88t
Author

bill88t commented Sep 5, 2023

I did plenty of ESP debugging.
However, I only managed to get down as far as lwip_accept in ports/espressif/common-hal/socketpool/Socket.c:272.
With esp_log we can see there that the 2nd/3rd socket is not being accepted.
I tried going deeper, but the internal logging of lwIP doesn't get printed no matter what I do.
Importing esp_log is a mess I cannot figure out.

I will have to leave it to you guys from here on out.
(Perhaps the C3 can prove itself not to be a paperbrick and help with some JTAG?)

@tannewt tannewt added the network and espressif (applies to multiple Espressif chips) labels Sep 5, 2023
@tannewt tannewt added this to the Long term milestone Sep 5, 2023
@bill88t
Author

bill88t commented Sep 10, 2023

I put MicroPython 1.20 on one of my S2 boards, and this is not reproducible there.
I opened 4 sockets and connected to them successfully.

@bill88t
Author

bill88t commented Apr 14, 2024

This issue is fully resolved.

@bill88t bill88t closed this as completed Apr 14, 2024

4 participants