Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP8266WebServer - Drop inactive connection when another is waiting to improve page load time #8216

Merged
merged 3 commits into from
Oct 27, 2021

Conversation

aWZHY0yQH81uOYvH
Copy link
Contributor

When loading pages with Safari from an HTTP ESP8266WebServer, Safari appears to occasionally open up two TCP connections and sends the GET request over the second one (found using Wireshark). In its current state, the library waits for data on the first connection, and, since there isn't any data, it sits there until it times out after 5 seconds (HTTP_MAX_DATA_WAIT). After the timeout, it moves on to the second connection which has the GET request waiting then calls the configured callback functions to serve the page content. I don't know why Safari does this, but it causes pages to take 5 seconds to load when they should be almost instant.

My fix is to simply check for more TCP connections while the server is waiting for more data from the one it's currently working with (connection is in the HC_WAIT_READ state). If there is another connection with available data, drop the current one and move on to the new one. This fixes the problem with Safari loading pages slowly as it gets rid of the seemingly pointless first connection once it receives the request on the second one. However, I am not particularly familiar with the other use cases of the library, so if this change in behavior is dumb, please close this PR.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 15, 2021

I was not able to reproduce this safari behavior with the HelloServer example.
The page loads immediately.

However I'm not telling it cannot happen like you say.

The logic of your PR is the following:

if "the current connection has currently nothing to say"
and "a new connection is incoming which is automatically closed right after verifying that it has something to say" (edited),
then drop the current connection (the currently mute one).

As a result, both connections are closed. I'm not sure that this is what you intend.

Indeed, server.available() returns a new freshly opened connection with a client trying to connect.
Doing this removes the new client from the waiting list: this is the only way to get a new client.
So server.available().available() calls client::available() on a temporary
(which is the new client received through server.available())
then closes it (because it's a temporary).
(the current client is also closed because keepCurrentClient is not set to true and that was your goal)

Did you intend to do what's below instead ?
Does it solve your safari isssue ?

diff --git a/libraries/ESP8266WebServer/src/ESP8266WebServer-impl.h b/libraries/ESP8266WebServer/src/ESP8266WebServer-impl.h
index 8c5cdf40..d570e877 100644
--- a/libraries/ESP8266WebServer/src/ESP8266WebServer-impl.h
+++ b/libraries/ESP8266WebServer/src/ESP8266WebServer-impl.h
@@ -334,8 +334,9 @@ void ESP8266WebServerTemplate<ServerType>::handleClient() {
           break;
         } // switch _parseRequest()
       } else {
-        // !_currentClient.available(): waiting for more data
-        if (millis() - _statusChange <= HTTP_MAX_DATA_WAIT) {
+        // !_currentClient.available():
+        // wait until timeout except if another connection is available
+        if (!_server.hasClient() && millis() - _statusChange <= HTTP_MAX_DATA_WAIT) {
           keepCurrentClient = true;
         }
         else

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

I see. I didn't realize that calling .available() would actually affect the state of the _server object. First of all, just so we're on the same page, let me share the code I've been using to test this:

#define DEBUG_ESP_HTTP_SERVER
#include <ESP8266WiFi.h>
#include <ESP8266WebServer.h>

const char *ssid = "xxx";
const char *wifiPassword = "xxx";

ESP8266WebServer HTTPserver(80);

void setup() {
  Serial.begin(115200);

  // Initialize WiFi
  Serial.print("Connecting to WiFi");
  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, wifiPassword);

  unsigned long startConnect = millis();
  while(WiFi.status() != WL_CONNECTED && millis()-startConnect < 30000) {
    delay(1000);
    Serial.print(".");
  }
  if(WiFi.status() != WL_CONNECTED) {
    Serial.println("\nWiFi connection failed! Rebooting...");
    delay(1000);
    ESP.restart();
  }

  Serial.println(" success");
  Serial.print("\tConnected to ");
  Serial.println(ssid);
  Serial.print("\tIP address: ");
  Serial.println(WiFi.localIP());

  HTTPserver.on("/", page);
  HTTPserver.begin();
}

void loop() {
  HTTPserver.handleClient();
}

void page() {
  HTTPserver.send(200, "text/plain", "Time: "+String(millis()));
}

This Wireshark capture of the unmodified library and the double Safari connections (ESP is 192.168.1.10, computer is 192.168.1.2, the HTTP GET is sent over the 53254 connection):
5 seconds
And the serial monitor output with DEBUG_ESP_HTTP_SERVER defined:

New client
http-server loop: conn=1 avail=0 status=wait-read
[repeated for 5 seconds]
http-server loop: conn=1 avail=0 status=wait-read
webserver: closing after read timeout
Drop client
New client
http-server loop: conn=1 avail=354 status=wait-read
request: GET / HTTP/1.1
method: GET url: / search:  keepAlive=: 1
headerName: Host
headerValue: 192.168.1.10
headerName: Upgrade-Insecure-Requests
headerValue: 1
headerName: Accept
headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
headerName: User-Agent
headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15
headerName: Accept-Language
headerValue: en-us
headerName: Accept-Encoding
headerValue: gzip, deflate
headerName: Connection
headerValue: keep-alive
args: 
args count: 0
args: 
args count: 0
Request: /
Arguments: 
final list of key/value pairs:
http-server loop: conn=1 avail=0 status=wait-close
[repeated for 2 seconds]
http-server loop: conn=1 avail=0 status=wait-close
Drop client

Like I tried to say in my first comment, this doesn't seem to happen every time, so these captures are of course only the ones that show the double connection issue. Here's what this becomes after my proposed modification.
close all connections

New client
http-server loop: conn=1 avail=0 status=wait-read
webserver: closing since there's another connection to read from
Drop client
New client
http-server loop: conn=1 avail=0 status=wait-read
request: GET / HTTP/1.1
method: GET url: / search:  keepAlive=: 1
headerName: Host
headerValue: 192.168.1.10
headerName: Upgrade-Insecure-Requests
headerValue: 1
headerName: Accept
headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
headerName: User-Agent
headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15
headerName: Accept-Language
headerValue: en-us
headerName: Accept-Encoding
headerValue: gzip, deflate
headerName: Connection
headerValue: keep-alive
args: 
args count: 0
args: 
args count: 0
Request: /
Arguments: 
final list of key/value pairs:
http-server loop: conn=1 avail=0 status=wait-close
[repeated for 2 seconds]
http-server loop: conn=1 avail=0 status=wait-close
Drop client

From the Wireshark capture, it's clear that the act of calling _server.available() and letting the returned client object go out of scope is causing both the 53351 and 53352 connections to be closed, just like you say. Safari then opens another single connection to re-send the GET request and then everything works. I had assumed .available() wouldn't do anything to the server's state (which seems to be implied here/here in the docs as well unless I'm reading it wrong), but the client destructor does appear to call the TCP close code. I had assumed everything was working and the second connection wasn't closed since DEBUG_ESP_HTTP_SERVER doesn't cause that connection close to be printed and I didn't verify it with Wireshark.

Your suggestion of using _server.hasClient() does what I intended by dropping just the first connection:
proposedfix

New client
http-server loop: conn=1 avail=0 status=wait-read
webserver: closing since there's another connection to read from
Drop client
New client
http-server loop: conn=1 avail=0 status=wait-read
request: GET / HTTP/1.1
method: GET url: / search:  keepAlive=: 1
headerName: Host
headerValue: 192.168.1.10
headerName: Upgrade-Insecure-Requests
headerValue: 1
headerName: Accept
headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
headerName: User-Agent
headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15
headerName: Accept-Language
headerValue: en-us
headerName: Accept-Encoding
headerValue: gzip, deflate
headerName: Connection
headerValue: keep-alive
args: 
args count: 0
args: 
args count: 0
Request: /
Arguments: 
final list of key/value pairs:
http-server loop: conn=1 avail=0 status=wait-close
[repeated for 2 seconds]
http-server loop: conn=1 avail=0 status=wait-close
Drop client

However, I was hoping to be able to verify that there was actually something available for reading from the new connection before dropping the current one (the .available().available()). I could imagine this might cause problems with multiple clients accessing the ESP (as mentioned by @paulocsanz whose comment seems to have disappeared) or at the very least causing lots of unnecessarily dropped and retried connections from browsers that use multiple connections to load multiple resources at once. It seems like the most ideal solution (short of handling multiple connections at once) would be to drop the connection and move on to the new one if all of the following conditions are met:

  • The current connection is waiting for more data
  • There is a new connection waiting with available data
  • A short timeout (20ms?) has passed to hopefully allow for slow clients/network to deliver any remaining data going into the current connection

Or, equivalently, HTTP_MAX_DATA_WAIT becomes much shorter when there's a new connection with data available. It seems like the "with data available" requirement would be a bit of a mess to implement with the way the library is currently written since you'd have to store the client object returned from .available() regardless of whether the conditions to switch to it are met or not. Perhaps if this problem is limited to Safari or just my computer being weird, then it's not worth pursuing further.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 15, 2021

However, I was hoping to be able to verify that there was actually something available for reading from the new connection before dropping the current one (the .available().available()).

It is not currently implemented but may be possible. I haven't tested if it can indeed return a positive number but may help with the above patch if it does.

diff --git a/libraries/ESP8266WiFi/src/WiFiServer.cpp b/libraries/ESP8266WiFi/src/WiFiServer.cpp
index 644f8b45..72f89d7b 100644
--- a/libraries/ESP8266WiFi/src/WiFiServer.cpp
+++ b/libraries/ESP8266WiFi/src/WiFiServer.cpp
@@ -109,6 +109,10 @@ bool WiFiServer::hasClient() {
     return false;
 }
 
+int WiFiServer::hasClientData() {
+    return _unclaimed? _unclaimed->getSize(): -1;
+}
+
 WiFiClient WiFiServer::available(byte* status) {
     (void) status;
     if (_unclaimed) {
diff --git a/libraries/ESP8266WiFi/src/WiFiServer.h b/libraries/ESP8266WiFi/src/WiFiServer.h
index f0619383..e81efb89 100644
--- a/libraries/ESP8266WiFi/src/WiFiServer.h
+++ b/libraries/ESP8266WiFi/src/WiFiServer.h
@@ -82,6 +82,11 @@ public:
   virtual ~WiFiServer() {}
   WiFiClient available(uint8_t* status = NULL);
   bool hasClient();
+  // hasClientData():
+  // -1: no client
+  //  0: client but no data
+  // >0: awaiting client with incoming data
+  int hasClientData();
   void begin();
   void begin(uint16_t port);
   void begin(uint16_t port, uint8_t backlog);

It seems like the most ideal solution (short of handling multiple connections at once) would be to drop the connection and move on to the new one if all of the following conditions are met:

  • The current connection is waiting for more data
  • There is a new connection waiting with available data
  • A short timeout (20ms?) has passed to hopefully allow for slow clients/network to deliver any remaining data going into the current connection

I agree that the current default timeout set to 5 second might be a bit large.
Anyway everything in our power has been done to always avoid this 5sec delay of doing nothing without needing an async webserver.
You may have run into a new case that should be addressed.

It seems like the "with data available" requirement would be a bit of a mess to implement with the way the library is currently written since you'd have to store the client object returned from .available() regardless of whether the conditions to switch to it are met or not.

This could be the start of a neverending optimization process: Should we look only the next one if the current one is empty ?
Things should be kept simple. The hasClientSize() is simple enough to help the other above patch (the connection without further data can be closed in favour of the one with data, whichever it is). However if Safari opens 10 connections and writes the effective request in the last one, then we should write a specific webserver for it (or not).

By the way, AsyncWebserver might be worth a try (I'm not saying that we should not fix our webserver). One still needs to remember that memory is precious and if we can make our not-async-webserver responsive (ie, without useless timeout) it should be sufficient regarding the esp8266's capabilities.

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

aWZHY0yQH81uOYvH commented Jul 17, 2021

Your hasClientData() function works well and does return a positive number.

hasClientData

New client
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
webserver: closing since there's another connection to read from
Drop client
New client
http-server loop: conn=1 avail=354 status=wait-read
request: GET / HTTP/1.1
method: GET url: / search:  keepAlive=: 1
headerName: Host
headerValue: 192.168.1.10
headerName: Upgrade-Insecure-Requests
headerValue: 1
headerName: Accept
headerValue: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
headerName: User-Agent
headerValue: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15
headerName: Accept-Language
headerValue: en-us
headerName: Accept-Encoding
headerValue: gzip, deflate
headerName: Connection
headerValue: keep-alive
args: 
args count: 0
args: 
args count: 0
Request: /
Arguments: 
final list of key/value pairs:
http-server loop: conn=1 avail=0 status=wait-close
[repeats for 2 seconds]
http-server loop: conn=1 avail=0 status=wait-close
Drop client

I replaced my old commit with a new one that includes the new server function and a new timeout constant for this connection dropping behavior that I've set to 30ms. I agree that this solution is a bit weird since there could conceivably be any number of mute connections and we don't know why Safari is making this one in the first place. But if you think this specific case is worth addressing, this new commit does fix it.

By the way, AsyncWebserver might be worth a try...

I did look at it for the project I'm working on, but I wanted HTTPS which AsyncWebServer doesn't really seem to support. Your library looks like the best with HTTPS for the ESP8266. In fact, messing with HTTPS was what originally led me on to this problem. The ESP takes about 2-3 seconds to respond with its certificate, which I haven't really looked into yet (seems to be faster with the faster 160MHz CPU speed setting, so the crypto might just be computationally expensive which is unavoidable I guess). While I was trying to figure out what was causing that delay when I ran into this other 5-second problem.

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 17, 2021

It looks nice !
However after thinking of it three times, let's imagine tomorrow safari opens three connections and asks for something only in the third one.
Then hasClientData() will be 0 (it's a linked list) and we'll have to wait the for the regular timeout (5s by default).
Maybe hasClientData() is not a good idea in this case because it's not general.
If we use hasClient() instead in this example, the second connection will be opened then almost closed after the small timeout, then the third is read. So it's more general with a small default delay regarding faulty / not compliant clients. In this case HTTP_MAX_DATA_AVAILABLE_WAIT would have to be renamed to HTTP_MAX_WAIT_WITH_NEW_AVAILABLE_CLIENT.
What do you think ?

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

Yeah, that seems like it would be a better solution than just checking the first connection in the list. I actually had another idea for a new version of the hasClientData() function that would be more useful. Since you mentioned that _unclaimed is a linked list, I made it so it just checks all the connected clients and returns the amount of data available from the first one that has any data.

diff --git a/libraries/ESP8266WiFi/src/WiFiServer.cpp b/libraries/ESP8266WiFi/src/WiFiServer.cpp
index 72f89d7b..360de616 100644
--- a/libraries/ESP8266WiFi/src/WiFiServer.cpp
+++ b/libraries/ESP8266WiFi/src/WiFiServer.cpp
@@ -109,8 +109,15 @@ bool WiFiServer::hasClient() {
     return false;
 }
 
-int WiFiServer::hasClientData() {
-    return _unclaimed? _unclaimed->getSize(): -1;
+size_t WiFiServer::hasClientData() {
+    ClientContext *next = _unclaimed;
+    while (next) {
+        size_t s = next->getSize();
+        // return the amount of data available from the first connection that has any
+        if (s) return s;
+        next = next->next();
+    }
+    return 0;
 }
 
 WiFiClient WiFiServer::available(byte* status) {
diff --git a/libraries/ESP8266WiFi/src/WiFiServer.h b/libraries/ESP8266WiFi/src/WiFiServer.h
index e81efb89..9efb5cca 100644
--- a/libraries/ESP8266WiFi/src/WiFiServer.h
+++ b/libraries/ESP8266WiFi/src/WiFiServer.h
@@ -83,10 +83,9 @@ public:
   WiFiClient available(uint8_t* status = NULL);
   bool hasClient();
   // hasClientData():
-  // -1: no client
-  //  0: client but no data
-  // >0: awaiting client with incoming data
-  int hasClientData();
+  // returns the amount of data available from the first client
+  // or 0 if there is none
+  size_t hasClientData();
   void begin();
   void begin(uint16_t port);
   void begin(uint16_t port, uint8_t backlog);

This addresses the case in which we have 3 connections and the request is sent in the last one:

New client
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
webserver: closing since there's another connection to read from
Drop client
New client
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
http-server loop: conn=1 avail=0 status=wait-read
webserver: closing since there's another connection to read from
Drop client
New client
http-server loop: conn=1 avail=37 status=wait-read
request: GET / HTTP/1.1
method: GET url: / search:  keepAlive=: 1
headerName: Host
headerValue: 92.168.1.10
args: 
args count: 0
args: 
args count: 0
Request: /
Arguments: 
final list of key/value pairs:
http-server loop: conn=1 avail=0 status=wait-close
[repeats for 2 seconds]
http-server loop: conn=1 avail=0 status=wait-close
Drop client

However, it seems like somewhere there's a limit of 6 connections maximum, so this partially fails if you open 7 or more connections and send the request on the last one. It gets stuck for 5 seconds on the first one before it closes it (meanwhile the server gets angry and spams it with TCP retransmission packets), accepts the last connection and sees that it has data, and quickly closes the ones in between to get to it. Your proposed solution doesn't have this problem, but it would skip any open connections even if there isn't any data to read. I guess the question comes down to which scenario is more likely: more than 6 empty connections before one with data, or multiple slow clients that take more than 30ms to send their data. Given that I seem to be the first one complaining about the mute connections, the latter is probably more likely?... Which solution do you prefer?

I made this little python script to test empty connections if you want to play with it. python3 UselessConnections.py 192.168.1.10 3 2 to open 3 connections with the request it the last one, replace the last argument with -1 to not send any request, and ctrl-c to close them all.

import sys
import socket
import time
import signal

if len(sys.argv) != 4:
	print("Usage: "+sys.argv[0]+" (ip) (number of connections) (connection to send request on)")
	sys.exit(1)

connections = []

def closeConnections(sig = None, frame = None):
	for connection in connections:
		connection.close()
	print("Closed connections")
	sys.exit(0)

signal.signal(signal.SIGINT, closeConnections)

for x in range(int(sys.argv[2])):
	client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
	client.connect((sys.argv[1], 80))
	connections.append(client)
print("Connections established")

if int(sys.argv[3]) >= 0:
	connections[int(sys.argv[3])].send(("GET / HTTP/1.1\r\nHost:%s\r\n\r\n" % sys.argv[1]).encode())
	print("Request sent on connection "+sys.argv[3])
else:
	print("Not sending request")

time.sleep(30)

closeConnections()

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 19, 2021

_unclaimed is a linked list but is limited in its size by an anti-DOS mechanism called "TCP backlog".
Because any incoming connection makes an unconditional memory allocation, and due to limited RAM on esp, this mechanism prevents answers (thus memory allocation) to any further client connection attempt until the current number of active connections decreases.

Given that fact, as you mentioned it, we can imagine a step further an angry safari making 10 connections with only the last one containing a non-void request. In that case, we'd have a deadlock for 5secs times (10 - backlog-size).

Your proposed solution doesn't have this problem, but it would skip any open connections even if there isn't any data to read

That was initially your proposal and yes: No data to read, no time to lose because this webserver can serve only one client at time. Empty connections would be skipped one by one. I don't know if 30ms is too long or short, but usually initial data are sent right away after opened connections.

I'll mark this pull request for release 3.1 because the 15? vs 30 vs 50? ms has to be tested a bit more (and 3.0.2 is a bugfix release only).

@d-a-v d-a-v added this to the 3.1 milestone Jul 19, 2021
@mcspr
Copy link
Collaborator

mcspr commented Jul 19, 2021

_unclaimed is a linked list but is limited in its size by an anti-DOS mechanism called "TCP backlog".

iirc, it is specific to the lwip2-feat variant, and is not enabled by default

#define TCP_LISTEN_BACKLOG LWIP_FEATURES // 0

#if TCP_LISTEN_BACKLOG
#define tcp_backlog_set(pcb, new_backlog) do { \
LWIP_ASSERT("pcb->state == LISTEN (called for wrong pcb?)", (pcb)->state == LISTEN); \
((struct tcp_pcb_listen *)(pcb))->backlog = ((new_backlog) ? (new_backlog) : 1); } while(0)
void tcp_backlog_delayed(struct tcp_pcb* pcb);
void tcp_backlog_accepted(struct tcp_pcb* pcb);
#else /* TCP_LISTEN_BACKLOG */
#define tcp_backlog_set(pcb, new_backlog)
#define tcp_backlog_delayed(pcb)
#define tcp_backlog_accepted(pcb)
#endif /* TCP_LISTEN_BACKLOG */

(happened to notice this when re-implementing basic server and trying to use lwip_backlog funcs)

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 19, 2021

iirc, it is specific to the lwip2-feat variant, and is not enabled by default

To be more precise, it is not enabled by default with platformIO (but it should, TCP is more stable when FEATURES are enabled) but it is enabled when using Arduino IDE / arduino-cli.

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

It's good to know that there's some kind of anti-DOS stuff built-in. With the hasClientData() solution, opening too many connections and causing the 5 second wait on each would bypass that protection and would be an easy way to cause lockup. However, I feel like even having the 30ms or whatever timeout with the hasClient() solution could also easily be overwhelmed by any normal computer. It's similar to the Slowloris DOS attack, for which mitigations would have to be much more complicated. I don't know if it's really possible to fully mitigate against "evil" clients, especially if they're intentionally malicious, given the CPU and RAM constraints and the intent to keep things simple.

Perhaps a compromise would be to use hasClientData() and OR it with a function that returns whether there are any more unacknowledged TCP connections waiting. This would treat any extra connections that don't fit in the buffer as if they do have data, resulting in quicker connection dropping until it can actually see what's going on in those new connections. I don't see any obvious function to do this, though.

…nother has data

Safari sometimes opens two connections when loading a page and only
sends a request over the second one, resulting in a 5 second wait
(HTTP_MAX_DATA_WAIT) before the request is processed. This commit
drops the current connection after 30ms (HTTP_MAX_DATA_AVAILABLE_WAIT)
when there is a new connection with data available or the buffer of
pending TCP clients is full (currently 5).
@aWZHY0yQH81uOYvH
Copy link
Contributor Author

I figured out how to access the number of connections waiting and added another helper function to WiFiServer (hasMaxPendingClients()) so that this information is available to the web server. Now the server will drop a connection if there is data available from one of the pending connections or if the buffer of pending connections is full (there are 5 connections) after 30ms. This fixes the very long wait for 10 connections. Here's another Wireshark and serial monitor dump (this time with timestamps enabled and the repeated stuff removed).

hasMaxPendingClients

20:17:21.766 -> New client
20:17:21.766 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.766 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.766 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:21.799 -> webserver: closing since there's another connection to read from
20:17:21.832 -> Drop client
20:17:21.832 -> New client
20:17:21.832 -> http-server loop: conn=1 avail=0 status=wait-read
...
20:17:23.019 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:23.019 -> webserver: closing since there's another connection to read from
20:17:23.019 -> Drop client
20:17:23.019 -> New client
20:17:23.019 -> http-server loop: conn=1 avail=0 status=wait-read
...
20:17:24.118 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:24.151 -> webserver: closing since there's another connection to read from
20:17:24.151 -> Drop client
20:17:24.151 -> New client
20:17:24.151 -> http-server loop: conn=1 avail=0 status=wait-read
...
20:17:25.256 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:25.256 -> webserver: closing since there's another connection to read from
20:17:25.256 -> Drop client
20:17:25.256 -> New client
20:17:25.256 -> http-server loop: conn=1 avail=0 status=wait-read
...
20:17:26.373 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.407 -> webserver: closing since there's another connection to read from
20:17:26.407 -> Drop client
20:17:26.407 -> New client
20:17:26.407 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.407 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.407 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.407 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.407 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.407 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.440 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.440 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.440 -> webserver: closing since there's another connection to read from
20:17:26.440 -> Drop client
20:17:26.440 -> New client
20:17:26.440 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.440 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.440 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.440 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.473 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.473 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.473 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.473 -> webserver: closing since there's another connection to read from
20:17:26.473 -> Drop client
20:17:26.473 -> New client
20:17:26.473 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.473 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.473 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.507 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.507 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.507 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.507 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.507 -> webserver: closing since there's another connection to read from
20:17:26.507 -> Drop client
20:17:26.507 -> New client
20:17:26.507 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.507 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.540 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.540 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.540 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.540 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.540 -> http-server loop: conn=1 avail=0 status=wait-read
20:17:26.540 -> webserver: closing since there's another connection to read from
20:17:26.540 -> Drop client
20:17:26.540 -> New client
20:17:26.573 -> http-server loop: conn=1 avail=37 status=wait-read
20:17:26.573 -> request: GET / HTTP/1.1
20:17:26.573 -> method: GET url: / search:  keepAlive=: 1
20:17:26.573 -> headerName: Host
20:17:26.573 -> headerValue: 92.168.1.10
20:17:26.573 -> args: 
20:17:26.573 -> args count: 0
20:17:26.573 -> args: 
20:17:26.573 -> args count: 0
20:17:26.573 -> Request: /
20:17:26.573 -> Arguments: 
20:17:26.573 -> final list of key/value pairs:
20:17:26.573 -> http-server loop: conn=1 avail=0 status=wait-close
...
20:17:28.584 -> http-server loop: conn=1 avail=0 status=wait-close
20:17:28.584 -> Drop client

The first 6 connections get accepted, the thing sees that the buffer is full, and it drops the first one. A second later, the python program retransmits the 7th connection (which wasn't ACKed since the buffer was full), which, now that the buffer is down to 4 connections, gets accepted. The python program then tries the 8th connection, but this doesn't geet ACKed again since the buffer is now up to 5. This repeats until it gets the last connection with the request in it, causing it to quickly skip the other 4 connections in the buffer and finally get to that last one.

This still doesn't handle all 10 connections instantly, but now the speed limitation is the time it takes the client to send the retransmissions instead of the 5 second read timeout. The hasClient() solution is a bit faster since it only requires one TCP retransmission with 10 connections (the connection buffer is empty by the time it gets the retransmission so it's able to accept all the remaining connections). However, this difference is more down to the python script and the sockets library. The hasClientData() and hasMaxPendingClients() solution would be faster if the retransmission speed was higher or the connections were asynchronously sent. I think this handles this edge case well enough to be considered fixed.

@alambin
Copy link

alambin commented Oct 20, 2021

This issue is also reproducible in case of Chrome and ESP32.
I tried the first solution - looks like it works.
It would be nice to see this patch for ESP32 as well.

@NRollo
Copy link

NRollo commented Oct 23, 2021

Am I right to conclude that @aWZHY0yQH81uOYvH "fixes", or any fixes for this issue, isn't released in 3.0.2?
I also have performance issues, from time to time, with Chrome requests to an ESP8266WebServer.

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

@NRollo Yes, I just checked 3.0.2 still exhibits this issue with multiple connections. Applying my changes from this PR to the 3.0.2 code fixes the problem.

@NRollo
Copy link

NRollo commented Oct 25, 2021

@aWZHY0yQH81uOYvH Thanks for the update!
I am not that familiar with the functionality/process of the GitHub, where is you proposed code updates, to be more precise?

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

@NRollo The changes are under the files changed tab on this page, and the whole repository with my changes included is available on my fork.

@mcspr
Copy link
Collaborator

mcspr commented Oct 25, 2021

@d-a-v any objections?

So, if we are dealing with timing, do we also want to expose timeout values as member variables and can be overridden by the user, not something hardcoded? As tests above only checked out lan, not something with worse latency conditions

Copy link
Collaborator

@d-a-v d-a-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aWZHY0yQH81uOYvH
Thank you, good job !

@mcspr

So, if we are dealing with timing, do we also want to expose timeout values as member variables and can be overridden by the user, not something hardcoded? As tests above only checked out lan, not something with worse latency conditions

I'm approving it for merging now then we can add an accessor in a further PR if that's OK with you

@mcspr
Copy link
Collaborator

mcspr commented Oct 25, 2021

LD  /home/runner/work/Arduino/Arduino/tests/host/bin//home/runner/work/Arduino/Arduino/libraries/ESP8266WebServer/examples/HelloServer/HelloServer
/usr/bin/ld: /home/runner/work/Arduino/Arduino/tests/host/bin//home/runner/work/Arduino/Arduino/libraries/ESP8266WebServer/examples/HelloServer/HelloServer.ino.cpp.o: in function `esp8266webserver::ESP8266WebServerTemplate<WiFiServer>::handleClient()':
/home/runner/work/Arduino/Arduino/tests/host/../../libraries/ESP8266WebServer/src/ESP8266WebServer-impl.h:349: undefined reference to `WiFiServer::hasClientData()'
/usr/bin/ld: /home/runner/work/Arduino/Arduino/tests/host/../../libraries/ESP8266WebServer/src/ESP8266WebServer-impl.h:349: undefined reference to `WiFiServer::hasMaxPendingClients()'
collect2: error: ld returned 1 exit status

From the CI. Host tests have a custom WiFiServer class with sockets, hence the error
https://github.com/esp8266/Arduino/blob/master/tests/host/common/MockWiFiServer.cpp
https://github.com/esp8266/Arduino/blob/master/tests/host/common/MockWiFiServerSocket.cpp

And it would probably mean another test-case that has two clients, one stalling and another connecting?
https://github.com/esp8266/Arduino/tree/master/tests/device/test_sw_WiFiServer

@NRollo
Copy link

NRollo commented Oct 26, 2021

@aWZHY0yQH81uOYvH Thanks for pointing me in the right direction!
It seems something is going to happen now in the master branch, maybe I should wait for that?

d-a-v added a commit to d-a-v/Arduino that referenced this pull request Oct 27, 2021
@d-a-v
Copy link
Collaborator

d-a-v commented Oct 27, 2021

@aWZHY0yQH81uOYvH If you agree with it, accepting the PR I made on your repository should make CI to pass here.

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

@d-a-v Mock functions make sense, it's been merged.

@d-a-v d-a-v merged commit f8de3fb into esp8266:master Oct 27, 2021
@mcspr
Copy link
Collaborator

mcspr commented Oct 27, 2021

@d-a-v tnx!
is there any way to have a synthetic test to preserve the behaviour though? if mock does not work, will device test with a scripted input?

@alambin
Copy link

alambin commented Oct 28, 2021

Will this patch be merged into ESP32 repo?

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 28, 2021

@alambin No, one'd have to ask them or the same PR be made there.
edit: esp32/arduino uses lwIP's socket API while here the raw API is used. Changes will not be the same there.

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 28, 2021

@mcspr Safari?'s requests would have to be reproduced reliably in order to achieve a host test.
If @aWZHY0yQH81uOYvH is able to reproduce with a sketch compiled on host using a safari? client, then it is a good idea to set up a host test.

@aWZHY0yQH81uOYvH could you try compilation on host with your browser ?

$ cd tests/host
$ make
$ #<copy-paste of one of the examples>, or
$ make path/to/sketch  # builds path/to/sketch/sketch.ino into `tests/host/bin[32]/...`
$ ./bin/.../path/to/sketch/sketch

@aWZHY0yQH81uOYvH
Copy link
Contributor Author

aWZHY0yQH81uOYvH commented Oct 31, 2021

@d-a-v The mock server does quickly respond to the weird Safari requests, but the behavior is quite different from on the real hardware due to the lack of a connection buffer (and therefore hasClientData() always returning 0). Basically, the biggest difference is that it will always close the current connection if a new one appears regardless of whether or not it has data, which on the real hardware would only happen if there were more than 5 waiting connections (any fewer and it'd wait 5 seconds for data on each). To test this properly, the connection buffer would have to be implemented in the mock code which probably isn't worth the effort.

If you want to reliably simulate the Safari behavior, run the script I left here like python3 emptyconnections.py localhost 2 1 (edit the port to 9080 in the source code).

Also, I couldn't get the host tests to compile on Mac with clang or gcc 11; it had a linker error complaining about not being able to find a symbol (__Z11serialEventv), worked fine on Ubuntu with gcc 9 though.

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 31, 2021

Thanks @aWZHY0yQH81uOYvH . Then a host device test shall be added as @mcspr suggest.
edit #8349 is updated

Regarding host and g++-11 I have no issue with ubuntu. That said void serialEvent() is indeed a weak function with no default implementation. So adding one probably helps in MockUART.cpp if it matters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants