Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File loads only partially #20

Closed
marwijn opened this issue Aug 26, 2016 · 44 comments
Closed

File loads only partially #20

marwijn opened this issue Aug 26, 2016 · 44 comments

Comments

@marwijn
Copy link

marwijn commented Aug 26, 2016

I'm using the async webbrowser to load a file in the webbrowser. This works well when using chrome version 52 on a desktop PC. However using chrome (v52) on an android phone will only partly load the file and show a timeout in the logging. The problem seems to occur with any file over about 2k, but only on my telephone.

The logging that I've captured can be found below:

del if1
usl
mode : sta(5c:cf:7f:17:5d:98)
add if0
sta config unchangedf r0, scandone
no ******* found, reconnect after 1s
wifi evt: 1
STA disconnect: 201
reconnect
f 0, scandone
no ******* found, reconnect after 1s
wifi evt: 1
STA disconnect: 201
reconnect
f -180, scandone
no ******* found, reconnect after 1s
wifi evt: 1
STA disconnect: 201
STA: Failed!
reconnect
f r0, scandone
del if0
usl
mode : softAP(5e:cf:7f:17:5d:98)
add if1
dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)
bcn 100
[APConfig] local_ip: 192.168.4.1 gateway: 192.168.4.1 subnet: 255.255.255.0
[APConfig] DHCP IP start: 192.168.4.100
[APConfig] DHCP IP end: 192.168.4.200
[AP] softap config unchanged
OTA server at: esp8266-175d98.local:8266
SPIFFSImpl: allocating 512+180+1400=2092 bytes
SPIFFSImpl: mounting fs @100000, size=2fb000, block=2000, page=100
SPIFFSImpl: mount rc=0

wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
add 1
aid 1
station: 24:df:6a:53:34:ba join, AID = 1
wifi evt: 5
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
TIMEOUT: 5142, state: Established
SPIFFS_close: fd=1
SPIFFS_close: fd=1
wifi evt: 7
wifi evt: 7
wifi evt: 7

@me-no-dev
Copy link
Owner

you can do two things:
change this define to larger number or call this method on the client to change it.

@marwijn
Copy link
Author

marwijn commented Aug 26, 2016

The only thing that changes is the TIMEOUT above in the log. Changed from 5142 to 60241.
Below the log when the same site is loaded from chrome/PC. The log file above seems to be missing urd, urch, urn log lines what are these ?

del if1
usl
mode : sta(5c:cf:7f:17:5d:98)
add if0
sta config unchangedf r0, scandone
no ******* found, reconnect after 1s
wifi evt: 1
STA disconnect: 201
reconnect
f 0, scandone
no ******* found, reconnect after 1s
wifi evt: 1
STA disconnect: 201
reconnect
f -180, scandone
no ******* found, reconnect after 1s
wifi evt: 1
STA disconnect: 201
STA: Failed!
del if0
usl
mode : softAP(5e:cf:7f:17:5d:98)
add if1
dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1)
bcn 100
[APConfig] local_ip: 192.168.4.1 gateway: 192.168.4.1 subnet: 255.255.255.0
[APConfig] DHCP IP start: 192.168.4.100
[APConfig] DHCP IP end: 192.168.4.200
[AP] softap config unchanged
OTA server at: esp8266-175d98.local:8266
SPIFFSImpl: allocating 512+180+1400=2092 bytes
SPIFFSImpl: mounting fs @100000, size=2fb000, block=2000, page=100
SPIFFSImpl: mount rc=0
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
add 1
aid 1
station: 80:00:0b:bf:42:80 join, AID = 1
wifi evt: 5
:urn 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 78
:urd 15, 78, 13
:urch 78, 49
:urch 49, 78
:urd 15, 78, 13
:urch 78, 49
:urch 49, 78
:urd 15, 78, 13
:urch 78, 49
:urch 49, 145
:urch 145, 145
:urch 145, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 145
:urch 145, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 145
:urch 145, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
:urch 40, 40
:urd 11, 40, 13
:urd 4, 40, 25
:urd 5, 40, 30
SPIFFS_close: fd=1
SPIFFS_close: fd=1
SPIFFS_close: fd=2
SPIFFS_close: fd=2
SPIFFS_close: fd=1
SPIFFS_close: fd=1
SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2
SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2.gz
SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2/index.htm
SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2/index.htm.gz
NOT_FOUND: GET http://192.168.4.1/fonts/fontawesome-webfont.woff2
_HEADER[Connection]: keep-alive
_HEADER[Origin]: http://192.168.4.1
_HEADER[User-Agent]: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
_HEADER[Accept]: /
_HEADER[Referer]: http://192.168.4.1/font-awesome.min.css
_HEADER[Accept-Encoding]: gzip, deflate, sdch
_HEADER[Accept-Language]: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
_GET[v]: 4.6.1
SPIFFS_close: fd=4
SPIFFS_close: fd=4
SPIFFS_close: fd=3
SPIFFS_close: fd=3
:urch 40, 145
SPIFFS_close: fd=1
SPIFFS_close: fd=1
wifi evt: 7
SPIFFS_close: fd=2
SPIFFS_close: fd=2
SPIFFS_close: fd=1
SPIFFS_close: fd=1
SPIFFSImpl::open: fd=-1 path=/debug openMode=0 accessMode=1 err=-10002
SPIFFSImpl::open: fd=-1 path=/debug.gz openMode=0 accessMode=1 err=-10002
SPIFFSImpl::open: fd=-1 path=/debug/index.htm openMode=0 accessMode=1 err=-10002
SPIFFSImpl::open: fd=-1 path=/debug/index.htm.gz openMode=0 accessMode=1 err=-10002
NOT_FOUND: GET http://192.168.4.1/debug
_HEADER[Connection]: Upgrade
_HEADER[Pragma]: no-cache
_HEADER[Cache-Control]: no-cache
_HEADER[Upgrade]: websocket
_HEADER[Origin]: http://192.168.4.1
_HEADER[Sec-WebSocket-Version]: 13
_HEADER[User-Agent]: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
_HEADER[Accept-Encoding]: gzip, deflate, sdch
_HEADER[Accept-Language]: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4
_HEADER[Sec-WebSocket-Key]: FiQPDjTfWtbZMdjlmoUlcA==
_HEADER[Sec-WebSocket-Extensions]: permessage-deflate; client_max_window_bits
ws[/ws][1] connect
ws[/ws][1] pong[0]:
chg_A4:0
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
wifi evt: 7
:urch 145, 145

@me-no-dev
Copy link
Owner

that TIMEOUT means that you did not get an ACK for the last packet that was sent.

@marwijn
Copy link
Author

marwijn commented Aug 26, 2016

Yes that is as far as I understand. As the phone seems to work with any other website, I wonder what is different with the ESP ASyncTcp/AsyncWebserver. The files do get served with the non async FsBrowser example. I'm not very familiar with TCP protocol, is an ack necessary for each package ? If no ack is giving shouldn't the esp retry sending the package(s) ?

@me-no-dev
Copy link
Owner

an ACK means that the other end got the packet. I will look through the code and see if I can do anything about it, but in general the server is waiting for that ack to know how much data was sent and what is the progress of the connection.

@marwijn
Copy link
Author

marwijn commented Aug 26, 2016

I suspect that the phone is unable to keep up with the data sent from the server.
Then drops the package.
Therefore does not send an ack.
And waits for retransmission from the ESP. (by TCP spec)

@me-no-dev
Copy link
Owner

I guess the ESP also gives up, but that does not get propagated up to my library.
By the way I just checked the code and the server sends as much as it can, as long as lwip reports that there is space for data. I guess lwip is holding to the data also and does not let me send anything

@me-no-dev
Copy link
Owner

interesting why you say it works with the other server... we use the same stack.

@marwijn
Copy link
Author

marwijn commented Aug 28, 2016

I did add a bit of extra logging in Webresponses.cpp around line 250 and further:

size_t AsyncAbstractResponse::_ack(AsyncWebServerRequest *request, size_t len, uint32_t time){
os_printf("AsyncAbstractResponse::_ack %d %d\n", len, time);
if(!_sourceValid()){

and

if(outLen)
os_printf("AsyncAbstractResponse::_write2 : ");
_writtenLength += request->client()->write((const char*)buf, outLen);
os_printf("%d\n", _writtenLength);

When using my PC everything works correct and I get the following logging :

AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2920
AsyncAbstractResponse::_ack 2920 11
AsyncAbstractResponse::_write2 : 5840
AsyncAbstractResponse::_ack 2920 13
AsyncAbstractResponse::_write2 : 8760
AsyncAbstractResponse::_ack 2920 10
AsyncAbstractResponse::_write2 : 11680
AsyncAbstractResponse::_ack 2920 8
AsyncAbstractResponse::_write2 : 14600
AsyncAbstractResponse::_ack 2920 8
AsyncAbstractResponse::_write2 : 17323
AsyncAbstractResponse::_ack 1460 55
AsyncAbstractResponse::_ack 1263 60

However when using my phone the following logging shows:

AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2920
AsyncAbstractResponse::_ack 1360 9
AsyncAbstractResponse::_write2 : 4280
AsyncAbstractResponse::_ack 1360 1
AsyncAbstractResponse::_write2 : 5640

Instead of an ack of the 2920 sent bytes an ack is sent for only 1360 bytes. Can it be that the package size on my phone is smaller and at most 1360 bytes can be sent per package ?

After this I hardcoded

space = 1360;

And now the file loads.

@me-no-dev
Copy link
Owner

�it's totally possible that your phone's stack has a 1360 bytes limit per packet, but that still should not matter and all data should be transmitted, given that the phone's stack is OK

@marwijn
Copy link
Author

marwijn commented Aug 28, 2016

In function space in ESPAsyncTcp.cpp. Shouldn't tcp_sndbuf be tcp_mss ? (Max segment size)

@me-no-dev
Copy link
Owner

it's 1 or 2 times MSS. Whatever the stack can take in memory for that connection. If it returns 0, then the stack is full and waiting for ack or retries to finish so it can take more data. ESP8266 has 1460 MSS

@crashsoft
Copy link

I have the same problem as marwijn.
Thanks to this post, I finally fixed it by hardcoding this 'space' in Webresponses.cpp. But i had to change space to slightly different number. space = 1400;

I have found, that pages loads normally in my local network, but not from internet :(
And the heap size decreases dramatically if i don't have space set to 1400 and try to load larger (>3kB) files outside my home network...

Is this an issue with specific routers?
My router is "Thomson TG789vn"

Hope there will be official fix.

@me-no-dev
Copy link
Owner

can you set that number to 1452 and try again? I might have an idea :)

@crashsoft
Copy link

With space set to 1452 i can't load webpages.
Inside my home networks works fine.

AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 1452
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 1452
AsyncAbstractResponse::_ack 1400 29
AsyncAbstractResponse::_write2 : 2816
TIMEOUT: 5072, state: Established
SPIFFS_close: fd=1
SPIFFS_close: fd=1

@me-no-dev
Copy link
Owner

@crashsoft in your case 1400 is the magic number. But why? What makes the MTU different? (You can see you get ack with size 1400)

@crashsoft
Copy link

So we can't fix the problem :/
It occurs only withsome routers.

Here is another example. Now with size_t space = request->client()->space();

This is repsonse from client within LAN. ack is 2920

AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2920
AsyncAbstractResponse::_ack 2920 28
AsyncAbstractResponse::_write2 : 4765
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 876
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2165
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 1098
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_ack 0 0

This is repsonse from client outside LAN. ack is 1400

AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2920
AsyncAbstractResponse::_ack 1400 32
AsyncAbstractResponse::_write2 : 2800
AsyncAbstractResponse::_ack 1400 2
AsyncAbstractResponse::_write2 : 3245
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 876
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2165
AsyncAbstractResponse::_ack 1400 33
AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 1098
AsyncAbstractResponse::_ack 1098 47
TIMEOUT: 5103, state: Established

@me-no-dev
Copy link
Owner

it will be great if we know what is the connection minimum MTU, but we do not. I read some about it and the conclusion is that there are many cases where somewhere across the path of the connection there can be a device that has lower MTU than what you have on your interface. Safe size was 576 bytes or a number close to that. Surely imposing such constrain on the network stack will have really bad effect on the throughput :)
In your case the magic number is 1400, but in other cases it's 1360 or something else :)
Sometimes the limit is imposed by pppoe or other protocol that usually have a known size, but 1400 or 1360 is nothing even close to the known protocols...
If you guys have an idea how the limit can be sampled safely? maybe watch the ack size, but that does not guarantee successful transmission because it requires sending larger buffer.
Can you test that theory?

@pcarver
Copy link

pcarver commented Sep 8, 2016

The "standard" approach is PMTU https://en.wikipedia.org/wiki/Path_MTU_Discovery but that can fail if there are overly enthusiastic security devices in the path and I don't know if doing PMTU on the ESP8266 is asking too much.

@me-no-dev
Copy link
Owner

I'm not positive that the esp will get the ICMP responses at all or if it even integrated into lwip. Will see what I can find :)

@me-no-dev
Copy link
Owner

ok i looked through lwip and such messages are not handled at all. I added this to a local branch here which will echo if fragmentation icmp is received. Can you add it on your side and use "Open Source (gcc)" option for "Core Development Module" to build lwip on the fly and see if you get it?

@me-no-dev
Copy link
Owner

also this to fix the compiler

@me-no-dev
Copy link
Owner

on another go... lwip does not set the DF flag so I guess fragmentation requests will not be received

@marwijn
Copy link
Author

marwijn commented Sep 9, 2016

I don't know if this helps, but I've changed in webresponses in the following:

size_t AsyncAbstractResponse::_ack(AsyncWebServerRequest *request, size_t len, uint32_t time){
if(!_sourceValid()){
_state = RESPONSE_FAILED;
request->client()->close();
return 0;
}
_ackedLength += len;
size_t space = request->client()->space();
space = (space / 1360) * 1360;

I think at my system 1360 is equal to tcp_mss. Can't this value be used ?

@me-no-dev
Copy link
Owner

we can not impose such limitation in the library. There must be another way to do things ;) someone might come with mss smaller than that. The "fix" should be universal and should not limit the packet size when the connection can support higher MSS. I can add a setter in the lib to allow limiting the outgoing size but how will that help in reality? Any ESP can end up in that situation at any time. Maybe the problem is actually on the router side? because it should be it's job to fragment and ensure transmission (since we do not set the DoNotFragment flag).
I'll do some more thinking :) I hope to figure out a way to test it though... but no such problems in my network

@marwijn
Copy link
Author

marwijn commented Sep 12, 2016

Have you tried to reproduce by configuring the ESP as access point and connect directly with an android phone (if you have one available).

@crashsoft
Copy link

marwijn, yes, i havee tried ESP as access point -- works fine. But i need my ESP to be connected to internet.

@tbnobody
Copy link

I think this might be the Same problem? me-no-dev/ESPAsyncWebServer#63

@tbnobody
Copy link

Hm what I don't understand is the fact, that it's working pretty nice with the webserver library which is included in the ESP arduino package.

The MTU in my network is 1220 because of an cisco wireless lan controller which tunnels each package from the AP to the controller.

If I enable the debug mode and perform an request using the included webserver library I get the following output:

Request: /css/bootstrap.min.css
 Arguments: 
StaticRequestHandler::handle: request=/css/bootstrap.min.css _uri=/
StaticRequestHandler::handle: path=/css/bootstrap.min.css, isFile=0
:wr
:sent 148
:ww
:wr
:sent 1220
:sent 240
:ww
:wr
:sent 1220
:sent 240
:ww
:wr
:sent 1220
:sent 240
:ww
:wr
:sent 1220
:sent 240

If I perform the same request using the ESPAsyncWebserver library I get the following:

AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2920
AsyncAbstractResponse::_ack 2440 4
AsyncAbstractResponse::_write2 : 5360

I am not very familiar with the deep functionality of TCP/IP but how does the first code immediatly detect the right package size?

@me-no-dev
Copy link
Owner

To be honest I have no clue! Both TCP libs driving the WebServers are based pretty much on the same exact code. There isn't much to it, It asks LwIP how much it can send and sends that much, counts what has been acked back from the other end and sends more. Nowhere does it limit the size to anything, just whatever the stack on the back allows to do. I have been seeing this issue times and times and have tried what I can remotely, but I have not been able to reproduce it here so I can debug it better :(

@tbnobody
Copy link

Hm if I understand the debug output correctly, in the first code I get two ACK packages

:sent 1220
:sent 240

I think that's because the package is been fragmented and the client receives two packages. In the second output I only see one ACK response (with the same size as the first ACK response in the first code).

Had a look at the ClientContext.h file which is included in the ESP for Arduino libraries.
This code is executed in the write method:

_size_sent = will_send;
DEBUGV(":wr\r\n");
tcp_output( _pcb );
_send_waiting = true;
delay(5000); // max send timeout
_send_waiting = false;
DEBUGV(":ww\r\n");
return will_send - _size_sent;

And I think there is a callback function which will be executed on every received ACK:

        err_t _sent(tcp_pcb* pcb, uint16_t len) {
            DEBUGV(":sent %d\r\n", len);
            _size_sent -= len;
            if(_size_sent == 0 && _send_waiting) esp_schedule();
            return ERR_OK;
        }

If I interpret the code correct, the data will be written to an output buffer (tcp_output) and than an delay of 5 seconds will be performed to wait for all outstanding ACKs. During this 5 seconds the _sent method will be executed for every received ACK and the _size_sent will be counted down to zero. If it reaches zero, the delay will be canceld (esp_schedule).

In your function:

int8_t AsyncClient::_sent(tcp_pcb* pcb, uint16_t len) {
  _rx_last_packet = millis();
  ASYNC_TCP_DEBUG("_sent: %u\n", len);
  _pcb_busy = false;
  if(_sent_cb)
    _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
  return ERR_OK;
}

You call the callback on every received ACK package. I am not sure if I see a location where you wait for the second ACK. If I understand the code corretly it should be somewhere in the method size_t AsyncAbstractResponse::_ack.
But I am not sure if it's necessary to wait for all ACKs before sending out the next bunch of data.

@tbnobody
Copy link

It's working now. But I don't know if this change has any negative side effects (also variable names are just prefixed that I find them easier during debugging):

diff --git a/src/ESPAsyncTCP.cpp b/src/ESPAsyncTCP.cpp
index da55816..31752ef 100644
--- a/src/ESPAsyncTCP.cpp
+++ b/src/ESPAsyncTCP.cpp
@@ -219,10 +219,10 @@ size_t AsyncClient::write(const char* data) {
 }

 size_t AsyncClient::write(const char* data, size_t size, uint8_t apiflags) {
-  size_t will_send = add(data, size, apiflags);
-  if(!will_send || !send())
+  tba_will_send = add(data, size, apiflags);
+  if(!tba_will_send || !send())
     return 0;
-  return will_send;
+  return tba_will_send;
 }

 size_t AsyncClient::add(const char* data, size_t size, uint8_t apiflags) {
@@ -355,7 +355,8 @@ int8_t AsyncClient::_sent(tcp_pcb* pcb, uint16_t len) {
   _rx_last_packet = millis();
   ASYNC_TCP_DEBUG("_sent: %u\n", len);
   _pcb_busy = false;
-  if(_sent_cb)
+  tba_will_send -= len;
+  if(_sent_cb && tba_will_send == 0)
     _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
   return ERR_OK;
 }
diff --git a/src/ESPAsyncTCP.h b/src/ESPAsyncTCP.h
index 9efdc49..75bcd84 100644
--- a/src/ESPAsyncTCP.h
+++ b/src/ESPAsyncTCP.h
@@ -49,6 +49,10 @@ typedef struct SSL_CTX_ SSL_CTX;
 #endif

 class AsyncClient {
+
+  private:
+    size_t tba_will_send = 0;
+
   protected:
     friend class AsyncTCPbuffer;
     tcp_pcb* _pcb;

@tbnobody
Copy link

tbnobody commented Nov 12, 2016

Hm maybe it's better to place _pcb_busy = false; inside the if clause:

diff --git a/src/ESPAsyncTCP.cpp b/src/ESPAsyncTCP.cpp
index da55816..0a2a7d3 100644
--- a/src/ESPAsyncTCP.cpp
+++ b/src/ESPAsyncTCP.cpp
@@ -219,10 +219,10 @@ size_t AsyncClient::write(const char* data) {
 }

 size_t AsyncClient::write(const char* data, size_t size, uint8_t apiflags) {
-  size_t will_send = add(data, size, apiflags);
-  if(!will_send || !send())
+  tba_will_send = add(data, size, apiflags);
+  if(!tba_will_send || !send())
     return 0;
-  return will_send;
+  return tba_will_send;
 }

 size_t AsyncClient::add(const char* data, size_t size, uint8_t apiflags) {
@@ -354,9 +354,12 @@ void AsyncClient::_ssl_error(int8_t err){
 int8_t AsyncClient::_sent(tcp_pcb* pcb, uint16_t len) {
   _rx_last_packet = millis();
   ASYNC_TCP_DEBUG("_sent: %u\n", len);
-  _pcb_busy = false;
-  if(_sent_cb)
-    _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
+  tba_will_send -= len;
+  if (tba_will_send == 0) {
+   _pcb_busy = false;
+   if (_sent_cb)
+     _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
+  }
   return ERR_OK;
 }

diff --git a/src/ESPAsyncTCP.h b/src/ESPAsyncTCP.h
index 9efdc49..75bcd84 100644
--- a/src/ESPAsyncTCP.h
+++ b/src/ESPAsyncTCP.h
@@ -49,6 +49,10 @@ typedef struct SSL_CTX_ SSL_CTX;
 #endif

 class AsyncClient {
+
+  private:
+    size_t tba_will_send = 0;
+
   protected:
     friend class AsyncTCPbuffer;
     tcp_pcb* _pcb;

@me-no-dev
Copy link
Owner

so you propose to mark the client as busy till all acks come and to track the sent/acked status in the client? What if I call write twice? tba_will_send will be overwritten, then on ack will be deducted the previous send and so on. Still I do not want to impose such limitation on a client level. I have been removing the pcb_busy dependent code through time. Also this in no way will calculate the actual maximum packet size or mimic what is happening in the other log from the regular Server. The real resolution for this problem should be universal for any given max-packet and should not limit functionality and throughput. For example, LwIP should not tell me that I can send 1460 bytes when the other end can not receive them, or should handle sending those bytes in proper packets on it's own. I have read some on the matter and issue seems to be in LwIP itself not being able to properly receive and manage fragmentation errors and also to take into account whet the other end advertises as MSS when they are connecting

@tbnobody
Copy link

Hm you are right. It works nice for one request but not for simultaneous requests. I am not that deep in the whole matter. But is it possible to call write twice without receiving the ack package for the same connection? As far as I saw, the write function was called in the webresponse ack method. But that's just a guess, I don't know any possible side effects. You are right LwIP should tell the MTU for the connection (if this is possible). In any other case there will always be a package fragmentation.

@me-no-dev
Copy link
Owner

packet fragmentation is not the issue as neither side can really know what the MSS is through the connection in between. The limiting router can be somewhere externally for both and both to have full 1460 available locally. Issue is that lwip does not even have the code for handling "packet needed fragmentation" ICMP messages. The only code for ICMP is for ping and I am still not that familiar with all the goes inside. I do know that newer versions of LwIP have that taken care of and I know that we will update ESP8266 to those new versions, but timeframe is sadly not something that I can give.
I do notice some commits made to the ESP8266 tree lately though, so it might not be long :)

@marwijn
Copy link
Author

marwijn commented Nov 12, 2016

I agree with me-no-dev that this is an issue in the LwIP stack. As I wanted to use this with the AsyncWebbrowser I've made the following change in WebResponses.cpp :

size_t AsyncAbstractResponse::_ack(AsyncWebServerRequest *request, size_t len, uint32_t time){
  if(!_sourceValid()){
    _state = RESPONSE_FAILED;
    request->client()->close();
    return 0;
  }
  _ackedLength += len;
 if (_ackedLength < _writtenLength) return 0; // Do not add a new packet till the previous one is fully processed.

This seems to work nice. It is still a workaround, but I guess it's better then nothing.

@tbnobody
Copy link

I fully agree with you. It's not nice to reduce the performance of the lib, but slower is better than non working :)
Does it maybe make sense to add such a workaround as a define/configuration option? I personally have no problem to patch the library. But the problem seems to occour for more people.

But first of all I am happy that it's working now :)

@me-no-dev
Copy link
Owner

@marwijn I really like you workaround! That is more of how it should be implemented :) in the upper level. I'm going to shamesly steal it!

@me-no-dev
Copy link
Owner

@tbnobody could you give that a go and see how it does on your side?

@marwijn
Copy link
Author

marwijn commented Nov 13, 2016

I've been looking again at tbnobody fix once more since a fix in the asynctcp library would fix it for all clients. Would the following also work:
In the write change

tba_will_send = add(data, size, apiflags);

to

tba_will_send += add(data, size, apiflags);

And check at the start of the function whether tba_will_send == 0, and if so return 0;

I don't see the need for the busy flag ?
This will have some impact on performance, but at least it would work for all clients.

By the way if you're online now we might use a chatroom to discuss.

@me-no-dev
Copy link
Owner

yes I am :) you'll come to Gitter or?

@me-no-dev
Copy link
Owner

YEAH!!! Closing this finally! fixed in latest git

@crashsoft
Copy link

Wow, great job! You've fixed it. Now i can switch back to AsyncServer :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants