Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loss of wifi AP connection results in Azure/TLS errors on recovery (IDFGH-7210) (CA-220) #127

Open
rtheil-growlink opened this issue Apr 20, 2022 · 6 comments

Comments

@rtheil-growlink
Copy link

Environment

  • Module or chip used: ESP32-WROOM-32UE
  • IDF version (run git describe --tags to find it): 4.3.2
  • Build System: idf.py in vs code
  • Compiler version (run xtensa-esp32-elf-gcc --version to find it):
    // 1.22.0-80-g6c4433a
  • Operating System: WIndows
  • (Windows only) environment type: N/A
  • Using an IDE?: [No|Yes (please give details)] Yes, vs code

Problem Description

ESP32 on esp-idf 4.3.2

I've adapted the Azure example project into our project using the port stuff from the example. I'm testing different disconnection scenarios and if I have a controller up and running and connected to azure, then I power down the wifi AP, the controller loses connectivity as expected, but is never able to recover once reconnected.

Here's the log file from the disconnection event. It starts with a confirmation of a message sent to Azure IoT hub. Then the wifi disconnection happens, and right away there's a tls error, "esp-tls-mbedtls: read error :-76". I've seen this in other cases and when this happens Azure client is never able to recover. Later, there are errors about, "tlsio_esp_tls_send_async without a prior successful open". This repeats.

I (191518) AzureService: Confirmation[15] received for message tracking id = 15 with result = IOTHUB_CLIENT_CONFIRMATION_OK
I (213958) wifi:bcn_timout,ap_probe_send_start
I (216468) wifi:ap_probe_send over, resett wifi status to disassoc
I (216468) wifi:state: run -> init (c800)
I (216468) wifi:pm stop, total sleep time: 178968508 us / 213361706 us
W (216468) wifi:<ba-del>idx
W (216478) wifi:<ba-del>idx
I (216478) wifi:new:<9,0>, old:<9,0>, ap:<255,255>, sta:<9,0>, prof:1
E (216488) esp-tls-mbedtls: read error :-76:
E (216488) UDP: stop
I (216568) gpio: GPIO[12]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
W (221558) wifi:Haven't to connect to a suitable AP now!
W (221558) wifi:Haven't to connect to a suitable AP now!
I (221558) WiFiClass: rssi: 0
I (230878) gpio: GPIO[12]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (241458) gpio: GPIO[12]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
W (251608) wifi:Haven't to connect to a suitable AP now!
W (251608) wifi:Haven't to connect to a suitable AP now!
I (251608) WiFiClass: rssi: 0
I (251618) gpio: GPIO[12]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
Error: Time:Thu Apr 14 19:22:24 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:22:24 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:22:24 PINGREQ
Error: Time:Thu Apr 14 19:22:24 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:22:24 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data

Once the wifi is plugged back in and started up, the controller will reconnect successfully to wifi AP, then throw a bunch more tls/azure errors, get an IP, then continue to throw the same tls and azure client errors. The controller is unable to recover from this and must be reset to get the Azure client working again.

-> 19:27:31 PINGREQ

I (561108) wifi:new:<9,0>, old:<9,0>, ap:<255,255>, sta:<9,0>, prof:1
I (561108) wifi:state: init -> auth (b0)
I (561118) wifi:state: auth -> assoc (0)
I (561128) wifi:state: assoc -> run (10)
I (561248) wifi:connected with gl-esp1, aid = 1, channel 9, BW20, bssid = 94:83:c4:17:0c:40
I (561248) wifi:security: WPA2-PSK, phy: bgn, rssi: -15
I (561248) wifi:pm start, type: 1

I (561318) wifi:AP's beacon interval = 102400 us, DTIM period = 1
Error: Time:Thu Apr 14 19:27:35 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:35 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:35 PINGREQ
I (563118) WiFiClass: rssi: -15
Error: Time:Thu Apr 14 19:27:35 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:35 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:35 PINGREQ
Error: Time:Thu Apr 14 19:27:35 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:35 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:35 PINGREQ
Error: Time:Thu Apr 14 19:27:35 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:35 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:35 PINGREQ
Error: Time:Thu Apr 14 19:27:35 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:35 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:35 PINGREQ
Error: Time:Thu Apr 14 19:27:35 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
Error: Time:Thu Apr 14 19:27:36 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:36 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
-> 19:27:36 PINGREQ
W (564588) wifi:<ba-add>idx:0 (ifx:0, 94:83:c4:17:0c:40), tid:6, ssn:2, winSize:64
W (564818) wifi:<ba-add>idx:1 (ifx:0, 94:83:c4:17:0c:40), tid:0, ssn:2, winSize:64
I (565558) esp_netif_handlers: sta ip: 192.168.1.71, mask: 255.255.255.0, gw: 192.168.1.1
I (565558) WiFiClass: on_wifi_got_ip 192.168.1.71
Error: Time:Thu Apr 14 19:27:40 2022 File:../port/src/tlsio_esp_tls.c Func:tlsio_esp_tls_send_async Line:552 tlsio_esp_tls_send_async without a prior successful open
Error: Time:Thu Apr 14 19:27:40 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:sendPacketItem Line:387 Failure sending control packet data
Error: Time:Thu Apr 14 19:27:40 2022 File:../lib/azure-iot-sdk-c/umqtt/src/mqtt_client.c Func:mqtt_client_publish Line:1175 Error: mqtt_client_publish send failed
Error: Time:Thu Apr 14 19:27:40 2022 File:../lib/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c Func:publishTelemetryMsg Line:1001 Failed attempting to publish mqtt message

Worth noting that this is tls specific. Standard http requests from this controller work fine once the wifi has reconnected.

Also notable is that the azure sdk never calls the connection status callback method set with IoTHubDeviceClient_LL_SetConnectionStatusCallback().

Expected Behavior

When wifi AP is lost, TLS should recover and Azure client should be able to reconnect

Actual Behavior

TLS cannot recover until Azure client attempts to renew SAS token

@github-actions github-actions bot changed the title Loss of wifi AP connection results in Azure/TLS errors on recovery Loss of wifi AP connection results in Azure/TLS errors on recovery (IDFGH-7210) Apr 20, 2022
@igrr igrr transferred this issue from espressif/esp-idf Apr 22, 2022
@github-actions github-actions bot changed the title Loss of wifi AP connection results in Azure/TLS errors on recovery (IDFGH-7210) Loss of wifi AP connection results in Azure/TLS errors on recovery (IDFGH-7210) (CA-220) Apr 22, 2022
@hauserkristof
Copy link

I have the exact same issue! Could somebody managed to avoid it?

@rtheil-growlink
Copy link
Author

I switched to the Azure middleware for FreeRTOS. Worth the effort. https://github.com/Azure/azure-iot-middleware-freertos

@Vivero
Copy link

Vivero commented Oct 7, 2022

I was running into this same exact issue. Sometimes I'd get into this scenario even without turning off the AP. My device would just randomly disconnect and then never recover. The only workaround I was able to find was to modify a part of the MQTT code myself. Here's what that fix looks like (using my own fork of azure-umqtt-c): MvIt-Now/azure-umqtt-c@bcb0b23

@afcec
Copy link

afcec commented Dec 7, 2022

I have also the same issue but unfortunately this repo seems to be dead...

@rtheil-growlink
Copy link
Author

rtheil-growlink commented Dec 7, 2022

@afcec Look into Azure middleware. They have a samples repo that has everything you need for getting up and running on any espressif device. https://github.com/Azure/azure-iot-middleware-freertos

@ClaesIvarsson
Copy link

ClaesIvarsson commented Sep 29, 2023

I just found a problem in tlsio_esp_tls.c
When a problem occur ion the socket, the state goes to TLSIO_STATE_ERROR, however in most cases it is not reported upstream, which means azure sdk is not aware of the problem.
My solution was to change the code to always call enter_tlsio_error_state when current code set the error state
Happens only in the function tlsio_esp_tls_dowork:

static void tlsio_esp_tls_dowork(CONCRETE_IO_HANDLE tls_io)
{
    if (tls_io == NULL)
    {
        /* Codes_SRS_TLSIO_30_070: [ If the tlsio_handle parameter is NULL, tlsio_dowork shall do nothing except log an error. ]*/
        LogError("NULL tlsio");
    }
    else
    {
        TLS_IO_INSTANCE* tls_io_instance = (TLS_IO_INSTANCE*)tls_io;

        // This switch statement handles all of the state transitions during the opening process
        switch (tls_io_instance->tlsio_state)
        {
        case TLSIO_STATE_CLOSED:
            /* Codes_SRS_TLSIO_30_075: [ If the adapter is in TLSIO_STATE_EXT_CLOSED then  tlsio_dowork  shall do nothing. ]*/
            // Waiting to be opened, nothing to do
            break;
        case TLSIO_STATE_INIT:
            {
                int result = esp_tls_conn_new_async(tls_io_instance->hostname, strlen(tls_io_instance->hostname), tls_io_instance->port, &tls_io_instance->esp_tls_cfg, tls_io_instance->esp_tls_handle);
                if (result == 1) {
                    tls_io_instance->tlsio_state = TLSIO_STATE_OPEN;
                    tls_io_instance->on_open_complete(tls_io_instance->on_open_complete_context, IO_OPEN_OK);
                } else if (result == -1) {
                    enter_tlsio_error_state(tls_io_instance);
                }
            }
            break;
        case TLSIO_STATE_OPEN:
            if (dowork_read(tls_io_instance) < 0 && errno != EAGAIN)
            {
                enter_tlsio_error_state(tls_io_instance);
            }
            if (dowork_send(tls_io_instance) < 0 && errno != EAGAIN)
            {
                enter_tlsio_error_state(tls_io_instance);
            }
            break;
        case TLSIO_STATE_ERROR:
            /* Codes_SRS_TLSIO_30_071: [ If the adapter is in TLSIO_STATE_EXT_ERROR then tlsio_dowork shall do nothing. ]*/
            // There is nothing valid to do here but wait to be retried
            break;
        default:
            LogError("Unexpected internal tlsio state");
            break;
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants