Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v4.4.2] Doing start + stop + start triggers an error (IDFGH-8543) #239

Closed
KonssnoK opened this issue Oct 17, 2022 · 12 comments
Closed

[v4.4.2] Doing start + stop + start triggers an error (IDFGH-8543) #239

KonssnoK opened this issue Oct 17, 2022 · 12 comments

Comments

@KonssnoK
Copy link

As per title,

if i do
esp_mqtt_client_start
esp_mqtt_client_stop
esp_mqtt_client_start

the last start returns an error.

I had to modify the code to see what was the error (please make significant error messages).

The internal state machine of the MQTT client is stuck on MQTT_STATE_WAIT_RECONNECT.

the issue is in esp_mqtt_client_stop, where the state of the machine is put before waiting for the task to end, which means that any call to esp_mqtt_abort_connection() will put the reconnect status after being disconnected.

W (15:02:29.494) network: Network disconnected. Halt MQTT: 1
W (15:02:29.499) mesh_main: Start checking for FIXED ROOT handover
E (15:02:29.499) TRANSPORT_BASE: poll_read select error 113, errno = Software caused connection abort, fd = 62
E (15:02:29.509) TRANSPORT_BASE: poll_write select error 0, errno = Success, fd = 62
E (15:02:29.519) MQTT_CLIENT: Poll read error: 119, aborting connection
W (15:02:29.528) TRANSPORT_BASE: Poll timeout or error, errno=Success, fd=62, timeout_ms=10000
E (15:02:29.542) MQTT_CLIENT: Writing failed: errno=0
I (15:02:29.547) mqtt_app: MQTT_EVENT_ERROR
E (15:02:29.552) MQTT_CLIENT: Error sending disconnect message
I (15:02:29.560) mqtt_app: MQTT_EVENT_DISCONNECTED
I (15:02:44.259) network: network:: routing table update

I (15:02:44.740) processor: send status... (60494ms)
W (15:02:59.506) mesh_main: Triggering FIXED ROOT handover

I (15:03:06.246) esp-netif_lwip-ppp: Connected
W (15:03:06.255) network: GOT IP from ppp_sta
W (84460) wifi:Haven't to connect to a suitable AP now!
I (15:03:06.264) mesh_netif: Interface AP
W (84474) wifi:Haven't to connect to a suitable AP now!
I (15:03:06.275) mesh_netif: Clearing interface <sta>
I (15:03:06.281) mesh_netif: It was a wifi station, removing handlers
I (15:03:06.288) mesh_netif: Interface PPP
I (15:03:06.292) modem_comm: Modem connected to PPP server
I (15:03:06.300) network: Network connected
E (15:03:06.302) MQTT_CLIENT: Client has started 3
@github-actions github-actions bot changed the title [v4.4.2] Doing start + stop + start triggers an error [v4.4.2] Doing start + stop + start triggers an error (IDFGH-8543) Oct 17, 2022
@KonssnoK
Copy link
Author

Example of workaround
image

@KonssnoK
Copy link
Author

any feedback?

@euripedesrocha
Copy link
Collaborator

Hi @KonssnoK thanks for reporting.
I'm investigating the issue and for clarification:

  • The client was in MQTT_STATE_WAIT_RECONNNECT when the esp_mqtt_client_stop was called?
  • The call for esp_mqtt_client_stop returned ESP_OK?

@KonssnoK
Copy link
Author

KonssnoK commented Oct 25, 2022

hello @euripedesrocha ,

  • Yes, it's visible in the logs E (15:03:06.302) MQTT_CLIENT: Client has started 3
  • Yes, ESP_OK
W (14:29:53.363) network: Network disconnected. Halt MQTT: 1

-- > HERE STOP IS CALLED

E (14:29:53.371) TRANSPORT_BASE: poll_read select error 113, errno = Software caused connection abort, fd = 62
E (14:29:53.385) MQTT_CLIENT: Poll read error: 119, aborting connection
W (14:29:53.407) TRANSPORT_BASE: Poll timeout or error, errno=Success, fd=62, timeout_ms=10000
E (14:29:53.421) MQTT_CLIENT: Writing failed: errno=0
void mqtt_app_stop()
{
    if (!self->client) {
        ESP_LOGW(TAG, "MQTT Client not allocated!");
        return;
    }

    ESP_ERROR_CHECK_WITHOUT_ABORT(esp_mqtt_client_stop(self->client));
}

@euripedesrocha
Copy link
Collaborator

euripedesrocha commented Oct 26, 2022

Hi @KonssnoK, from your information we identified the issue, and we have a fix for it. Yet I wasn't able to reproduce the scenario. Could you share more about your configuration and client usage? Do you have API_LOCKS disabled?

@KonssnoK
Copy link
Author

KonssnoK commented Oct 26, 2022

no, standard configuration.

We have a MESH networking,
on Parent disconnection we trigger

void mqtt_app_stop()
{
    if (!self->client) {
        ESP_LOGW(TAG, "MQTT Client not allocated!");
        return;
    }

    esp_mqtt_client_stop(self->client);
}

on IP received we do

void mqtt_app_start(void)
{
    if (!self->client) {
        ESP_LOGW(TAG, "MQTT Client not allocated!");
        return;
    }

    // Starts MQTT
    esp_mqtt_client_start(self->client);
}

once we do

    esp_mqtt_client_config_t mqtt_cfg = {
        .uri = server,
        .cert_pem = (const char*)server_crt,
        .client_cert_pem = (const char*)client_crt,
        .client_key_pem = NULL,
        .ds_data = ds_data,
        .username = user,
        .password = pass,
        .client_id = client_id,
        .task_prio = TASK_PRIORITY_DEFAULT,
        .task_stack = TASK_STACK_DEPTH_DEFAULT
    };

    self->client = esp_mqtt_client_init(&mqtt_cfg);
    esp_mqtt_client_register_event(self->client, ESP_EVENT_ANY_ID, mqtt_event_handler, self->client);

david-cermak pushed a commit that referenced this issue Oct 26, 2022
- In some situations client could be left in a state where it's
  impossible to restart due to a state change while waiting for the
  stop. This guarantee that it's possible to start after calling stop.

Closes #239
@KonssnoK
Copy link
Author

@david-cermak could you please notify when the esp-mqtt component is pushed on esp-idf v4.4?
thanks!

@david-cermak
Copy link
Collaborator

@KonssnoK Will reference this fix in IDF release/v4.4 together with the other issue you posted in IDF espressif/esp-idf#9492. As a subscriber to that issue, you should receive an automatic notification when we close it and that would also mean that this fix is available as well.

So, does the af58f5f fix your issue? Please comment or reopen if not.

@KonssnoK
Copy link
Author

KonssnoK commented Nov 2, 2022

@david-cermak we haven't checked yet, working on other issues with your colleagues.
Sure I'll reopen in case i see any issue :)

@AxelLin
Copy link
Contributor

AxelLin commented Nov 7, 2022

So, does the af58f5f fix your issue? Please comment or reopen if not.

I cannot find the fix in esp-idf v4.x branches (v4.4 and v4.3), would you please update the esp-idf submodule and push the fix?

@euripedesrocha
Copy link
Collaborator

@AxelLin the back porting process is ongoing. It takes a little to be available due to the testing process to avoid regressions.

espressif-bot pushed a commit to espressif/esp-idf that referenced this issue Nov 25, 2022
* Fixes keepalive calculation to use esp-timer (previously used system
timer that might get updated by  NTP sync and failing keepalive
mechanism)
* Fixes start() -> stop() -> start() client's transitions, due to
wrong state of stopped client (could endup in WAIT_RECONNECT state and
thus never allowed to be started again)
* Update submodule: git log --oneline 60983d1dd54196b1e3f399c6f928d77256ec742a..bb9c8af9d552b608dd3aabf9617bde757a538ebe

Detailed description of the changes:
* Backport Moves state change when stopping the client
  - See merge request espressif/esp-mqtt!151
  - Closes espressif/esp-mqtt#239
  - Moves state change when stopping the client (espressif/esp-mqtt@af58f5f)
* fix: MQTT tick not starting on connect and switch to esp_timer
  - See merge request espressif/esp-mqtt!146
  - Closes #9492
  - See commit espressif/esp-mqtt@166741f
* ci: Fix CI build using public IDF images
  - See merge request espressif/esp-mqtt!148
  - See commit espressif/esp-mqtt@530fe3a

Closes #9492
espressif-bot pushed a commit to espressif/esp-idf that referenced this issue Dec 22, 2022
* Update submodule: git log --oneline ae53d799da294f03ef65c33e88fa33648e638134..fde00340f19b9f5ae81fff02ccfa9926f0e33687

Detailed description of the changes:
* Fix the default configuration for event queue
  - See merge request espressif/esp-mqtt!153
  - See commit espressif/esp-mqtt@fb42588
* Adds missing header.
  - See merge request espressif/esp-mqtt!152
  - See commit espressif/esp-mqtt@8a60057
* Moves state change when stopping the client
  - See merge request espressif/esp-mqtt!150
  - Closes espressif/esp-mqtt#239
  - See commit espressif/esp-mqtt@3738fcd
* Adds error code to MQTT_EVENT_SUBSCRIBED in case of failure
  - See merge request espressif/esp-mqtt!143
  - - Closes espressif/esp-mqtt#233
  - See commit espressif/esp-mqtt@9af5c26
* Adds debug information on sending dup messages
  - See merge request espressif/esp-mqtt!145
  - See commit espressif/esp-mqtt@47b3f9b
* ci: Fix qemu build
  - See merge request espressif/esp-mqtt!147
  - See commit espressif/esp-mqtt@68e8c4f
* ci: Build and Test QEMU on v5.0
  - See merge request espressif/esp-mqtt!142
  - See commit espressif/esp-mqtt@9db9ee7
* client: Add support for user events
  - See merge request espressif/esp-mqtt!140
  - Closes espressif/esp-mqtt#230
  - See commit espressif/esp-mqtt@97503cc
* Adds unregister event API
  - See merge request espressif/esp-mqtt!139
  - Closes #9194
  - See commit espressif/esp-mqtt@a9a9fe7
egnor pushed a commit to egnor/esp-mqtt that referenced this issue Dec 23, 2022
- In some situations client could be left in a state where it's
  impossible to restart due to a state change while waiting for the
  stop. This guarantee that it's possible to start after calling stop.

Closes espressif#239
egnor pushed a commit to egnor/esp-mqtt that referenced this issue Dec 23, 2022
- In some situations client could be left in a state where it's
  impossible to restart due to a state change while waiting for the
  stop. This guarantee that it's possible to start after calling stop.

Closes espressif#239
@AxelLin
Copy link
Contributor

AxelLin commented Jan 7, 2023

@AxelLin the back porting process is ongoing. It takes a little to be available due to the testing process to avoid regressions.

esp-idf-v4.3 still does not include this one-line-fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants