Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECONNABORTED periodically on MQTT nodes (IDFGH-7981) #9492

Closed
KonssnoK opened this issue Aug 3, 2022 · 10 comments
Closed

ECONNABORTED periodically on MQTT nodes (IDFGH-7981) #9492

KonssnoK opened this issue Aug 3, 2022 · 10 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@KonssnoK
Copy link
Contributor

KonssnoK commented Aug 3, 2022

Using esp32s3, mesh-networking, MQTT, 11 devices.

Devices form a network and connect individually to a MQTT Broker.

Periodically, once per hour, i see this on the terminal:

Looking at the log, i would say this happens always after an NTP synchronization, but it shouldn't influence the MQTT connection

I (12:56:53.267) mqtt_app: sent publish returned msg_id=0
W (12:56:55.096) ntp: state change: [2 (ESP), 1]
I (12:56:55.096) processor: NTP event: 1
W (12:56:58.296) ntp: state change: [1 (ESP), 2]
I (12:57:00.835) mqtt_app: sent publish returned msg_id=47476
I (12:57:00.862) mqtt_app: MQTT_EVENT_PUBLISHED, msg_id=47476
I (12:57:18.818) mqtt_app: sent publish returned msg_id=53052
I (12:57:18.901) mqtt_app: MQTT_EVENT_PUBLISHED, msg_id=53052
E (12:57:25.566) TRANSPORT_BASE: poll_read select error 113, errno = Software caused connection abort, fd = 54
E (12:57:25.567) MQTT_CLIENT: Poll read error: 119, aborting connection
I (12:57:25.575) mqtt_app: MQTT_EVENT_DISCONNECTED
I (12:57:26.536) network: Network connected
I (12:57:26.536) network: IP: 10.0.0.99
I (12:57:26.537) network: Netmask: 255.255.0.0
I (12:57:26.539) network: Gateway: 10.0.0.1
W (12:57:26.544) mqtt_app: MQTT Client is already running!
I (12:57:35.580) mqtt_app: MQTT_EVENT_BEFORE_CONNECT
I (12:57:36.845) mqtt_app: MQTT_EVENT_CONNECTED
I (11:56:48.209) mqtt_app: sent publish returned msg_id=0
W (11:56:55.715) ntp: state change: [2 (ESP), 1]
I (11:56:55.716) processor: NTP event: 1
W (11:56:58.762) ntp: state change: [1 (ESP), 2]
I (11:57:00.276) mqtt_app: sent publish returned msg_id=48805
I (11:57:00.309) mqtt_app: MQTT_EVENT_PUBLISHED, msg_id=48805
I (11:57:01.775) mqtt_app: sent publish returned msg_id=18438
I (11:57:01.845) mqtt_app: MQTT_EVENT_PUBLISHED, msg_id=18438
E (11:57:23.371) TRANSPORT_BASE: poll_read select error 113, errno = Software caused connection abort, fd = 54
E (11:57:23.372) MQTT_CLIENT: Poll read error: 119, aborting connection
I (11:57:23.380) mqtt_app: MQTT_EVENT_DISCONNECTED
I (11:57:24.330) network: Network connected
I (11:57:24.330) network: IP: 10.0.0.74
I (11:57:24.330) network: Netmask: 255.255.0.0
I (11:57:24.334) network: Gateway: 10.0.0.1
W (11:57:24.338) mqtt_app: MQTT Client is already running!
I (11:57:33.384) mqtt_app: MQTT_EVENT_BEFORE_CONNECT
I (11:57:34.701) mqtt_app: MQTT_EVENT_CONNECTED

W (13:56:55.202) ntp: state change: [2 (ESP), 1]
I (13:56:55.202) processor: NTP event: 1
W (13:56:59.864) ntp: state change: [1 (ESP), 2]
I (13:57:00.892) mqtt_app: sent publish returned msg_id=31774
I (13:57:00.919) mqtt_app: MQTT_EVENT_PUBLISHED, msg_id=31774
I (13:57:03.892) mqtt_app: sent publish returned msg_id=0
E (13:57:26.993) TRANSPORT_BASE: poll_read select error 113, errno = Software caused connection abort, fd = 54
E (13:57:26.994) MQTT_CLIENT: Poll read error: 119, aborting connection
I (13:57:27.002) mqtt_app: MQTT_EVENT_DISCONNECTED
I (13:57:27.604) network: Network connected
I (13:57:27.604) network: IP: 10.0.0.18
I (13:57:27.605) network: Netmask: 255.255.0.0
I (13:57:27.608) network: Gateway: 10.0.0.1
W (13:57:27.613) mqtt_app: MQTT Client is already running!
I (13:57:37.007) mqtt_app: MQTT_EVENT_BEFORE_CONNECT
I (13:57:38.270) mqtt_app: MQTT_EVENT_CONNECTED
I (13:57:38.278) mqtt_app: sent publish returned msg_id=55307
I (13:57:38.309) mqtt_app: MQTT_EVENT_SUBSCRIBED, msg_id=23716
I (13:57:38.384) mqtt_app: MQTT_EVENT_PUBLISHED, msg_id=55307

I would expect the connection never to be dropped if not requested.
Any help on how to identify the underlying issue?

@espressif-bot espressif-bot added the Status: Opened Issue is new label Aug 3, 2022
@github-actions github-actions bot changed the title ECONNABORTED periodically on MQTT nodes ECONNABORTED periodically on MQTT nodes (IDFGH-7981) Aug 3, 2022
@euripedesrocha
Copy link
Collaborator

Hi @KonssnoK, could you provide the version of idf and esp-mqtt you are using?
We recently fixed an issue espressif/esp-mqtt#225 that had similar behavior.

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Aug 4, 2022

@euripedesrocha Sorry my bad for forgetting

ESP-IDF v4.4
f0a3ccc

which picks from
27eb4726067465c5c67d4ecdca5ddccd26f02580
of esp-mqtt repo

@KonssnoK
Copy link
Contributor Author

KonssnoK commented Aug 4, 2022

The fix you are referring to is not backported to v4.X branch

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: Opened Issue is new labels Sep 30, 2022
david-cermak pushed a commit to espressif/esp-mqtt that referenced this issue Oct 26, 2022
- Tick should be updated on connect
- Dependency of system timer removed to avoid issues when updating via
  SNTP

Closes espressif/esp-idf#9492
@AxelLin
Copy link
Contributor

AxelLin commented Nov 11, 2022

The fix you are referring to is not backported to v4.X branch

@euripedesrocha

It really takes too much time waitting for backport fix to 4.x branches.
There is a big time gap between a fix available in master (espressif/esp-mqtt@89e5c60 Jul 5) v.s. fix available in idf_v4.x branch (espressif/esp-mqtt@166741f Oct 26).
And another big time gap to update esp-idf submodule. (2 week+)
I understand you need to arrange test to avoid regressions, but such long backport fix time does not make sense at all.
Any chance to improve the backport bug fix process?

@euripedesrocha
Copy link
Collaborator

Hi @AxelLin, we are aware of the long time it takes, and we are already working to improve those times.

espressif-bot pushed a commit that referenced this issue Nov 25, 2022
* Fixes keepalive calculation to use esp-timer (previously used system
timer that might get updated by  NTP sync and failing keepalive
mechanism)
* Fixes start() -> stop() -> start() client's transitions, due to
wrong state of stopped client (could endup in WAIT_RECONNECT state and
thus never allowed to be started again)
* Update submodule: git log --oneline 60983d1dd54196b1e3f399c6f928d77256ec742a..bb9c8af9d552b608dd3aabf9617bde757a538ebe

Detailed description of the changes:
* Backport Moves state change when stopping the client
  - See merge request espressif/esp-mqtt!151
  - Closes espressif/esp-mqtt#239
  - Moves state change when stopping the client (espressif/esp-mqtt@af58f5f)
* fix: MQTT tick not starting on connect and switch to esp_timer
  - See merge request espressif/esp-mqtt!146
  - Closes #9492
  - See commit espressif/esp-mqtt@166741f
* ci: Fix CI build using public IDF images
  - See merge request espressif/esp-mqtt!148
  - See commit espressif/esp-mqtt@530fe3a

Closes #9492
@AxelLin
Copy link
Contributor

AxelLin commented Nov 30, 2022

I'm still waiting for the esp-idf submodule update for v4.3 branch.

@euripedesrocha
Copy link
Collaborator

Hi @AxelLin, I'm sorry for the long waiting. We are working to merge it soon. This particular one is a bit tricky because, despite being simple, we have a change in the behavior of the component.

@espressif-bot espressif-bot added Resolution: Done Issue is done internally Status: Done Issue is done internally and removed Status: Reviewing Issue is being reviewed labels Dec 7, 2022
@AxelLin
Copy link
Contributor

AxelLin commented Dec 8, 2022

Hi @AxelLin, I'm sorry for the long waiting. We are working to merge it soon. This particular one is a bit tricky because, despite being simple, we have a change in the behavior of the component.

The esp-mqtt idf_v4.x branch is supposed to be used for v4.x branches.
So that's the difference of backporting to v4.4 and v4.3 branch?

@euripedesrocha
Copy link
Collaborator

Hi @AxelLin, idf_v4x is used for v4.x branches, but you can see that in IDF we point to different commits on that branch. IDF release branches are sometimes locked, and we need to wait for the merging window. In this particular case, there are changes in the MR that are being evaluated to back port because 4.3 and 4.4 are in different stages of their lifetime support. We are working to make it easier to have a particular mqtt version available for usage in a project using the component manager.

A workaround, to not be blocked by our process, is to have a separated mqtt component in your project pointing to a convenient commit and append it to EXTRA_COMPONENT_DIRS to be selected instead of idf's internal version.

@AxelLin
Copy link
Contributor

AxelLin commented Dec 8, 2022

A workaround, to not be blocked by our process, is to have a separated mqtt component in your project pointing to a convenient commit and append it to EXTRA_COMPONENT_DIRS to be selected instead of idf's internal version.

Improve your process, long backport time issue is a generic issue.
My experience is the stable branch is not really stable even though it reaches maintenance period.
(You can check the fixes in v4.3 branch that after service period, especially the wifi-library fixes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

4 participants