Feature request: Clean up reconnection semantics #855

Open
5 tasks
couling opened this issue Jul 17, 2024 · 0 comments
Labels
Status: Available No one has claimed responsibility for resolving this issue.

couling commented Jul 17, 2024

Feature Description

A significant proportion of MQTT3 and MQTT5 is devoted to reconnection and to maintaining a single session across reconnects.

This whole area is done very badly by Paho with some MQTT5 improvements ignored and even a core MQTT3 feature broken.

For us, this has directly prevented adoption for over a year.

Problems

1) Clean Start cannot change between successive CONNECT messages

This API logic appears to have been copied from MQTT3.1, where "clean session" indicated two things in a single boolean flag:

  • start a new session when I connect
  • discard my session when I disconnect

MQTT5 split those into "clean start" and "session expiry interval". The clear intent is that a client can start a new session on this connect that it intends to continue on a later one.

For a client to achieve this, it must send clean start 1 on the first CONNECT message and clean start 0 on subsequent ones. But Paho's API doesn't seem to support this basic use case of the new MQTT5 split.

Forcing a client to specify clean start 0 on first CONNECT can cause it to inherit a session it knows nothing about, including subscriptions that it doesn't understand.
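The behaviour a client needs here can be sketched without any Paho code. This is only an illustration of the decision, not Paho's implementation: `SessionTracker` and its method names are hypothetical, and the sketch assumes the client knows whether it already holds local session state.

```python
from dataclasses import dataclass


@dataclass
class SessionTracker:
    """Hypothetical helper: tracks whether this client already owns a session."""

    has_session: bool = False

    def next_clean_start(self) -> bool:
        # Ask for a clean start only when there is no local session to resume.
        return not self.has_session

    def on_connected(self) -> None:
        # After a successful CONNECT the client owns a session, so later
        # reconnects should ask the broker to resume it (clean start 0).
        self.has_session = True


tracker = SessionTracker()
first = tracker.next_clean_start()   # value for the first CONNECT
tracker.on_connected()
second = tracker.next_clean_start()  # value for any reconnect
```

With this rule the client never has to send clean start 0 on its very first CONNECT, so it cannot inherit a session it did not create.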

2) QOS2 "Exactly once" duplicates messages

QOS2 duplicates messages "deliberately", against the feature's core purpose. This feature is front and center of MQTT and is even mentioned in the abstract:

"Exactly once", where messages are assured to arrive exactly once. This level could be used, for example, with billing systems where duplicate or lost messages could lead to incorrect charges being applied.

Also discussed in #733
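For reference, this is roughly how a QOS2 receiver avoids duplicates under MQTT5: deliver on the first PUBLISH for a packet identifier and ignore repeats until PUBREL arrives. It is a generic sketch of the protocol rule, not a description of where Paho's duplication comes from; `Qos2Receiver` and its method names are made up for the example.

```python
from typing import Callable, Set


class Qos2Receiver:
    """Hypothetical receiver-side QOS2 state machine keyed on packet identifier."""

    def __init__(self, deliver: Callable[[bytes], None]) -> None:
        self._deliver = deliver
        self._awaiting_pubrel: Set[int] = set()

    def on_publish(self, mid: int, payload: bytes) -> str:
        # Deliver only the first time this packet identifier is seen;
        # a retransmitted PUBLISH just gets another PUBREC.
        if mid not in self._awaiting_pubrel:
            self._awaiting_pubrel.add(mid)
            self._deliver(payload)
        return "PUBREC"

    def on_pubrel(self, mid: int) -> str:
        # PUBREL releases the identifier so it can be reused for a new message.
        self._awaiting_pubrel.discard(mid)
        return "PUBCOMP"
```

The set of identifiers awaiting PUBREL is part of the session state, so the guarantee only survives a reconnect if that set does too.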

3) QOS2 PUBLISH responses are not reported to the user

This makes it impossible to track what has actually been sent and what hasn't.

QOS2 PUBLISH failures are reported in PUBREC reason codes, but these are never passed to the user. What is reported to the user is PUBCOMP reason codes but these do not report any errors or useful information.

This has been ported from MQTT3.1 semantics. Under MQTT3.1, either PUBLISH or PUBREL could trigger onward transmission or processing (described in MQTT3.1.1 section 4.3.3).

Under MQTT5, this explicit ambiguity has been deleted from the text (MQTT5 section 4.3.3), with the strong inference that PUBLISH, not PUBREL, is the right place to process a message.

A success / failure reason code has been added to PUBACK and PUBREC but not to PUBCOMP. This gives PUBACK and PUBREC the same purpose and relegates PUBCOMP to just releasing the message ID for recycling.

The current behaviour is to trigger on_publish() for PUBCOMP and PUBREC is discarded. This is liable to result in misleading 0x92 "Packet Identifier not found" or, even worse, an incorrect 0x00 "Success" (mosquitto sends this).
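A minimal sketch of the sender-side handling being asked for: keep the PUBREC reason code, stop the handshake on failure, and report the PUBREC code once PUBCOMP arrives. `PendingPublish`, `on_pubrec` and `on_pubcomp` are hypothetical names for illustration and are not Paho internals.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 0x80  # MQTT5 reason codes at or above this value are failures


@dataclass
class PendingPublish:
    mid: int
    pubrec_reason: Optional[int] = None
    done: bool = False


def on_pubrec(msg: PendingPublish, reason_code: int) -> bool:
    """Record the PUBREC reason code; return True if PUBREL should be sent."""
    msg.pubrec_reason = reason_code
    if reason_code >= FAILURE_THRESHOLD:
        # The broker rejected the message: the handshake ends here and the
        # failure can be reported to the user straight away.
        msg.done = True
        return False
    return True


def on_pubcomp(msg: PendingPublish) -> Optional[int]:
    """Finish the handshake and return the code to report to the user."""
    msg.done = True
    # Report what PUBREC said (e.g. 0x10 "No matching subscribers"),
    # not the PUBCOMP reason code.
    return msg.pubrec_reason
```

For example, a PUBREC with 0x87 "Not authorized" would end the handshake immediately, while 0x10 "No matching subscribers" would still complete but reach the user as 0x10 rather than 0x00.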

4) Session Expiry is buried and has a horrible default

As noted for (1), "clean session" has been split into "clean start" and "session expiry interval". While "clean start" has been added to the connect() method, the "Session Expiry Interval" is buried in the properties, with a default that discards the session at the end of every connection.

5) Paho has never had a mechanism to re-load a session state

Many message brokers, including mosquitto, can persist their sessions across restarts. While I'm not asking for Paho to start writing to disk, it would be great if Paho offered some way to extract and re-load a session state so that it can be persisted somewhere.

Requested Solution

  • 1) Ideally "clean start" could be made automatic with 1 on first CONNECT without an existing session in Paho and 0 after that. Open to ideas on how to make this backward compatible.

  • 2) There will always be situations where a session held in Paho cannot be recovered on the server which, in light of (1), is easily identified as "clean start 0" with response "session present 0". The on_connect hook wouldn't let you do this, because of (1). A new callback, or an amendment to on_connect, that lets the user detect this situation and know which messages are potentially forfeit would be ideal.

  • 3) The reason code in MQTT5 must come from PUBREC, not PUBCOMP, even if it waits for PUBCOMP when the reason code is < 0x80. Ideally it should not send PUBREL and wait for PUBCOMP when the PUBREC reason code is >= 0x80. When the reason code is < 0x80, waiting for PUBCOMP would be fine as long as the reported reason code comes from PUBREC (see "No matching subscribers").

  • 4) This is minor, but it would be nice for "session expiry interval" to be bumped up to a first-class argument on connect() alongside clean_start. In reality, "session expiry interval" could be used to infer that the user wants to persist the session, and so to infer "clean session" automatically.

  • 5) This would be a completely new feature, but simple enough to implement: a dump_state() and a load_state() function, both of which would only work for disconnected sessions. They could exchange a simple dict containing:

    • Library version - where load_state() may choose to error on major version mismatches.
    • A list of queued publishes (except QOS 0), including payload and current state of handshake
    • If internally recorded, all subscriptions
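To make (5) concrete, here is a rough sketch of what the pair of functions could look like. Everything in it is an assumption for illustration: `SessionState`, `QueuedPublish`, the handshake state labels and the `LIBRARY_VERSION` value are invented, and the real internal structures in Paho will differ.

```python
import base64
from dataclasses import dataclass, field
from typing import Any, Dict, List

LIBRARY_VERSION = "2.1.0"  # placeholder value for this sketch


@dataclass
class QueuedPublish:
    mid: int
    topic: str
    payload: bytes
    qos: int
    state: str  # handshake position, e.g. "wait_for_pubrec" (hypothetical label)


@dataclass
class SessionState:
    queued: List[QueuedPublish] = field(default_factory=list)
    subscriptions: Dict[str, int] = field(default_factory=dict)  # filter -> QOS


def dump_state(session: SessionState) -> Dict[str, Any]:
    """Return a JSON-friendly dict describing a disconnected session."""
    return {
        "version": LIBRARY_VERSION,
        "queued": [
            {
                "mid": m.mid,
                "topic": m.topic,
                # Payloads are bytes, so encode them for text-based storage.
                "payload": base64.b64encode(m.payload).decode("ascii"),
                "qos": m.qos,
                "state": m.state,
            }
            for m in session.queued
            if m.qos > 0  # QOS 0 publishes are not kept
        ],
        "subscriptions": dict(session.subscriptions),
    }


def load_state(data: Dict[str, Any]) -> SessionState:
    """Rebuild a session, refusing dumps from a different major version."""
    if data["version"].split(".")[0] != LIBRARY_VERSION.split(".")[0]:
        raise ValueError(f"incompatible state version: {data['version']}")
    queued = [
        QueuedPublish(
            mid=m["mid"],
            topic=m["topic"],
            payload=base64.b64decode(m["payload"]),
            qos=m["qos"],
            state=m["state"],
        )
        for m in data["queued"]
    ]
    return SessionState(queued=queued, subscriptions=dict(data["subscriptions"]))
```

Because the dict only contains strings, integers and lists, the caller can store it however they like (JSON file, database row, key-value store) without Paho itself touching the disk.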

Alternatives

One idea I've toyed with is suggesting a single "Reconnection Rules" structure which would encapsulate most of this. The advantage is that a single argument holding a data structure keeps the rest of the options backward compatible when "reconnection rules" is not set.
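As a rough idea of the shape this could take (all names here are hypothetical, nothing like this exists in Paho today), the structure might carry the session expiry and the clean start policy, with `None` meaning "behave exactly as before":

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ReconnectionRules:
    """Hypothetical bundle of reconnection options."""

    session_expiry_interval: int = 0  # seconds; 0 discards the session at disconnect
    auto_clean_start: bool = True     # clean start 1 on first CONNECT, 0 afterwards
    report_pubrec_reason: bool = True  # report the PUBREC reason code for QOS2


def connect_options(
    rules: Optional[ReconnectionRules],
    first_connect: bool,
    legacy_clean_start: bool,
) -> Dict[str, Any]:
    """Work out the CONNECT settings for this attempt."""
    if rules is None:
        # No rules supplied: keep the existing behaviour untouched.
        return {"clean_start": legacy_clean_start, "session_expiry_interval": None}
    clean_start = first_connect if rules.auto_clean_start else legacy_clean_start
    return {
        "clean_start": clean_start,
        "session_expiry_interval": rules.session_expiry_interval,
    }
```

A caller who never passes the structure sees no change, while one who passes `ReconnectionRules(session_expiry_interval=3600)` opts into the new semantics in one place.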
