QoS of non-retained messages #247

Open
Tieske opened this issue Jan 18, 2023 · 3 comments

Comments

@Tieske
Contributor

Tieske commented Jan 18, 2023

See discussion here: 4a6ea00 (#200) and documentation here.
The argument for not using QoS 2 (Exactly once) is that not all devices have persistent storage available.

But imo that argument is moot, because QoS 1 (minimum once) delivery also requires persistent storage. The paragraph on retry (4.4) specifically mentions resending, and hence storage requirements:

When a Client reconnects with CleanSession set to 0, both the Client and Server MUST re-send any unacknowledged PUBLISH Packets (where QoS > 0) and PUBREL Packets using their original Packet Identifiers
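To make the storage requirement concrete, here is a minimal sketch of the sender-side state that the quoted retry rule implies at QoS 1: every PUBLISH is kept until its PUBACK arrives, and whatever is still unacknowledged is re-sent with its original Packet Identifier (DUP flag set) when the session resumes. The class and method names are hypothetical, and the store lives in memory here; whether it must survive a power cycle is exactly the open question.

```python
from dataclasses import dataclass, field


@dataclass
class PendingPublish:
    packet_id: int
    topic: str
    payload: bytes


@dataclass
class QoS1Sender:
    """Hypothetical, simplified QoS 1 sender; not a real MQTT client API."""
    next_id: int = 1
    # Unacknowledged PUBLISH packets, keyed by Packet Identifier.
    inflight: dict = field(default_factory=dict)

    def publish(self, topic: str, payload: bytes) -> tuple:
        """Store the message first, then return the PUBLISH to put on the wire."""
        pid = self.next_id
        self.next_id = self.next_id % 65535 + 1  # Packet Identifiers are 1..65535
        self.inflight[pid] = PendingPublish(pid, topic, payload)
        return ("PUBLISH", pid, topic, payload, False)  # last item: DUP flag

    def on_puback(self, packet_id: int) -> None:
        """PUBACK received: the stored copy can be discarded."""
        self.inflight.pop(packet_id, None)

    def on_reconnect(self) -> list:
        """Re-send everything still unacknowledged, original ids, DUP = 1."""
        return [("PUBLISH", m.packet_id, m.topic, m.payload, True)
                for m in self.inflight.values()]


sender = QoS1Sender()
sender.publish("homie/dev/node/prop", b"on")   # hypothetical topic
sender.publish("homie/dev/node/prop", b"off")
sender.on_puback(1)                             # only the first one was acked
print(sender.on_reconnect())  # [('PUBLISH', 2, 'homie/dev/node/prop', b'off', True)]
```

Without the `inflight` store the sender has nothing to re-send, which is why QoS 1 needs storage just as QoS 2 does.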

The second argument (4a6ea00, #200) is that the $state message might arrive before the value of a property.

If the property is retained, then we do not care, because all is QoS 1 and order is preserved.

If the property is non-retained, then it's an event, in which case the property doesn't have a "value" or a "state"; it is merely a notification. So the controller not having received the value before the device reaches the ready state is a normal operating condition.

This is only a race condition if an actual event happens at the device end at the very moment of switching to ready.

On the controller side, if:

  • $state first, and value right after: to the controller the event never happened, because the device wasn't online. It would have been the same if the device had powered on milliseconds later, after the event happened in the first place.
  • value first, $state right after: to the controller this is a normal event it can handle.

So, combining those, my impression is that QoS 2 (exactly once) is just fine for non-retained properties.

The above is based on my knowledge so far and reading up on the actual QoS flows.

Or did I miss something? Can you verify @Thalhammer ?

@schaze
Contributor

schaze commented Jan 18, 2023

For the first argument:

But imo that argument is moot because QoS 1 (minimum once) delivery also requires persistent storage.

Not sure it needs to be persistent storage, but it must at least be kept in memory for the duration of the network connection (see chapter 4.1 Storing state - http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc398718105).

I think QoS 2 should be used everywhere; otherwise there could be scenarios where you get multiple partial repetitions of messages:

See chapter 4.6 (http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc398718105):

"The rules listed above ensure that when a stream of messages is published and subscribed to with QoS 1, the final copy of each message received by the subscribers will be in the order that they were originally published in, but the possibility of message duplication could result in a re-send of an earlier message being received after one of its successor messages. For example a publisher might send messages in the order 1,2,3,4 and the subscriber might receive them in the order 1,2,3,2,3,4."

This would be a nightmare to handle, e.g. lights on, off, on, off, on.
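A small simulation of the 1,2,3,2,3,4 example shows the effect. The light commands mapped to each message and the sequence-number filter are hypothetical illustrations: Homie payloads carry no sequence number, so a receiver has no way to apply such a filter today.

```python
# Message n as published by the device; the mapping to light commands is hypothetical.
COMMANDS = {1: "on", 2: "off", 3: "on", 4: "off"}


def apply_naively(received):
    """Act on every delivery, duplicates included."""
    return [COMMANDS[n] for n in received]


def apply_with_dedup(received):
    """Skip any message whose (hypothetical) sequence number is not newer than the last seen."""
    actions, last_seen = [], 0
    for n in received:
        if n <= last_seen:
            continue  # a late re-send of an earlier message
        last_seen = n
        actions.append(COMMANDS[n])
    return actions


received = [1, 2, 3, 2, 3, 4]      # QoS 1 redelivery order from section 4.6
print(apply_naively(received))     # ['on', 'off', 'on', 'off', 'on', 'off']
print(apply_with_dedup(received))  # ['on', 'off', 'on', 'off']
```

The naive receiver toggles the light twice more than the publisher intended. Exactly-once delivery (QoS 2) avoids the duplicates without any payload changes.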

2nd argument:

If the property is retained, then we do not care, because all is QoS 1 and order is preserved.

The order is only guaranteed/required for messages on a single topic, not for all messages overall (see chapter 4.6 Message ordering on http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc398718105). This means there might be broker implementations that do not deliver messages in the order they were sent when they are on different topics.
In such cases a controller might miss the first emissions after the state switches to ready, if they arrive before the $state message. I also believe this is much more relevant for structural changes in devices (e.g. adding a node dynamically after a config change in the device, which requires a "rediscovery").

@Tieske
Contributor Author

Tieske commented Jan 19, 2023

Device: non-retained property values MUST be published with QoS=0, since they are events and therefore time-bound. With QoS=0 the broker will NOT queue messages for disconnected clients. Using QoS=1 or QoS=2 for them would cause the broker to queue the messages and deliver them at a later point in time (to clients that were disconnected), which is not the right thing to do. This explains this sentence in the spec:

to ensure that events don't arrive late or multiple times.
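The queueing argument can be sketched as a simplified model of a broker holding messages for a persistent (CleanSession = 0) subscriber that is currently offline. Assumptions: the subscription was made with QoS 1 or higher, and the broker does not use the optional QoS 0 storage that MQTT 3.1.1 permits. `qos_for_property` is a hypothetical helper encoding the rule proposed here, not something from the Homie spec.

```python
from dataclasses import dataclass, field


def qos_for_property(retained: bool) -> int:
    """Proposed rule (hypothetical helper): state stays QoS 1, events go out as QoS 0."""
    return 1 if retained else 0


@dataclass
class OfflineSession:
    """Simplified model of a disconnected subscriber with a persistent session."""
    queued: list = field(default_factory=list)

    def deliver(self, topic: str, payload: str, qos: int) -> None:
        if qos == 0:
            return  # dropped: an event nobody was online to see
        # QoS 1/2: kept and replayed when the client reconnects, however late that is.
        self.queued.append((topic, payload))


session = OfflineSession()
# Topic names are hypothetical.
session.deliver("homie/dev/node/button", "pressed", qos_for_property(retained=False))
session.deliver("homie/dev/node/temperature", "21.5", qos_for_property(retained=True))
print(session.queued)  # [('homie/dev/node/temperature', '21.5')]
```

The button press never reaches the late subscriber, which is the intended behaviour for a time-bound event; the state update is still delivered on reconnect.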

Controller: non-retained commands should also not be queued. If the command is "brew-coffee", you need it now. You do not want the machine to restore its connection 45 mins from now, and then suddenly start acting on an old command.

In both cases delivery of the event/command is less reliable, but only when a transmission gets interrupted, which is unlikely. The only way around this I can see is to change the message format into a complex structure where a "timeout" or "validity period" can be added alongside the actual payload, so that the receiver can judge whether the message is still valid.
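One possible shape for such a payload, as a sketch only: the JSON structure and the `valid_until` field are hypothetical (Homie defines nothing like this), and using an absolute timestamp assumes sender and receiver clocks are roughly in sync.

```python
import json
import time
from typing import Optional


def make_command(value: str, validity_s: float, now: Optional[float] = None) -> str:
    """Wrap a command with an absolute expiry time (seconds since the epoch)."""
    now = time.time() if now is None else now
    return json.dumps({"value": value, "valid_until": now + validity_s})


def accept_command(payload: str, now: Optional[float] = None) -> Optional[str]:
    """Return the command if it is still valid, or None if it arrived too late."""
    now = time.time() if now is None else now
    msg = json.loads(payload)
    return msg["value"] if now <= msg["valid_until"] else None


cmd = make_command("brew-coffee", validity_s=30, now=1000.0)
print(accept_command(cmd, now=1010.0))            # brew-coffee
print(accept_command(cmd, now=1000.0 + 45 * 60))  # None (45 minutes late)
```

A relative "time to live" would avoid the clock-sync dependency, but then the receiver cannot tell how long the message sat in a queue, so the absolute timestamp is the simpler choice for this illustration.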

As for the order of receiving: since $state and the property are different topics, order cannot be guaranteed anyway, regardless of whether their QoS is the same. So I think:

  • A device, when going online, should wait X milliseconds after publishing ready before sending events (setting values on non-retained properties).
  • A controller should hold off publishing to the set topics of non-retained properties until the device has been ready for at least X milliseconds.
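Both rules amount to the same gate: publish only once the device has been in the ready state for at least X ms. A sketch, with `GRACE_MS` standing in for the unspecified X (500 ms is an arbitrary, hypothetical value). A device would feed it its own state transitions; a controller would feed it the $state values it observes.

```python
from typing import Optional

GRACE_MS = 500  # hypothetical value for "X"; the discussion does not fix one


class ReadyGate:
    """Tracks how long a device has been ready; all times are in milliseconds."""

    def __init__(self, grace_ms: int = GRACE_MS):
        self.grace_ms = grace_ms
        self.ready_since: Optional[int] = None

    def on_state(self, state: str, now_ms: int) -> None:
        """Record when the device entered 'ready'; any other state resets the timer."""
        if state == "ready":
            if self.ready_since is None:  # a repeated 'ready' keeps the original time
                self.ready_since = now_ms
        else:
            self.ready_since = None

    def may_publish(self, now_ms: int) -> bool:
        """True once the device has been ready for at least grace_ms."""
        return self.ready_since is not None and now_ms - self.ready_since >= self.grace_ms


gate = ReadyGate()
gate.on_state("init", now_ms=0)
gate.on_state("ready", now_ms=200)
print(gate.may_publish(now_ms=600))  # False: ready for only 400 ms
print(gate.may_publish(now_ms=700))  # True
```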

@Tieske
Contributor Author

Tieske commented Jan 21, 2023

I've given this some more thought. My conclusion;

  • the default being QoS 1 or QoS 2 should both be fine (but once one is chosen, the entire homie network should use the same setting; alternatively it could be made configurable through a user setting on a homie topic for all to discover, but that would overcomplicate things imo).
  • non-retained properties are limited to QoS 1, by nature of their event-based design.

However, I have also come to realize that the non-retained topics are weird in general.

  • homie is essentially state based, but the non-retained properties introduce events, as well as "command" structures.
  • non-retained properties have no way of making their published events discoverable (could be alleviated with profiles, but ideally the device carries its own description)
  • non-retained properties have no way of making their accepted commands (and possible parameters to those commands) discoverable (could be alleviated with profiles, but ideally the device carries its own description)
  • why are events and commands tied to a "property"? There is no logic to that imo.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants