SX126X: Preamble Length #14449

Open
chrissnow opened this issue Mar 19, 2021 · 33 comments
Comments

@chrissnow
Contributor

chrissnow commented Mar 19, 2021

Description of defect

We are using an STM32L151CC on a custom PCB with an SX1262 radio; it's roughly based on a Nucleo-L152RE board + SX1262MB2xAS design.

Everything works reliably until we enable tickless, at which point join and downlinks become unreliable, probably <50% success rate.
The network is receiving and replying so it must be a timing problem, likely the RX1 and RX2 slots are not timed well enough.

I appreciate it's a bit custom but the underlying fault is within Mbed somewhere, and likely affects other targets.

I will try and reproduce it on a "normal" target.

We're already well behind schedule on the project, so any help would be greatly appreciated!

 Sending confirmed message.
[INFO][LMAC]: RTS = 6 bytes, PEND = 0, Port: 15
[DBG ][LMAC]: Frame prepared to send at port 15
[DBG ][LMAC]: TX: Channel=65, TX DR=4, RX1 DR=13

 6 bytes scheduled for transmission
[DBG ][LSTK]: Transmission completed
[DBG ][LSTK]: Awaiting ACK
[DBG ][LMAC]: RX1 slot open, Freq = 923900000
[DBG ][LMAC]: RX2 slot open, Freq = 923300000
[DBG ][LMAC]: ACK_TIMEOUT Elapses, Retrying ...
[DBG ][LMAC]: Trading datarate for range
[DBG ][LMAC]: TX: Channel=13, TX DR=3, RX1 DR=13
[DBG ][LSTK]: Transmission completed
[DBG ][LMAC]: RX1 slot open, Freq = 926300000
[DBG ][LMAC]: RX2 slot open, Freq = 923300000
[DBG ][LMAC]: ACK_TIMEOUT Elapses, Retrying ...
[DBG ][LMAC]: TX: Channel=11, TX DR=3, RX1 DR=13
[DBG ][LSTK]: Transmission completed
[DBG ][LMAC]: RX1 slot open, Freq = 925100000
[DBG ][LMAC]: RX2 slot open, Freq = 923300000
[DBG ][LMAC]: ACK_TIMEOUT Elapses, Retrying ...
[DBG ][LMAC]: Trading datarate for range
[DBG ][LMAC]: TX: Channel=8, TX DR=2, RX1 DR=12
[DBG ][LSTK]: Transmission completed
[DBG ][LMAC]: RX1 slot open, Freq = 923300000
[DBG ][LMAC]: RX2 slot open, Freq = 923300000
[ERR ][LSTK]: Retries exhausted for Class A device
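
For reference, the RX1 frequencies and data rates in the trace above are what the US915 regional parameters would predict, which suggests the channel plan is fine and the fault is elsewhere. A minimal sketch of that mapping, assuming US915 with RX1DROffset = 0 in the trace; the helper names are hypothetical and are not Mbed APIs:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// US915 downlink channels start at 923.3 MHz and are spaced 600 kHz apart.
constexpr uint32_t kUs915FirstDownlinkHz = 923300000;
constexpr uint32_t kUs915DownlinkStepHz = 600000;

// RX1 frequency for an uplink channel index (0..71): downlink channel is
// the uplink channel modulo 8. Hypothetical helper, not an Mbed function.
uint32_t us915_rx1_frequency_hz(uint8_t uplink_channel)
{
    assert(uplink_channel < 72);
    return kUs915FirstDownlinkHz + (uplink_channel % 8) * kUs915DownlinkStepHz;
}

// RX1 data rate for uplink DR0..DR4 and RX1DROffset 0..3:
// 10 + uplink DR - offset, limited to the downlink range DR8..DR13.
uint8_t us915_rx1_datarate(uint8_t uplink_dr, uint8_t rx1_dr_offset)
{
    assert(uplink_dr <= 4 && rx1_dr_offset <= 3);
    const int dr = 10 + uplink_dr - rx1_dr_offset;
    return static_cast<uint8_t>(std::min(13, std::max(8, dr)));
}
```

With these helpers, uplink channel 65 maps to 923.9 MHz and channel 13 to 926.3 MHz, and TX DR=2 maps to RX1 DR=12, matching the log lines.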

Target(s) affected by this defect ?

STM32L151CC

Toolchain(s) (name and version) displaying this defect ?

ARMC6

What version of Mbed-os are you using (tag or sha) ?

mbed-os-6.9.0
Though we had the same issue with 6.5.0 too

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

mbed-cli 1.10.5

How is this defect reproduced ?

We are working out the easiest way for someone else to reproduce it, probably a Nucleo-L152RE board + SX1262MB2xAS shield.

An xDot might have the same problem but has a different radio.

We have this in our mbed_app

            "target.macros_add": ["MBED_TICKLESS=1"],
            "events.use-lowpower-timer-ticker": true,
@chrissnow
Contributor Author

We have confirmed that a Nucleo-L152RE board + SX1262MB2xAS shield has the same problem, using mbed-os-example-lorawan with:

"target.macros_add": ["MBED_TICKLESS=1"],
"events.use-lowpower-timer-ticker": true,

@jeromecoutant
Collaborator

We agree that enabling TICKLESS with STM32L1 is not recommended.

@chrissnow
Contributor Author

> We agree that enabling TICKLESS with STM32L1 is not recommended.

That's rather bad news for us. Is there any particular reason, and is there anything we can do to make it work better? It seems to nearly work.

@chrissnow
Contributor Author

@jeromecoutant I've done a bit more testing with a WL55JC1, and enabling tickless on that also breaks things: joins and downlinks become very unreliable.

I'd hope that tickless should work on a WL55? The fault seems common across multiple families.

@jeromecoutant
Collaborator

@chrissnow
Contributor Author

Interesting, perhaps it's
"events.use-lowpower-timer-ticker": true
causing the trouble then.

I will try without it.

@ciarmcom
Member

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-3696

@chrissnow
Contributor Author

Something very odd going on here..
I thought I had the WL55 working well, but I'm not convinced anymore.

At the moment, tickless or not, I can't get the WL55 reliable. My only change to the LoRaWAN example is to add some keys and make each downlink confirmed (every 10 seconds).

Our custom target seems more reliable without tickless, but I'm not certain of it.

Still testing and will report back what I can. It may not be STM related in the end, but these are the only targets I can easily run LoRaWAN on.

@chrissnow
Contributor Author

chrissnow commented Mar 21, 2021

It seems that #11502 is the true cause: tickless or not, the WL55 is unusable without
"lora.max-sys-rx-error": 200
which is really rather wasteful energy-wise. I can't believe that we need to give 200 ms either side of the expected RX window to be able to reliably get a downlink.
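
To put that 200 ms in perspective, it helps to express the margin in LoRa symbols. The symbol duration is 2^SF / BW, so a fixed margin in milliseconds covers many more symbols at low spreading factors. A rough sketch, assuming the margin is applied symmetrically as ±rx_error around the nominal window; the function names are made up for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// LoRa symbol duration in milliseconds: T_sym = 2^SF / BW.
double lora_symbol_time_ms(uint8_t spreading_factor, uint32_t bandwidth_hz)
{
    assert(spreading_factor >= 5 && spreading_factor <= 12);
    assert(bandwidth_hz > 0);
    return static_cast<double>(1u << spreading_factor) * 1000.0 / bandwidth_hz;
}

// Whole symbols needed to cover +/- rx_error_ms around the nominal start.
// Simplified model (assumption): the extra listening time is 2 * rx_error.
uint32_t symbols_for_rx_error(double rx_error_ms, double symbol_time_ms)
{
    assert(rx_error_ms >= 0.0 && symbol_time_ms > 0.0);
    return static_cast<uint32_t>(std::ceil(2.0 * rx_error_ms / symbol_time_ms));
}
```

At SF7 with 125 kHz bandwidth the symbol time is 1.024 ms, so ±200 ms corresponds to about 391 symbols of extra listening, against about 20 symbols for ±10 ms.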

@jeromecoutant @0xc0170 I can spend some time on this early next week but I'm not really sure how best to debug it, or exactly how it's meant to work in the first place...

I'm not sure if this is STM32 specific at the moment.

Without fixing this, LoRaWAN support in Mbed is pretty much unusable. The workaround isn't really suitable for production.

@jeromecoutant
Collaborator

the WL55 is not unusable without
"lora.max-sys-rx-error": 200

@ludoch-stm
Maybe you could have some idea ?
Thx

@chrissnow
Contributor Author

Having thought about this a bit more: if #11502 is correct about it being SF dependent, then perhaps the timing of the RX window opening isn't the problem, given that it's the same for all SFs. What does differ is how long the radio is left in RX (or how long we wait for it to complete). Could the stack be giving up midway through successfully receiving the data?

I will try and get some timing data off a logic analyser.

@chrissnow
Contributor Author

@ludoch-stm Apologies for chasing. Any help would be greatly appreciated: 2 days of debugging and I've not made any progress :-(

@chrissnow
Contributor Author

I have made some progress in debugging the problem.

Build configurations, all with tickless + use-lowpower-timer-ticker, though leaving them out makes no difference.

xDot_L151CC, internal SX1272, works perfectly.
NUCLEO_L152RE + SX1272MB2xAS, works perfectly.
NUCLEO_F446RE + SX1272MB2xAS, works perfectly.

NUCLEO_L152RE + SX126xMB2xAS, SF7 rarely works, SF8 & SF9 work sometimes, SF10 is reliable, increasing max-sys-rx-error makes things worse.
NUCLEO_WL55JC1, similar to L152, but max-sys-rx-error 200 makes it reliable.
NUCLEO_F446RE + SX126xMB2xAS, works perfectly.

Based on this I think there are multiple issues:
Something is different between how the two radios work.
Something is wrong with the WL55.

@chrissnow
Contributor Author

mbed app attached for how I tested it; you will need to add keys. The only change to the example is to send confirmed.

    retcode = lorawan.send(MBED_CONF_LORA_APP_PORT, tx_buffer, packet_len,
                           MSG_CONFIRMED_FLAG);

mbed_app.txt

@chrissnow
Contributor Author

chrissnow commented Mar 24, 2021

@0xc0170 are you able to get any support from whoever is responsible for the LoRa drivers?

@chrissnow changed the title from "STM32L1: Tickless LoRaWAN" to "SX126X: LoRaWAN multiple problems." on Mar 24, 2021
@ludoch-stm
Contributor

Hi Chris,

This rxerror parameter is a very sensitive parameter which depends on the radio shield and affects when the RX window opens, as you said previously.
To understand its effect, see the attached drawing of the Window Timeout and Window Offset definitions.
If it's configured to too high a value, the RX window could overlap the TX and/or RX2 windows, leading to unexpected behavior.
I see you are using an STM32WL with US regional parameters and SF7 to SF10 configs.
Could you describe your setup in case I have missed some other configs?
On Mbed OS, STM32WL has been validated with max-sys-rx-error = 5, and in the STM32CubeWL package it is validated with a value of 10 in the LoRa stack. So setting this value to 200 seems really high.

The issue could come from several causes:

  • another task could be stalling TX in Mbed OS, which shifts its timing
  • the opening window of your gateway isn't synchronized with the device

Did you have the chance to check the TX and RX window openings on a logic analyzer? Can you share them?
SystemRxErrorParam.pptx
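
The Window Timeout and Window Offset relationship described above can be sketched as follows. This is modelled on the formula used in Semtech's LoRaMac-node reference stack as far as I recall it; it is an assumption that Mbed's computation follows the same shape, and the struct and function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct RxWindow {
    uint32_t timeout_symbols; // symbols the radio waits for a preamble
    int32_t offset_ms;        // correction added to the nominal RX delay
};

// symbol_time_ms: LoRa symbol duration (2^SF / BW).
// min_rx_symbols: symbols needed to lock onto the preamble.
// rx_error_ms:    assumed worst-case timing error, applied as +/- rx_error.
// wakeup_time_ms: time the radio needs to get into RX (board dependent).
RxWindow compute_rx_window(double symbol_time_ms, uint32_t min_rx_symbols,
                           double rx_error_ms, double wakeup_time_ms)
{
    assert(symbol_time_ms > 0.0);
    const double min_symbols = static_cast<double>(min_rx_symbols);

    // Widen the window by 2 * rx_error, expressed in whole symbols.
    double symbols = std::ceil(((2.0 * min_symbols - 8.0) * symbol_time_ms
                                + 2.0 * rx_error_ms) / symbol_time_ms);
    symbols = std::max(symbols, min_symbols);

    RxWindow window;
    window.timeout_symbols = static_cast<uint32_t>(symbols);
    // Centre the window on the expected preamble, then open early enough
    // to absorb the radio wake-up time. Negative means "open earlier".
    window.offset_ms = static_cast<int32_t>(std::ceil(
        4.0 * symbol_time_ms - symbols * symbol_time_ms / 2.0 - wakeup_time_ms));
    return window;
}
```

Under this model, with a 1.024 ms symbol (SF7 at 125 kHz), 5 minimum symbols and an arbitrary 1 ms wake-up time, rx_error = 10 gives a 22-symbol timeout opened 8 ms early, while rx_error = 200 gives 393 symbols opened 198 ms early. That illustrates how a large value eats into the gap between TX, RX1 and RX2.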

@chrissnow
Contributor Author

Hi,

My early findings might be a bit confusing, so I will try to clear a few things up, as we have been testing multiple regions, targets and radios.

Let's simplify it a bit!

WL55JC
mbed-os-example-lorawan
EU868 TTN as network

If I build and change the messages to always be confirmed, then once the SF lowers the downlinks are no longer received.
However, max-sys-rx-error = 10 is enough to make the WL55 reliable.

So that is an easy fix for the WL55.

STM32L1

We have a custom board that is in production but waiting on a firmware release (the WL55JC didn't exist at the time; we will move to it later in the year).
I will try finer increments of max-sys-rx-error and see if I can get it to work.
But the odd thing is the SX1272 works perfectly, which really only leaves the SX126X driver, since the timing is done outside that.

We see a timeout IRQ even when we have a large max-sys-rx-error, which I think is because, despite the timeout in the RX command being set to forever, it will still time out after a number of symbol times?

I will get some logic traces and report back.

Thanks for that doc, it explains things much better than the other docs I have.

@chrissnow
Contributor Author

@ludoch-stm I have now narrowed the problem down further.
max-sys-rx-error = 10 is enough to make the WL55 reliable on EU868.
However, it is not reliable on US915. I'm just building it at 20 to see if it helps.

Have you validated US915 or just EU868?

@chrissnow
Contributor Author

More progress...
Seems to be related to
MBED_CONF_LORA_DOWNLINK_PREAMBLE_LENGTH
which defaults to 5. However,
"lora.downlink-preamble-length": 9
fixes US915.
8 also seems fine. Not entirely sure on the correct number though..

Things I have found suspicious

"downlink-preamble-length": {
    "help": "Number of whole preamble symbols needed to have a firm lock on the signal.",
    "value": 5
}

* @param preamble_len Sets the Preamble length ( LoRa only )
* FSK : N/A ( set to 0 )
* LoRa: Length in symbols ( the hardware adds 4 more symbols )

I wonder if that comment is true for the SX127X but not the SX126X? I haven't seen a reference to this in the datasheet.
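
One place where extra symbols do show up is Semtech's time-on-air formula, which counts the preamble as the programmed length plus 4.25 symbols (sync word and start-of-frame delimiter) for SF7 to SF12. Whether that is what the driver comment refers to is an assumption on my part. The sketch below, with a hypothetical helper name, shows how short the preamble becomes at the 500 kHz bandwidth used for US915 downlinks:

```cpp
#include <cassert>
#include <cstdint>

// On-air preamble duration in milliseconds for SF7..SF12:
// T_preamble = (n_programmed + 4.25) * T_sym, with T_sym = 2^SF / BW.
double lora_preamble_time_ms(uint16_t programmed_symbols,
                             uint8_t spreading_factor,
                             uint32_t bandwidth_hz)
{
    assert(spreading_factor >= 7 && spreading_factor <= 12);
    assert(bandwidth_hz > 0);
    const double symbol_ms =
        static_cast<double>(1u << spreading_factor) * 1000.0 / bandwidth_hz;
    return (programmed_symbols + 4.25) * symbol_ms;
}
```

For an 8-symbol preamble at SF7 this gives about 12.5 ms at 125 kHz but only about 3.1 ms at 500 kHz, so the same absolute timing error costs four times as many symbols on a US915 downlink. This may be part of why US915 is more sensitive to the setting than EU868, though the thread does not confirm it.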

This change hasn't broken EU868 either.

@ludoch-stm
Contributor

As the issue is present in the US915 band, did you check that your configuration is in hybrid mode?

To do so, you should configure in mbed_config.h:
#define MBED_CONF_LORA_FSB_MASK {0x00FF, 0x0000, 0x0000, 0x0000, 0x0001}

Also, how many channels does your gateway have: 8 or 64?

@chrissnow
Contributor Author

chrissnow commented Mar 25, 2021

We're using FSB2 hybrid, so channels 8-15 + 65, with an 8-channel gateway.
We have an FSB mask to match that, but we use OTAA, so the NS dictates the channels past the join.
The frequencies all look correct during operation with tracing enabled.
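
For anyone checking their own mask, here is a small sketch of how a five-word FSB mask relates to channel numbers. It assumes bit 0 of word 0 is channel 0 (which is consistent with the sub-band 1 mask quoted earlier in the thread) and uses hypothetical helper names rather than anything from Mbed:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using ChannelMask = std::array<uint16_t, 5>;

// Mask for US915 sub-band 1..8: eight 125 kHz channels plus the
// matching 500 kHz channel (64..71).
ChannelMask us915_fsb_mask(uint8_t sub_band)
{
    assert(sub_band >= 1 && sub_band <= 8);
    ChannelMask mask{};
    const uint8_t first = static_cast<uint8_t>((sub_band - 1) * 8);
    for (uint8_t ch = first; ch < first + 8; ++ch) {
        mask[ch / 16] = static_cast<uint16_t>(mask[ch / 16] | (1u << (ch % 16)));
    }
    const uint8_t wide = static_cast<uint8_t>(64 + (sub_band - 1));
    mask[wide / 16] = static_cast<uint16_t>(mask[wide / 16] | (1u << (wide % 16)));
    return mask;
}

// List the channel numbers enabled in a mask, lowest first.
std::vector<uint8_t> enabled_channels(const ChannelMask &mask)
{
    std::vector<uint8_t> channels;
    for (uint8_t ch = 0; ch < 72; ++ch) {
        if (mask[ch / 16] & (1u << (ch % 16))) {
            channels.push_back(ch);
        }
    }
    return channels;
}
```

Under that bit-order assumption, sub-band 1 yields {0x00FF, 0x0000, 0x0000, 0x0000, 0x0001} and sub-band 2 yields {0xFF00, 0x0000, 0x0000, 0x0000, 0x0002}, i.e. channels 8-15 plus 65.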

This seems to work well for us on the WL55 or NUCLEO_L152RE + SX126xMB2xAS:

"lora.max-sys-rx-error": 10,
"events.use-lowpower-timer-ticker": true,
"target.macros_add": ["MBED_TICKLESS=1"],
"lora.downlink-preamble-length": 9

@ludoch-stm
Contributor

OK, good news if it works now in your environment!
Perhaps the topic of the conversation can be changed then :-)

@chrissnow changed the title from "SX126X: LoRaWAN multiple problems." to "SX126X: Preamble Length" on Mar 26, 2021
@chrissnow
Contributor Author

Done,

Are you going to handle the WL55 max-sys-rx-error needing to be 10?
Any thoughts on the correct preamble length?

@adbridge
Contributor

@chrissnow @jeromecoutant can this be closed after the merging of 14481 ?

@jeromecoutant
Collaborator

I would say yes...

@chrissnow
Contributor Author

The WL55 behaves with #14481, but other targets do not. I'm pretty sure the default preamble is wrong for all SX126X targets in US915; it really needs some confirmation from someone who knows more about LoRa than me...

@hallard
Contributor

hallard commented Oct 23, 2023

@chrissnow Thanks, so I'm not alone with this issue. Can you confirm that on STM32WL only
"lora.downlink-preamble-length": 9 is needed, since #14481 fixes rx-error?

@chrissnow
Contributor Author

@hallard It's been a few years, but yes, I think so.

@hallard
Contributor

hallard commented Oct 23, 2023

Thanks, will report back. We have deployed a lot in the EU, but it's our first try in the US, and these downlink issues on STM32WL drove us mad.

@hallard
Contributor

hallard commented Oct 23, 2023

Just done some tests and looked into the code: there are 2 preamble lengths in Mbed, one for uplink and the other for downlink.

Looking at CMakeLists.txt, the values for all frequencies are as follows:

MBED_CONF_LORA_DOWNLINK_PREAMBLE_LENGTH=5
MBED_CONF_LORA_UPLINK_PREAMBLE_LENGTH=8

So uplink follows the specification with 8, but downlink, with 5, does not.

Default compiled program for STM32WL state

So my guess is that, as stated by @chrissnow, lora.downlink-preamble-length should be aligned with uplink and set to 8.

We flashed 5 devices in the US with this new setting, and they got their downlink first time like a charm, whereas before some never got it.

I can do a PR. @jeromecoutant, let me know whether I should do it for all devices or just for STM32WL in /connectivity/lorawan/mbed_lib.json

@jeromecoutant
Collaborator

> I can do a PR, @jeromecoutant let me know if I'm doing it on all devices or just on STM32WL in /connectivity/lorawan/mbed_lib.json

If you have verified the new DL value only with STM32WL, maybe it is safer to update only STM32WL?

@hallard
Contributor

hallard commented Oct 24, 2023

Agreed, that's safer, even if I'm pretty sure it will improve downlink for others. I would really like to understand why this value was set to 5 instead of 8 as in the specification; there is surely a reason that we are not aware of.

@hallard
Contributor

hallard commented Oct 24, 2023

and here we go #15459
