Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLE connectivity reliability analysis #207

Closed
vbelloir opened this issue Feb 12, 2021 · 29 comments
Closed

BLE connectivity reliability analysis #207

vbelloir opened this issue Feb 12, 2021 · 29 comments

Comments

@vbelloir
Copy link

vbelloir commented Feb 12, 2021

I create this issue to sumarize my work concerning the analysis of PineTime BLE connectivity.

DFU over BLE

Nordic Semiconductor provides a desktop tool, called nrfutil, that allows to generate dfu package and to perform DFU over several links (BLE, UART, USB...).
That's pretty cool, but it only works with modem DFU protocol, based on Google Protobuf. It doesn't work with Legacy DFU, which is used by PineTime.
Old nrfutil version ( under 0.5.2) can generate Legacy DFU package, but cannot perform DFU over BLE.
Adafruit nrfutil version used in PineTime mcuboot bootloader repo is based on that version.
So, I used dfu.py python script.

I create a small bash script that 10 DFU OTA, and create a log file with the following format:
try 8 result 100.0% in 6 minutes and 36 seconds

File should rename in dfu-test.sh, but github doesnt allow this extension.
dfu-test.txt

I used this script several times on my PimeTime sealed watch, to update it from 0.12.1 to 0.12.1.

Here is the lastest output file :

try 1 result 55.7% in 0
try 2 result 100.0% in 6 minutes and 38 seconds
try 3 result 100.0% in 6 minutes and 34 seconds
try 4 result 100.0% in 8 minutes and 34 seconds
try 5 result  in 0
try 6 result 100.0% in 7 minutes and 4 seconds
try 7 result 31.2% in 0
try 8 result 100.0% in 6 minutes and 36 seconds
try 9 result  in 0
try 10 result 19.8% in 0

As we can see:

  • 5 OTA were a success,
  • 2 OTA didn't even start,
  • 3 OTA failed between 19.8% to 55.7%

When OTA succeed, I started another one pressing a key. When it failed, I was forced to reset the watch before pressing key.

Script should propably be improved!

Crash with connexion/disconnexion

Purpose of this test is to evaluate the ability of Pinetime (on Infinitime) to perform BLE connexion/deconnexion.

This test is realize with the following bash script (should be rename gatt-test.sh)
gatt-test.txt

It is based on gattool, provided by bluez, and allow to:

  • connect to the device,
  • read battery level characteristic,
  • disconnect from the device.

It can be called like this:
./gatt-test.sh -m D9:DD:1F:7E:64:12 -t 50 -i 120
-m: define device mac address
-t: define number of loop iteration
-i: define interval in second between iteration

It generates a report file called gatt_YYYMMDD_HHMM.log, containing:

Infinitime fw verson:
0.12.42
mac address: FF:AE:EC:DE:38:D3
tries: 50
interval: 120 s
attempt 1 batterry level 46
attempt 2 batterry level 46
attempt 3 batterry level 46
....

if an error occurs, it appears in log file:

attempt 3 batterry level 0 error 16
attempt 4 batterry level 0 error 111

where 16 and 111 are error codes.
During my tests, I saw those errosrs:

  • 38: function not implement
  • 111: Connection refused
  • 16: Device or resource busy

I used the test on a PR release 0.12.42 and get this log result: 
[gatt_20210213_1840.log](https://github.com/JF002/Pinetime/files/5976517/gatt_20210213_1840.log)

No error in 50 connexions/deconnexions.




@PetersOtto
Copy link

As I got my sealed PineTime, I updated it from 0.7 to 0.9 OTA with gadgetbridge. I got the an error at updating. At the next try I touched / scroll on screen every 15% update progress. Since I do so, I never had an error at updating.

Perhaps just luck :)

@vbelloir
Copy link
Author

vbelloir commented Feb 13, 2021

Tonight, I updated first post with a procedure to test connexion/disconnexion.

@JF002 I also performed a test on DFU on your release 0.12.1-testble on my devkit.
To be sure that my test is representative, i performed 10 DFU update from 0.12.1 to 0.12.1, and then, made de same with 0.12.1-testble to 0.12.1-testble.

Result for 0.12.1 is there
test_20210213_2102.log
6/10

Result for 0.12.1-testble is there
test_20210213_2219.log
9/10
The failed attempt may be an error from myself, I'm not sure.

@JF002
Copy link
Collaborator

JF002 commented Feb 14, 2021

Thanks @vbelloir for opening this issue, we'll try to gather all the info about these ble issues here 👍
And also thanks for this test script! It's probably faster than manually connecting and disconnecting from the watch using a smartphone!

These last few days, I did many tests :

  • Quickly connect/disconnect my phone to the watch
  • Bomb the watch with notification
  • Walk around the house to go out of range, then come back near the watch, reconnect, and so on
  • Add many logs in the code, spy the ble communiations with a sniffer, enable the 'monitor' mode of nimble to get HCI messages over RTT.

Aaaand... nothing. I couldn't reproduce the issue, saw nothing wrong in the logs,...

Then, the other day, I tried something else : connect the phone to the watch, disconnect the phone from wifi/data so that it does not receive any notification and go to sleep. In the morning : the watch was disconnected and couldn't reconnect (the ble logo was still displayed even if it was not connected).

I could reproduce that on my devboard connected to a logic analyzer and noticed that the low-level ble task (ble_ll) just stopped running. With further analysis, I discovered that the RADIO IRQ was not triggered at all!
That reminded me of an discussion on the developer mailing list of apache mynewt/nimble I had a few month ago (http://mail-archives.apache.org/mod_mbox/mynewt-dev/202006.mbox/%3cCADAkoF5Os4x9A=q7UevuZ-vXJty365wa=oezRqPGvwGfD=t67g@mail.gmail.com%3e) where Andrzej told me that the way the interrupts were handled in InfiniTime might not be good (PRIMASK vs BASEPRI). At that point, I changed 2 macros to use PRIMASK instead of BASEPRI and it improved the ble connection.

I had a look at the code and found out that 2 other macro were still using primask. I also changed them and re-did my the test : no connection drop over more than 10h!
I also run ~10 OTA without any issue.

I'll continue the testing today and will probably release this fix!

@JF002
Copy link
Collaborator

JF002 commented Feb 14, 2021

InfiniTime 0.13.0 released :)

@Avamander
Copy link
Collaborator

Related issue: #70

@ashkitten
Copy link

ashkitten commented Feb 18, 2021

infinitime 0.13.0 helps a lot with this issue for me. the pinetime now never shows itself as connected when it's not. however, after a while (a day or so) gadgetbridge disconnects and will not reconnect unless i reboot the watch, which requires me to remove the device from gadgetbridce and re-add it if i chose to pair it.

i'm on android 10 (omnirom, moto one hyper) with gadgetbridge 0.53.0 from f-droid.

@petterhs
Copy link
Contributor

I'm having same issues as ashkitten

@JF002
Copy link
Collaborator

JF002 commented Feb 21, 2021

I did the same observations as @ashkitten and @petterhs : BLE connectivity has improved in 0.13, and never shows connected when it's not. But I also have to reset the watch ~ once a day. Most of the time, when I wake up, my pinetime is disconnected from the phone and cannot reconnect. I noticed that it doesn't advertise anymore.

Once it's rebooted, Gadgetbridge reconnects automatically, it doesn't need to remove/pair with the watch again, though.

@stephanlachnit
Copy link

Just for reference, I have the same problem. For me resetting the watch is mot enough, I even need to reboot my phone to connect it again. Also, on the first connect to Gadgetbridge, the time isn't set, I need to reconnect it for that.

@davorvr
Copy link

davorvr commented Apr 15, 2021

I have a similar problem with disconnects on 0.15, but reconnecting in GadgetBridge works fine. As people have previously mentioned, the disconnects don't start to happen until after a day or so has passed with me too. Connection is very reliable within the first 12-24 hours, and then starts to break.

@pfeerick
Copy link
Contributor

pfeerick commented Apr 15, 2021

For completeness, I experience this also, with no advertising or bluetooth connection at around 20h09m or a bit longer with 0.16 dev build. Upon reset (>8 second button long press) it immediately starts advertising, and Gadgetbridge will reconnect automatically if set to do so. As there was some mention on discord it seemed more stable if the pinetime was left in the charge cradle, I've just reset one and will edit this comment in 24-48 hours with the results, but I don't have high hopes.

Edit: I forgot to update this comment when I said I would, but having the watch on charge did not effect this at all. It still dropped off after ~20 hours.

@muppeth
Copy link

muppeth commented Apr 22, 2021

Not going to be very original ,but perhaps adding that when using combination pinephone+phosh(danctnix and amazfish) getting ton of connection issues, where I need to re-pair the watch few times a day. Not sure who's to blame (amazfish or infinitime) yet though, but in case anyone wants me to try things let me know.

@vbelloir
Copy link
Author

With XA2, SFOS, Amazfish, INfinitime, I sometimes need to reboot the wtach to get connexion back.

@Man-with-Arrow
Copy link

Man-with-Arrow commented May 8, 2021

I'm also experiencing this behavior (watch eventually disconnects, requires reboot and re-pair via Gadgetbridge) on 1.0.
In Gadgetbridge settings, enabling "Reconnect automatically" seems to help a bit but does not negate the issue.

@JF002
Copy link
Collaborator

JF002 commented May 9, 2021

Thanks for all these feedbacks 👍 Someone will definitely have to analyze deep into the BLE integration in InfiniTime and NimBLE to find out why the connection is not as stable as it should be...

@blutter
Copy link

blutter commented Jul 19, 2021

Just reading about everyone's experience with rebooting the watch to reconnect to their phones... I have found a way that occasionally works to reconnect my Pinetime/phone without rebooting either.

I have found that I can sometimes reconnect Gadgetbridge by starting the nRFConnect app, turning location off then back on, scanning in nRFConnect, then connecting in nRFConnect. Gadgetbridge will connect once the watch is connected in nRFConnect.

Wondering if this works for anyone else.

I'm using a build based in 1.2.0 (commit 883700f)

@Avamander
Copy link
Collaborator

I've mentioned this in the chat as well, but those instructions sound to me like you've just actually simply restarted your phone's Bluetooth.

Location can also keep Bluetooth silently enabled if you've enabled "Enchanced location" or whatever it was, it uses BLE beacons for positioning. There are some chipsets out there have WiFi and BT tied together, meaning that you'd have to turn both off for it to actually turn off, so when you tried a bunch of stuff that might've done it as well.

@JF002
Copy link
Collaborator

JF002 commented Jul 30, 2021

If someone wants to help analyzing these issues, I wrote an article a while ago that describes my BLE analysis setup (BLE sniffer, HCI logs, logic analyzer,...).

It's not easy to pinpoint exactly what's going on when the connection drops as this issue occurs after 20-24h on my setup.

@ajostergaard
Copy link

Somebody somewhere mentioned that they use https://github.com/atc1441/ATCwatch on their PineTime and Bluetooth is rock solid.

Any value in somebody who understands these things taking a closer look at how they did it?

I'd look myself but I'm a little less hardcore than you hardware rockstars.

@JF002
Copy link
Collaborator

JF002 commented Jul 31, 2021

I think ATCWatch uses the NRF Softdevice as BLE stack. It's the BLE stack developed by NRF (manufacturer of the NRF52) which is only available as a binary blob (closed source).

InfiniTime, on the other hand, is based on NimBLE, which is an open source BLE stack that supports the NRF52 MCU.

So, yes, having a look could have been a good idea, but without access to the source of its BLE stack, I'm not sure we'll be able to find anything relevant...


I let my devkit run for the whole day and night yesterday, with a BLE sniffer running just next to it:

  • BLE sniffer saw nothing : before going to bed, it was detecting all the ADVERTISING messages (during 3 minutes after I push on the button of the watch), and when I woke up, nothing appeared on the sniffer.
  • I connected my debugger to have a look at the BLE tasks (host and LL) : both of them were just waiting for a new event in their event queue. It means that the tasks where not stuck on a deadlock or something similar.

Next step is trying to analyze the RADIO IRQ and see if it stops at some point...

@vbelloir
Copy link
Author

Really difficult to investigate here, because there are lots of configurations and parameter available.
For example, I'm using Infintime 1.3.0, with a Sony XA2 phone, running SFOS 4.1, and amazfish app as companion app.
My phone is in my pocket, connected to the watch, and realy often (every half hour?), watch receives a notification that amazfish is now connected, which could mean that connection was broken?

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

@Avamander
Copy link
Collaborator

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

I'd say yes.

Next step is trying to analyze the RADIO IRQ and see if it stops at some point...

@JF002 Have you considered implementing secure BLE bonding? It's very likely that different rules apply to secure long-term connections and reconnection.

@JF002
Copy link
Collaborator

JF002 commented Jul 31, 2021

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

I'd say yes.

:D
I have no better answer for you, unfortunately... One of these, some, all? I don't know :/

@vbelloir On my Android phone (Huwaei P10, I think), connection is quite stable with Gadgetbridge, my "only" issue is that all BLE connectivity stops after ~24h. Or maybe I also have this issue but don't notice it because gadgetbridge does not send notification on connection...
The issue you're describing is a bit different : it disconnects/reconnects every now and then That's probably another BLE related issue someone will have to investigate.
Maybe try to build InfiniTime with the monitor mode enable (see this article and see what the HCI logs show (you need a JLink debugger for the RTT).

@JF002 Have you considered implementing secure BLE bonding? It's very likely that different rules apply to secure long-term connections and reconnection.

Probably in the future, yes, but I would have to learn how bonding and security work on BLE first ;)
But I thought it would be a better idea to try to fix current issue before adding more complexity. Maybe I'm wrong and these features could improve the reliability? I don't know :/


For science, I've just integrated nimble 1.4 master branch into InfiniTime, I'll run it until tomorrow to see what happens. If anyone wants to try it, here's a DFU.
NOTE Flash this file only on a devkit and if you have a SWD debugger on hands, it's absolutely not tested!
NOTE The BLE address is hard-coded to 00:00:00:00:00:08 because the address detection on the RADIO failed for some reason.
pinetime-mcuboot-app-dfu-1.3.99.zip

In the meantime, I've just sent a mail on the mynewt mailing list to see if anyone can help on this topic : http://mail-archives.apache.org/mod_mbox/mynewt-dev/202107.mbox/browser

@Avamander
Copy link
Collaborator

@JF002

Probably in the future, yes, but I would have to learn how bonding and security work on BLE first ;)

There's a few links here https://wiki.pine64.org/wiki/PineTime_Bluetooth but maybe you'll find some more cool documentation about it.

@JF002
Copy link
Collaborator

JF002 commented Aug 1, 2021

Nimble 1.4 does not fix my issue : ble is not working after 19h :/

@JF002
Copy link
Collaborator

JF002 commented Aug 2, 2021

New test last night : enable the logging feature of NimBLE : there was no error at all. It just logged that advertising was started and then stopped after 3 minutes, even if no data was transmitted in the air...

Now probing 3 radio signals as mentioned here. To be continued tomorrow :)

@Crsarmv7l
Copy link

Crsarmv7l commented Aug 5, 2021

Really difficult to investigate here, because there are lots of configurations and parameter available.
For example, I'm using Infintime 1.3.0, with a Sony XA2 phone, running SFOS 4.1, and amazfish app as companion app.
My phone is in my pocket, connected to the watch, and realy often (every half hour?), watch receives a notification that amazfish is now connected, which could mean that connection was broken?

who is responsible? Pinteime HW? Infinitime? Sony XA2 HW? SFOS BLE stack? amazfish?

Late response, but I also have the exact connect/disconnect issue you have with Amazfish, Infinitime 1.3 and Sailfish 3.4 on an fxtec pro1. I have tried using mcetool to utilize early suspend (not late which shuts off a bunch of stuff for battery saving) and while it would connect more, I still had frequent drops.

Seeing as JF002 and others have less frequent disconnects, but still have them, it appears the BLE issue is either exacerbated on Amazfish, or we also have an issue with Amazfish/SFOS that is coming into play. I would imagine Adam Piggz wouldn't be interested in investigating a potential Amazfish issue until Infinitime's BLE is fully stable.

@JF002
Copy link
Collaborator

JF002 commented Sep 30, 2021

InfiniTime 1.5 and 1.6 brought many improvements the BLE connectivity. While I doubt those changes fix all connectivity issues, they certainly fix most of them. I'll close this issue, but feel free to open a new one if you encounter another BLE issue on InfiniTime 1.6 (and if it doesn't exist yet).

Thanks again to everyone who help debugging those issues!

@JF002 JF002 closed this as completed Sep 30, 2021
@tterrag1098
Copy link

That's great news! Thanks so much to all the team and especially @danielgjackson for all the long hours debugging this ridiculously tricky bug. Sometimes these obscure issues end up being obvious once you find them, but not the case here, I can see why this was missed for so long! I'll definitely be wearing my watch more regularly now that I don't have to fiddle with the bluetooth every morning 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests