-
-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootloader does not start properly after a watchdog reset/incoherent SPI state #24
Comments
Using InfiniTime 0.8.1 RC, I noticed that the bootloader won't run properly if the watchdog reset is triggered while the display is off (meaning that the cpu is in deep sleep, all devices are disabled,...). I don't know if this is caused by a bug in InfiniTime in sleep mode or in the bootloader... Anyway, I think the bootloader should be able to run no matter what happens in the application firmware! |
Looks like I need to do a full reset or something. Lemme do some research. How can I reproduce this? Once I can reproduce this, I'll isolate the problem...
Also is there some way for me to test whether SPI is in incoherent state? Once I detect that SPI is incoherent, I can do a full reset maybe. I can debug the bootloader without SPI by toggling the GPIO pins to change the screen brightness, so that it flashes various patterns to show the bootloader status. |
I'm seeing this intermittent problem with the SPI flash, maybe because it's powered down. Will investigate...
That's an assertion failure... int
spiflash_identify(struct spiflash_dev *dev)
{
int i;
int j;
uint8_t manufacturer = 0;
uint8_t memory_type = 0;
uint8_t capacity = 0;
int release_power_down_count = 0;
/* Table with unique release power down functions */
void (*rpd[3])(struct spiflash_dev *);
int rc = 0;
/* List of supported spi flash chips can be found in:
* hw/drivers/flash/spiflash/chips/sysconfig.yml
*/
static_assert((sizeof(supported_chips) / sizeof(supported_chips[0])) > 1,
"At lease one spiflash chip must be specified in sysconfig with SPIFLASH_<chipid>:1");
spiflash_lock(dev);
/* Only one chip specified, no need for search*/
if ((sizeof(supported_chips) / sizeof(supported_chips[0])) == 2) {
supported_chips[0].fc_release_power_down(dev);
spiflash_read_jedec_id(dev, &manufacturer, &memory_type, &capacity);
/* If BSP defined SpiFlash manufacturer or memory type does not
* match SpiFlash is most likely not connected, connected to
* different pins, or of different type.
* It is unlikely that flash depended packaged will work correctly.
*/
assert(manufacturer == supported_chips[0].fc_jedec_id.ji_manufacturer &&
memory_type == supported_chips[0].fc_jedec_id.ji_type &&
capacity == supported_chips[0].fc_jedec_id.ji_capacity); // Stops here |
On 0.8.0: When pressing the buttons to make it hang, we don't get the the assertion failure above. It just hangs. With the new bootloader that blinks, nothing blinks during the hang. When I run with gdb and force break during the hang, gdb says that device is stuck at address 0xc488 According to the .map file, 0xc488 is in
So it's hanging while writing to SPI? |
While the watch was hanging, I waited 1 minute, forced break in gdb. It stopped in a different location: 0x10f7a, which is in Pinetime::Drivers::TwiMaster::Write...
According to the call stack, the caller of Pinetime::Drivers::TwiMaster::Write is 0x11020, which is Pinetime::Drivers::TwiMaster::Wakeup...
|
Thanks for this torough analysis @lupyuen ! Here are some more details: In InfiniTime 0.8.0, there are race conditions on between the Display task (which writes on the SPI bus to the display and reads on the TWI bus from the touch controller) and the System task (which put the watch to sleep and wake it up when the button in pressed) : sometimes, system task disables the SPI and TWI devices while it is still in use by the Display task. When that happens, Display task waits for the end of a transaction, and that will never happen because the peripheral is disabled. When I debugged this issue, I noticed 2 behaviors:
In both cases, I'm pretty sure that the watchdog did reset the MCU, but nothing appears on the display, as if the bootloader was not running properly. That's why I created this issue. In your case when you waited for 1 minute before attaching GDB, I think you were in the first case: Display task is locked into an infinite loop waiting for the SPI/TWI peripheral. That is the bug I fixed in InfiniTime 0.8.1. Using InfiniTime 0.8.1, where this race condition is fixed, I noticed that the bootloader would not run properly when the watchdog reset occurs when the display is OFF (meaning that everything is disabled/put to sleep). I saw in the chat that you were wondering what happens when the button is pushed:
Sorry for this long post, but I doubt I'll be able to give it a lot of time before the week-end, but hopefully some of these information will help your analysis. If not, I'll do my best to help you during the evenings in my timezone or during the weekend :) Thanks again! |
Cool thanks and congrats :-) Here's what happened when I tried 0.8.1... You're right I don't see any more hanging in InfiniTime. The bootloader is now hanging because of the assertion failure in spiflash_identify I mentioned earlier. So lemme now fix this. The SPI bus is probably messed up, causing this assertion failure. If I bypass the assertion failure and call NVIC_SystemReset, it seems to fix the problem. Right now the assertion failure is stuck in an NMI Handler that loops forever. Lemme change the NMI Handler to call NVIC_SystemReset... |
OK NMI Handler now calls NVIC_SystemReset, seems to fix the hanging problem! The test binaries are here (with fixed manual firmware rollback checking): https://github.com/lupyuen/pinetime-rust-mynewt/releases/tag/v5.0.3 |
Wow, thanks for this fix, I'll try it asap! I admit that, at first, I was a bit surprised by your fix : override NMI_Handler to call NVIC_SystemReset(). But yeah, what else can you do? It looks like that many default implementation just enter into an infinite loop and does nothing (probably what happens here). I searched how the NRF SDK handles that case :
So, yeah, congratulations, you did as good as NRF engineers can do 👍 Now, I'm still wondering why the SPI bus does not work correctly after a watchdog reset... Reset should put all the registers in their default state and everything should work reliably after the reset... |
I've just done some tests :
Everything looks fine to me! I think we can do some other test (like OTA a not working firmware) to ensure that edge cases are covered too, but I won't have the time to do that this evening :) Thanks a lot for your analysis, work and fix, I think the bootloader is much more reliable now! |
Cool thanks for testing! I have merged to master branch and released it here: https://github.com/lupyuen/pinetime-rust-mynewt/releases/tag/v5.0.4 |
To review:
|
What about disabling SPI master on bootloader at early stage (before newt's hal_init_*)? SPIM.ENABLE register <-- 0 |
See InfiniTimeOrg/InfiniTime#79 (comment) In this case, example in 3.8.1 SPIM - TX workaround subsection of Nordic's Anomaly 109 for nRF52832 could be good. |
Sorry I'm not planning to make any more changes to MCUBoot, please submit a tested PR for updates. |
I'm working on this issue in InfiniTime : race conditions occurs during sleep/wakeup and something put the devices (spi) into an incoherent state (I think it is disabled while a transaction is running). This bug must be fixed in InfiniTime, of course.
When this bug happens, it looks like the watch is totally frozen, because the screen stays black. This is strange because the watchdog should reset the MCU and the bootloader should be able to start again.
I did some checks and i'm positive that the watchdog works as intended and resets the MCU. But... the bootloader does not start. I cannot debug it right now, but I guess it is also stuck somewhere because of this incoherent state of the SPI device (weird, because the MCU has just been reset...).
I'm not 100% sure of my analysis, but I think we really should have a look at this in order to ensure that the bootloader is as robust as possible.
The text was updated successfully, but these errors were encountered: