Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] using f'format strings in tight loop blocks and triggers watchdog timer #1668

Closed
BertLindeman opened this issue Jun 6, 2024 · 32 comments
Labels
bug Something isn't working software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime)

Comments

@BertLindeman
Copy link

Describe the bug
Program is a tight loop printing f-strings. Within seconds the hub does a shutdown (somehow)
The display animation is no longer smooth but has large hickups.

Might this be related to #436 ?

To reproduce
Steps to reproduce the behavior:

  • connect the hub to USB for power
  1. Running in code.pybricks.com or beta
  2. Current setup is RobotInventor at 3.5.0 and SpikePrime at build 3380 (BT-button change)
  3. connect to the hub
  4. start the program (see below) via the app or website
  5. stop the program via the app and disconnect the hub
  6. stop the program by long-press on the hub button
  7. start the hub and run the program again (button press) but do not connect to the app
  8. a bit of patience (seconds) and see the hub goes OFF

Aditional info
Connected the hub before the test to USB to keep the power on.
After the crash connect to the hub and go to the REPL.
Showed:

>>> hub.system.reset_reason()
2
>>> 

Expected behavior
You'd expect the program to just keep running and not to power-off.

Screenshots
There is a saying that a picture is worth a 1000 words. Screenshots really help to identify and solve problems.

test program for this issue
# issues try_movehub_on_prime_6
# start this and while running disconnect with the app
# stop the hub to save the current program
# restart the hub (if build 3380 installed press the BT button -> red)
# start the program with the button, and
# within seconds the running square stops and the hub shutsdown


from pybricks.hubs import PrimeHub
from pybricks.tools import wait

HUB = PrimeHub()

# Initialize variables.
color_list = ["", "", "", "", "", "", ""]

_rand = HUB.battery.current() + HUB.battery.voltage()
# _rand = 123
INDEX = 4
A = 0
B = 5


def main2():
    while True:
        print(f'pre _rand {_rand:>5} A {A:>2} B {B:>2}', end=" ")
        print(f'_rand {_rand:>5} A {A:>2} B {B:>2}')
        print(f'INDEX {INDEX:>3} color_list[{INDEX}] {color_list[INDEX]}')
        # wait(1)  # with 1ms second wait: no problem
        wait(0.75)  # this wait is not enough to prevent the error


main2()
@BertLindeman BertLindeman added the triage Issues that have not been triaged yet label Jun 6, 2024
@laurensvalk laurensvalk added bug Something isn't working software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime) and removed triage Issues that have not been triaged yet labels Jun 6, 2024
@laurensvalk
Copy link
Member

Good find! When not connected, it is supposed to ignore printed output but that clearly isn't happening.

@laurensvalk
Copy link
Member

Did you try this on any other hub?

@BertLindeman
Copy link
Author

Did you try this on any other hub?

Up to now only the Spike and Inventor. Will "do" others.

@laurensvalk
Copy link
Member

laurensvalk commented Jun 6, 2024

More minimal program and method below.

Run this on SPIKE Prime and press the disconnect button in Pybricks Code. No need to restart anything.

while True:
    print(f'hello - hello - hello - hello - hello - hello - hello')
    print(f'hello - hello - hello - hello - hello - hello - hello')

The hub doesn't quite crash all the way, but the animation still visibly stutters. Just for less than it needs for the watchdog timer to trigger.

@laurensvalk
Copy link
Member

Yet this is OK. So I guess we'll have to have a look at the format-string source code:

while True:
    print('hello - hello - hello - hello - hello - hello - hello')
    print('hello - hello - hello - hello - hello - hello - hello')

@laurensvalk laurensvalk changed the title [Bug] SpikePrime / RobotInventor hub power-off in a tight print loop during execution [Bug] using f'format strings in tight loop blocks and triggers watchdog timer Jun 6, 2024
@BertLindeman
Copy link
Author

This one stutters:

while True:
    print(f'{"hello":>10} -{"hello":>10} {"hello":>10} {"hello":>10} {"hello":>10}')
    print(f'{"hello":>10} -{"hello":>10} {"hello":>10} {"hello":>10} {"hello":>10}')

    print(f'{"hello":>10} -{"hello":>10} {"hello":>10} {"hello":>10} {"hello":>10}')
    print(f'{"hello":>10} -{"hello":>10} {"hello":>10} {"hello":>10} {"hello":>10}')

@laurensvalk
Copy link
Member

Yes, any long f'string' seems to be an issue. Normal strings seem fine.

@BertLindeman
Copy link
Author

BertLindeman commented Jun 6, 2024

Did you try this on any other hub?

How to "see" the stutter the other hubs?
I have the Move / City / 2 Technic hubs running with 8 of these print lines.
The inventor and prime hub stop automagically but the others run happily on.
Maybe the blue light is a bit more step by step. But now very clear.

@laurensvalk
Copy link
Member

laurensvalk commented Jun 6, 2024

Thanks for having a look! Since it seems to be specifically related to these format strings, I think it's OK to stick to just the primehub for now to keep it simple.

Thanks for bringing it to our attention. I haven't used these f-strings much in Pybricks for some reason.

@BertLindeman
Copy link
Author

Thanks for having a look! Since it seems to be specifically related to these format strings, I think it's OK to stick to just the primehub for now to keep it simple.

Thanks for bringing it to our attention. I haven't used these f-strings much in Pybricks for some reason.

I like the f-strings more and more. Just getting to know the formatting characters.

BTW this was the issue I found while testing the BT-on/off change in build 3380.

@BertLindeman
Copy link
Author

BertLindeman commented Jun 6, 2024

You know me, I keep fiddling 😄

Idea was does the f-string pose the problem or the print doing the formatting.

This snippet:

formatted1 = f'{"hello" * 3:>20} - {"wow"[:2]:>10} {"hello":>10} {"hello":>10} {"hello":>10}'
print(type(formatted1))
formatted2 = f'{formatted1[:10]:>20}'

while True:
    formatted2 = f'{formatted1[:10]:>20}'
    print(type(formatted1), formatted1)
    print(formatted2)
    print(type(formatted1), formatted1)

with an f-string in the loop makes the animation stutter.

The same program but the formatting commented out within the loop does not stutter:

formatted1 = f'{"hello" * 3:>20} - {"wow"[:2]:>10} {"hello":>10} {"hello":>10} {"hello":>10}'
print(type(formatted1))
formatted2 = f'{formatted1[:10]:>20}'

while True:
    # formatted2 = f'{formatted1[:10]:>20}'
    print(type(formatted1), formatted1)
    print(formatted2)
    print(type(formatted1), formatted1)

The watchdog is not triggered.

The 3 print commands are needed to see the stutter.

[EDIT]
Without the print statements in the loop, but with f-strings "that do formatting on slices" the running display animation is skipping frames, in my terms it stutters.
With SLICE = False the normal display animation.

The boolean SLICE determines which loop runs: f-strings with or without slices
# SLICE = False  # normal animation
SLICE = True  # animation skips frames; my term: it stutters

if SLICE:
    while True:  # loop with only f-string formatting and slices
        formatted1 = f'{"hello?"[:-1] * 3:>20} - {"wow"[:2]:>10} {"hello"[:4]:>10} {"hello":>10} {"hello":>10}'
        formatted2 = f'{formatted1[6:]:<20}'
else:
    while True:  # loop with only f-string formatting and NO slices
        formatted1 = f'{"hello" * 3:>20} - {"wow":>10} {"hello":>10} {"hello":>10} {"hello":>10}'
        formatted2 = f'{formatted1:<20}'
Movies:
SLICE_False_connected.mp4
SLICE_False_not_connected.mp4
SLICE_True_connected.mp4
SLICE_True_not_connected.mp4

@laurensvalk
Copy link
Member

laurensvalk commented Oct 1, 2024

Related: https://github.com/orgs/pybricks/discussions/1855 f-strings don't seem to work on REPL.

Update: This can be enabled with MICROPY_PY_FSTRINGS. It takes about 500 bytes on Technic Hub, which is also gradually approaching its maximum firmware size. So my inclination is to leave this disabled since it does work well in scripts (presumably mpy-cross is doing the parsing at compile time).

@laurensvalk
Copy link
Member

The crash / reboot has been fixed as linked above. Thanks for reporting!

@BertLindeman
Copy link
Author

The crash / reboot has been fixed as linked above. Thanks for reporting!

But the "stutter" is still there on build 3569.
Should I make another issue or just forget about is? (Fine by me)
Nothing crashes and it just works.
If the strings are printed all looks normal.

@laurensvalk
Copy link
Member

laurensvalk commented Oct 1, 2024

This one no longer stutters for me.

while True:
    print(f'hello - hello - hello - hello - hello - hello - hello')
    print(f'hello - hello - hello - hello - hello - hello - hello')

Can you share a program that still stutters? It's possible that there are other places where it still blocks.

@laurensvalk
Copy link
Member

laurensvalk commented Oct 1, 2024

I see that your program under the expander still stutters but in a different way. Unlike the others, this one also stutters while still connected, and it is not related to printing.

We can further reduce yours to just the following:

# This is the whole program, nothing else needed.
while True:
    a = f'{("hello?"[:-1] * 3)}'

And also without the f-string, even if this is probably equivalent as David indicated:

while True:
    a = "{0}".format("hello?"[:-1] * 3)

@laurensvalk laurensvalk reopened this Oct 1, 2024
@laurensvalk
Copy link
Member

And further simplified:

while True:
    a = "{}".format("h"[0] * 2)

@laurensvalk
Copy link
Member

Adding a single MICROPY_VM_HOOK_LOOP at the top of mp_obj_str_format_helper is enough to smoothen this out. I don't necessarily want to do that, but it means we can have a look at when we are normally running this and why it is so critical here. I think we were running it anyway every N instructions, including loop iterations, so maybe we can look at that.

@BertLindeman
Copy link
Author

BertLindeman commented Oct 1, 2024

You guys are fast.

In my words:
The last simplified version stutters if connected.
But if not PC connected (it does not stutter for ~ 2 seconds? and then stutters for a few seconds) and repeat.

This program:

while True:
    a = "{}".format("h"[0] * 2)

[EDIT] The slice in the format is the culprit, I think.

Further simplified it does not stutter:

while True:
    # a = "{}".format("h"[0] * 2)
    a = "{}".format("h" * 2)

@dlech
Copy link
Member

dlech commented Oct 1, 2024

But if not PC connected (it does not stutter for ~ 2 seconds? and then stutters for a few seconds) and repeat.

This sounds like a GC issue. str.format() has to allocate memory as does slicing, so no stutter until memory starts getting full, then stutter until memory is full, the GC is triggered and starts over again.

So maybe MICROPY_GC_HOOK_LOOP isn't getting called often enough in gc_alloc()?

Could test this by setting the GC threshold gc.threshold(5000) or something like that to prevent memory from getting full.

@BertLindeman
Copy link
Author

gc seems partly implemented:
help(gc):

object <module 'gc'> is of type module
  __name__ -- gc
  collect -- <function>
  disable -- <function>
  enable -- <function>
  isenabled -- <function>
  mem_free -- <function>
  mem_alloc -- <function>

@BertLindeman
Copy link
Author

BertLindeman commented Oct 1, 2024

A 'try / except' around the format shows a smooth display animation.

This one also goes smooth: (edit program was wrong! no slice was enabled)

import gc
help(gc)
gc.disable()
gc.collect()
print("PRE ", gc.mem_alloc(), " ", gc.mem_free())
while True:
    try:
        a = "{}".format("h"[0] * 2)
        # a = "{}".format("h" * 2)
    except:
        print("POST", gc.mem_alloc(), " ", gc.mem_free())
        gc.collect()

@dlech
Copy link
Member

dlech commented Oct 1, 2024

gc seems partly implemented:

I made pybricks/pybricks-micropython#267 to enable it for experimentation.

@BertLindeman
Copy link
Author

gc seems partly implemented:

I made pybricks/pybricks-micropython#267 to enable it for experimentation.

Will test.

@laurensvalk
Copy link
Member

laurensvalk commented Oct 1, 2024

Changing

#define MICROPY_GC_HOOK_LOOP(i) do { \
        if ((i & 0xf) == 0) { \
            MICROPY_VM_HOOK_LOOP \
        } \
} while (0)

From 0xf to 0x03 also keeps this particular test case running.

I suppose this exists to keep the calls down, but maybe we could use an actual counter to keep it every 0xf since not all loops may get to 0x0f?

@dlech
Copy link
Member

dlech commented Oct 1, 2024

since not all loops may get to 0x0f?

Technically, i & 0xf is the same as i % 0x10 so it has to get to 0x00 or 0x10, etc.

So MICROPY_GC_HOOK_LOOP(i - area->gc_last_free_atb_index); would ensure that it runs at least once for small numbers of loops.


Or maybe we need MICROPY_GC_HOOK_LOOP() in some of the outer for loops as well?

@BertLindeman
Copy link
Author

Not sure if this still helps, but here it is anyway 😄
Ran build 3570 with this program on a primehub.
The comments show the result:

import gc
print("PRE ", gc.mem_alloc(), " ", gc.mem_free())
print("pre enable", gc.threshold())
# flip-flopping stutter and normal animation up to here
# gc.threshold(500)    # smooth animation   prints 496
# gc.threshold(5000)   # smooth animation   prints 4992
# gc.threshold(25000)  # smooth animation   prints 24992
gc.threshold(27500)  # smooth animation   prints 27488
# gc.threshold(30000)  # slight stutter
# gc.threshold(40000)  # slight stutter
# gc.threshold(50000)  # slight stutter
# gc.threshold(75000)  # a bit more stutter
# gc.threshold(100000)  #  stutter
# gc.threshold(150000)  #  stutter and normal animation (huh?)
# gc.threshold(200000)  #  stutter and normal animation (huh?)
# gc.threshold(1000000)  #  stutter
gc.enable()
print("AFT enable", gc.threshold())

while True:
    a = "{}".format("h"[0] * 2)

@laurensvalk
Copy link
Member

laurensvalk commented Oct 2, 2024

since not all loops may get to 0x0f?

Technically, i & 0xf is the same as i % 0x10 so it has to get to 0x00 or 0x10, etc.

Oops, I misread -- yes, any multiple of 16 so including 0.

So MICROPY_GC_HOOK_LOOP(i - area->gc_last_free_atb_index); would ensure that it runs at least once for small numbers of loops.

This seems to apply to one place where it is called but not all?

Or maybe we need MICROPY_GC_HOOK_LOOP() in some of the outer for loops as well?

I suppose we could time the gc for the case where we always call MICROPY_VM_HOOK_LOOP (pbio_do_one_event). If there aren't any events, calling this shouldn't take long. pybricks/pybricks-micropython@85856ae says choosing 16 improved performance. Do you recall how significant it was?


EDIT: This test case also gets fixed by keeping the GC hook to run every 16 calls, but having one MICROPY_VM_HOOK_LOOP at the start of gc_alloc (outside the loop).

I see now this is what you probably meant above, to modify i - area->gc_last_free_atb_index only in GC_HOOK_LOOP in gc_alloc. This would run slightly more often than just once on entry, but I like this suggestion.

@laurensvalk
Copy link
Member

@BertLindeman
Copy link
Author

Nice, Laurens.

Ran all my issue1668 tests and all OK now.
The tests that used gc.threshold no longer run off course as that should not be normally available.

Ran on technic hubs (build 3656 as happend to be next to me and his build 3571) I see no difference between them, but also nothing out of the ordinary.

I like it.

Thank you both,

Bert

Anything else to do for this issue, like merge to master?
Otherwise close seems to be OK.

@laurensvalk
Copy link
Member

Possibly David's approval, he knows more about the garbage collector ♻️ .

@dlech
Copy link
Member

dlech commented Oct 2, 2024

says choosing 16 improved performance. Do you recall how significant it was?

Hmm... I should have put that in the commit message. Likely I used the pystones test for this.

I see now this is what you probably meant above, to modify i - area->gc_last_free_atb_index only in GC_HOOK_LOOP in gc_alloc.

Yes, that is the idea. 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working software: pybricks-micropython Issues with Pybricks MicroPython firmware (or EV3 runtime)
Projects
None yet
Development

No branches or pull requests

3 participants