Arithmetic underflow in SPI read routines #337

Closed
michaelstoops opened this issue Apr 13, 2021 · 13 comments · Fixed by #338

@michaelstoops
Contributor

I've identified an arithmetic underflow in the SPI read routines which seems to lead to a deadlock, at least in my application.

The reproduction code is raspberrypi/pico-examples#101, in the case that the SPI slave starts while the SPI master is in the middle of a transmission. The slave correctly transmits 8 bytes and then deadlocks.

The flaw is in

if (tx_remaining && spi_is_writable(spi) && rx_remaining - tx_remaining < fifo_depth) {
and similar lines below. In the error condition, tx_remaining is 248 while rx_remaining is 247. The operation underflows and gives a result of 0xFF FF FF FF FF FF FF FF.

I'm not 100% clear on what happens next, and it probably depends on the timing of signals coming in anyway. Either way, I think the arithmetic underflow is not intended.
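To make the wrap concrete, here is a minimal sketch of the guard in isolation. The function names `guard_subtract` and `guard_add` are hypothetical and do not appear in the SDK; `guard_add` is one possible rearrangement that avoids the subtraction and is not claimed to be the exact change made in #338. The result of the wrapped subtraction is `SIZE_MAX`, whose width depends on the platform's `size_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Guard as quoted in the issue: the subtraction is done in size_t,
 * so it wraps to a huge value whenever rx_remaining < tx_remaining. */
bool guard_subtract(size_t tx_remaining, size_t rx_remaining, size_t fifo_depth) {
    return rx_remaining - tx_remaining < fifo_depth;
}

/* Hypothetical rearrangement: same inequality with tx_remaining moved to
 * the right-hand side. It cannot wrap unless tx_remaining + fifo_depth
 * itself exceeds SIZE_MAX, which is not a realistic transfer length. */
bool guard_add(size_t tx_remaining, size_t rx_remaining, size_t fifo_depth) {
    assert(tx_remaining <= SIZE_MAX - fifo_depth);
    return rx_remaining < tx_remaining + fifo_depth;
}
```

With the values reported above (tx_remaining = 248, rx_remaining = 247) and an assumed FIFO depth of 8, `guard_subtract` is false, so the loop would stop feeding the TX FIFO, while `guard_add` is true.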

@kilograham
Contributor

presumably you mean hang not deadlock

@michaelstoops
Contributor Author

presumably you mean hang not deadlock

Fair. :)

@michaelstoops
Contributor Author

Submitted proposed fix at #338.

@michaelstoops changed the title from "Arithmetic underflow in SPI read routines, leading to deadlock" to "Arithmetic underflow in SPI read routines" on Apr 13, 2021
Wren6991 pushed a commit that referenced this issue Apr 13, 2021
@Wren6991
Contributor

I have merged #338, since the unsigned wrapping in the comparison was not intentional!

AFAIK it is an invariant for master mode that rx_remaining >= tx_remaining at any point in time, and it seems like slave mode breaks this invariant in some or all cases.

in the case that the SPI slave starts while the SPI master is in the middle of a transmission.

Could you expand on that please? Is the timing of the start of the slave transfer vs the start of the master transfer significant?

@michaelstoops
Contributor Author

michaelstoops commented Apr 13, 2021

Yes. Are you able to read Saleae Logic data files?

@michaelstoops
Contributor Author

Yes. Are you able to read Saleae Logic data files?

I guess you can download the application here, whether or not you have a Saleae device. https://www.saleae.com/downloads/

@kilograham
Contributor

i don't have enough context, but it would seem sensible that a slave might receive before transmitting? However whatever the outcome in that case, I worry about an off by one error on the other end of things (i.e. buffer being full)
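One way to probe the off-by-one worry is to check that a rearranged guard agrees with the original one everywhere the original was well defined, including the "FIFO full" boundary. The sketch below is an assumption-laden illustration: the helper name `guards_agree_when_invariant_holds` is hypothetical, the rearranged form `rx < tx + fifo_depth` is assumed rather than taken from #338, and a FIFO depth of 8 is assumed in the usage note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compares the subtracting and the adding form of the guard for every
 * pair (tx, rx) with tx <= rx <= max_len, i.e. wherever the subtraction
 * cannot wrap. Returns true if the two forms never disagree. */
bool guards_agree_when_invariant_holds(size_t max_len, size_t fifo_depth) {
    assert(fifo_depth > 0);
    for (size_t tx = 0; tx <= max_len; ++tx) {
        for (size_t rx = tx; rx <= max_len; ++rx) {
            bool subtract_form = rx - tx < fifo_depth;  /* original guard */
            bool add_form = rx < tx + fifo_depth;       /* rearranged guard */
            if (subtract_form != add_form) {
                return false;
            }
        }
    }
    return true;
}
```

Calling `guards_agree_when_invariant_holds(300, 8)` covers the 248-byte transfer from this report. It only shows that the two forms are equivalent while rx_remaining >= tx_remaining; it says nothing about whether the bound itself is the right one for the hardware.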

@michaelstoops
Contributor Author

Here is the happy path. Slave starts first and waits for clock pulses from the master.
[Image: Screen Shot 2021-04-13 at 3 25 50 PM]

Happy_path.logicdata.zip

Here is the error path: master starts first, and slave starts while master is transmitting. The slave transmits only 8 bytes and stops.

[Image: Screen Shot 2021-04-13 at 3 23 59 PM]

master_first_slave_second.logicdata.zip

@Wren6991
Contributor

Yes. Are you able to read Saleae Logic data files?

I do have an old version of Logic installed yes. The trace attached to the PR does not seem to show this lockup case.

I'm not sure exactly what the circumstances are for rx_remaining becoming less than tx_remaining, but it is almost certainly to do with the fact that the push to the RX FIFO precedes the pop from the TX FIFO when in slave mode, whilst the reverse is true for master mode. Also note that this check is likely not useful at all in slave mode, since you are not able to control the flow into the RX shift register at all, and if you get interrupted too heavily then the RX FIFO will simply overflow due to master TX being too fast.

@Wren6991
Contributor

So there is an underlying logic error in the SPI code, and the arithmetic unsigned wrapping issue was just a symptom of that. The FIFO depth clause should be disabled in slave mode.
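As a rough illustration of what "disabled in slave mode" could look like, the sketch below folds the mode into the TX guard. Everything here is hypothetical: `may_push_tx` and its parameters are invented for this example, `tx_writable` stands in for the result of `spi_is_writable(spi)`, and nothing here is taken from the SDK source or from #338.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decides whether the next byte may be written to the TX FIFO.
 * In master mode the FIFO depth clause limits how far TX may run ahead
 * of RX. In slave mode the clause is skipped, because the slave cannot
 * throttle what the master clocks into its RX side anyway. */
bool may_push_tx(bool is_slave, size_t tx_remaining, size_t rx_remaining,
                 size_t fifo_depth, bool tx_writable) {
    assert(fifo_depth > 0);
    if (tx_remaining == 0 || !tx_writable) {
        return false;
    }
    if (is_slave) {
        return true;  /* depth clause disabled in slave mode */
    }
    return rx_remaining < tx_remaining + fifo_depth;
}
```

In this sketch a slave with data left to send pushes whenever the TX FIFO has room, regardless of how rx_remaining and tx_remaining compare, while master mode keeps the depth bound.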

@raspberrypi deleted a comment from lurch on Apr 14, 2021
@kilograham
Contributor

can this issue be closed?

@michaelstoops
Contributor Author

I opened this issue to discuss the bug, which was addressed through #338. I'm fine if you want to close this.

If there's a reason to keep this open, it's this:

i don't have enough context, but it would seem sensible that a slave might receive before transmitting? However whatever the outcome in that case, I worry about an off by one error on the other end of things (i.e. buffer being full)

On one hand, it seems like an audit is in order. On the other hand, I don't plan on doing it. Maybe nobody else does either, so it's up to you and your process.

Cheers,
Michael

@kilograham
Contributor

yeah, that was just me pontificating; if it works for you, i will close.

kilograham added a commit that referenced this issue Jun 3, 2021
See release notes for more descriptive details

* Delete some redundant CMake parts (#240)

* pio: Add 'pragma once' to generated header files (#237)

* pio: allow programs with 32 instructions (#236)

* Fixup incorrect doxygen for multicore_fifo_wready

* Add param-validation to spin_lock_instance

* Fix back-to-front description of IRQ priority in doxygen (#245)

* Fix ROSC typo (#259)

* Typo (#251)

* Add gpio_get_out_level() accessor, and correct SIO GPIO_OUT struct ty… (#247)

* Add gpio_get_out_level() accessor, and correct SIO GPIO_OUT struct type from WO to RW

* Add pico_get_unique_board_id_string API (#281)

* Clean up -Wconversion=error issues

* move PLL reset code from clocks driver to pll driver (#110)

* Don't clear PLL PWR/FBDIV after reset as unnecessary. Call out in runtime.c why USB/syscfg aren't reset.

* i2c: set hold time of SDA during transmit to an appropriate value (#273)

* i2c: set hold time of SDA during transmit to 2 for TCS34725 color sensor

* i2c: fix issues in i2c_write_blocking_internal

* i2c: rename sda_hold_count to sda_tx_hold_count

* use assert rather than invalid_params_if for internal consistency checks

* i2c: use a more appropriate sda tx hold time at higher baudrates

* i2c: reduce 120/1e9 to the smallest possible integer numerator and denominator

* Update NULL GPIO function to 0x1f (#320)

* i2c: set high and low times to values that conform to the i2c specification (#314)

* Make flash_do_cmd public (#269)

* Fix implementation config listing in structs/i2c.h (#324)

* Clarify that cache is flushed, but that function is intended for low-level metadata access during startup (#322)

* Fix implementation config listing in structs/i2c.h (#325)

* Fix param-validation for PIO sideset encoding (#311)

* Remove MASTER_ON_HOLD bit from I2C status registers. Fix typos. (#326)

* Fixing arithmetic underflow in SPI I/O loops #337 (#338)

* Source code licence clarification (#340)

* Updated existing Pimoroni board headers to match latest style, and added a new board (#343)

* Added new pimoroni board headers

* SPI Definitions for SparkFun boards (#344)

* SPI Definitions for SparkFun boards


* Clarify multicore_fifo doxygen (#323)

Based on my observations in #284

* correct adafruit flash size for itsybitsy and qt rp2040 (#348)

from 4 MB to 8 MB

* Small typos (#366)

* make spi_init return baud rate set (#296)

* Fix path + typo in README.md (#347)

* Fix path + typo in README.md

* Remove incorrect path change

* Remove typo

* disable core 0 SIO FIFO IRQ handler during core 1 launch in case someone has already installed one (#375)

* add PICO_DIVIDER_DISABLE_INTERRUPTS flag which makes PICO_DIVIDER disable interrupts around division rather than using co-operative guards to protect nested use (i.e. within IRQ/exception). Use of this flag can simplify porting of RTOSes but with a different performance profile (#372)

* make all non hardware_ libraries foo add C preprocessor definition LIB_FOO=1, and remove bespoke definitions which were all undocumented anyway (#374)

* Change various (confusing to user) message to be DEBUG only (#365)

* add small delay to stdio_get_until to prevent starvation of USB IRQ handler due to in use mutex. build was non deterministic due to missing link wrapping of getchar (#364)

* Some cmake build improvements (#376)

* Change some cmake output to DEBUG level
Make SDK build more consistent with other libraries (use an INTERFACE marker library for inclusion tests)
Add PICO_SDK_PRE_LIST_FILES, PICO_SDK_POST_LIST_FILES build vars

* fix typo

* remove leftover debugging message

* i2c: improve communication with i2c devices in i2c_write_blocking

* Definitions for IC_TX_BUFFER_DEPTH inconsistent (fixes #335) (#381)

* Add hardware_exception for setting exception handlers at runtime (#380)

* add __always_inline to trivial super low level inline functions (#379)

* Rework lock_core / timers (#378)

* remove spurious sys/select.h include (#377)

* Fixup IRQ_PRIORITY #define values (#393)

* Correct doxygen for mutex_try_enter (#392)

* Fix a bunch of doxygen typos (#391)

* Rework ordering of cmake, so that libraries in subdirectories can add to internal lists as PICO_SDK_POST_LIST_FILES, PICO_CONFIG_HEADER_FILES etc. (#382)

* Fix some hardware_library dependencies (#383)

* make host pico_platform.h and binary_info.h CMakeLists.txt safe for inclusion in non SDK build (#388)

* Add basic CMSIS core headers (#384)

* Fix the PICO_CONFIG default value for PICO_CMSIS_RENAME_EXCEPTIONS (#399)

* add timeout_us/until to mutex/sem blocking methods (#402)

* Fixup divider save_restore for floating point too; improve tests (#405)

* fix pico_promote_common_scope_vars (#397)

* add comment about using clk_gpout0 enable bit (Fixes #413)

* pioasm: prevent double inclusion for C SDK generated headers (#417)

* Add missing cast to uint32_t in hw_divider_u32_quotient for host (#436)

* Optional feature to get the max level that has ever been held by a queue (#444)

* Fix wrong format string in alarm_pool_dump_key (#437)

* Add support for Arduino Nano RP2040 Connect (#425)

* Add support for Arduino Nano RP2040 Connect

* Add support for at25sf128a flash

* Fix function-name misspelling (#443)

* Update host multicore.h to match multicore.h in rp2_common (#439)

* Implement `uart_write_blocking` and `uart_read_blocking` for host (#438)

* Define `__STRING` for other compilers than MSVC in the host platform.h file (#434)

* Prevent warnings about some unused parameters in pico_stdio_usb when building with -Wextra (#431)

* Fix warnings about some unused parameters in pico_stdio_usb

* Use `__unused` for the unused parameter in tud_descriptor_configuration_cb

* Remove redundant inclusions of `pico/platform.h`

* Define `void operator delete[](void *p, std::size_t n)` in new_delete.cpp (#430)

* queue: make data pointers const in queue_try_add and queue_add_blocking (#423)

* misc interp_ fixes (#428)

* some typo fixes (#408)

* Prevent the literal string DEBUG from being appended to some messages in CMake < 3.15 (#433)

* Add function to get the currently selected channel (#451)

* Add missing board detection macros (#448)

* add board detection macros for Sparkfun & RPi Pico / VGA Board

* dma_channel_transfer_[from/to]_buffer_now: added const volatile to read_addr and volatile to write_addr (#449)

* Change the quick-start instructions to include installation of the (#92)

* added spi_get_baudrate() + some consistency changes (#395)

* Allow lengthening xosc startup delay with a compile option (#457)

* Add hardware_gpio accessors for Schmitt, slew rate, drive strength (fixes #290) (#464)

* Add some spin lock related doxygen

* Move to Tinyusb 0.10.0 (#462)

* Add usb device dpram to svd file. Fixes #351 (#465)

* Add PICO_PANIC_FUNCTION define to allow replacement of the default panic function (#463)

* Add missing DREQ_ definitions

* store actual clock frequency in clock_configure (fixes #368)

* Fix hw_is_claimed, and add xxx_is_claimed APIs

* Add some PIO irq helper methods

* Add DMA channel IRQ status getter and clear methods

* Implement the correct PIO IRQ status/clear methods (good to have methods here as the h/w interrupt registers are super confusing)

* fix pico_multicore dependencies

* add missing wrapper func __aeabi_f2d

* Further DMA/PIO IRQ API cleanup (and review fixes)

* add PICO_INT64_OPS_IN_RAM flag

* fix qtpy rp2040 uart rx rev B (#466)

* Move to tinyusb 0.10.1 (upstream tinyusb repo) (#467)

* Add gpio_set_irqover to match inover/outover/oeover (fixes #265) (#470)

Co-authored-by: Andrew Scheller
Co-authored-by: Christian Flach
Co-authored-by: Luke Wren
Co-authored-by: Earle F. Philhower, III
Co-authored-by: majbthrd
Co-authored-by: Peter Lawrence
Co-authored-by: Brian Cooke
Co-authored-by: Scott Shawcroft
Co-authored-by: Michael Stoops
Co-authored-by: ZodiusInfuser
Co-authored-by: Kirk Benell
Co-authored-by: Ha Thach
Co-authored-by: Exr0n
Co-authored-by: Joni Kähärä
Co-authored-by: Rafael G. Martins
Co-authored-by: Jonathan Reichelt Gjertsen
Co-authored-by: Martino Facchin
Co-authored-by: Rene
Co-authored-by: Brendan
Co-authored-by: geurtv
Co-authored-by: ewpa
Co-authored-by: Dan Halbert
Co-authored-by: Liam Fraser