Enable 128K virtual memory via external SPI SRAM #6994
Provides a transparently accessible additional block of RAM of 128K to 8MB by using an external SPI SRAM. This memory is managed using the UMM memory manager and can be used by the core as if it were internal RAM (albeit much slower to read or write).
The use case is for things which are quite large but not particularly frequently used or compute intensive. For example, the SSL buffers of 16K++ are a good fit for this, as are the contents of Strings (both to avoid main heap fragmentation and to allow Strings of >30KB).
A fully associative LRU cache is used to limit the SPI bus bottleneck, and background writeback is supported.
Requires an `ESP.enableVM()` call to actually add the VM subsystem. If this routine is not called, then none of the VM routines should be linked in to user apps, so there should be no space penalty w/o it.
UMM `malloc` and `new` are modified to support internal and external heap regions. By default, everything comes from the standard heap, but a call to `ESP.setExternalHeap()` before the allocation (followed by a call to `ESP.resetHeap()`) will make the allocation come from external RAM. See the `virtualmem.ino` example for use.
If there is no external RAM installed, the `setExternalHeap` call is a no-op.
The String and BearSSL libraries have been modified to use this external RAM automatically.
Theory of Operation:
The Xtensa core generates a hardware exception (unrelated to C++ exceptions) when a load or store accesses an address that's defined as invalid. The XTOS ROM routines capture the machine state and call a standard C exception handler routine (or the default one which resets the system).
We hook into this exception callback and decode the EXCVADDR (the address being accessed) and use the exception PC to read out the faulting instruction. We decode that instruction and simulate its behavior (i.e. either loading or storing some data to a register/external memory) and then return to the calling application.
We use the hardware SPI interface to talk to an external SRAM/PSRAM, and implement a simple cache to minimize the number of times we need to go out over the (slow) SPI bus. The SPI is set up in a DIO mode which uses no more pins than normal SPI, but provides for ~2X faster transfers. SIO mode is also supported.
NOTE: This works fine for processor accesses, but cannot be used by any of the peripherals' DMA. For that, we'd need a real MMU.
Hardware Configuration (only use 3.3V compatible SRAMs!):
SPI byte-addressable SRAM/PSRAM: 23LC1024 or smaller
CS -> GPIO15
SCK -> GPIO14
MOSI -> GPIO13
MISO -> GPIO12
(note these are GPIO numbers, not the Arduino Dxx pin names. Refer to your ESP8266 board schematic for the mapping of GPIO to pin.)
Higher density PSRAM (ESP-PSRAM64H/etc.) should work as well, but I'm still waiting on my chips so haven't done any testing. Biggest concern is their command set and functionality in DIO mode. If DIO mode isn't supported, then a fallback to SIO is possible.
This PR originated with code from @pvvx's esp8266web server at https://github.com/pvvx/esp8266web (licensed in the public domain) but doesn't resemble it much any more. Thanks, @pvvx!
Keep a list of the last 8 lines in RAM (~.5KB of RAM) and use that to speed up things like memcpys and other operations where the source and destination addresses are inside VM RAM.
A custom set of SPI routines is used in the VM system for speed and code size (and because the core cannot be dependent on a library).
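The cache described above can be illustrated with a small host-side model. This is only a sketch, not the PR's code: a `std::vector` stands in for the SPI SRAM, the class name `LineCache` and its methods are hypothetical, and the 64-byte line size is an assumption inferred from "8 lines in ~.5KB". Background writeback is not modeled; dirty lines are written back only on eviction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kLineSize = 64;   // assumption: 8 lines * 64 bytes = 512 bytes
constexpr std::size_t kLineCount = 8;

// Hypothetical fully associative LRU cache in front of a slow backing store.
class LineCache {
public:
    explicit LineCache(std::size_t backingBytes) : backing_(backingBytes, 0) {
        assert(backingBytes % kLineSize == 0);
    }

    uint8_t read(uint32_t addr) { return lookup(addr).data[addr % kLineSize]; }

    void write(uint32_t addr, uint8_t value) {
        Line &l = lookup(addr);
        l.data[addr % kLineSize] = value;
        l.dirty = true;                  // written back only when evicted
    }

    // Number of simulated SPI line transfers (fills plus writebacks).
    unsigned transfers() const { return transfers_; }

private:
    struct Line {
        bool valid = false;
        bool dirty = false;
        uint32_t tag = 0;                // line-aligned base address
        uint64_t lastUse = 0;            // larger means more recently used
        std::array<uint8_t, kLineSize> data{};
    };

    Line &lookup(uint32_t addr) {
        assert(addr < backing_.size());
        const uint32_t tag = addr - (addr % kLineSize);
        Line *victim = &lines_[0];
        for (Line &l : lines_) {
            if (l.valid && l.tag == tag) {       // hit: any slot may hold any line
                l.lastUse = ++clock_;
                return l;
            }
            if (!l.valid) {                      // prefer an empty slot
                if (victim->valid) victim = &l;
            } else if (victim->valid && l.lastUse < victim->lastUse) {
                victim = &l;                     // otherwise least recently used
            }
        }
        if (victim->valid && victim->dirty) {    // write back before reuse
            std::memcpy(&backing_[victim->tag], victim->data.data(), kLineSize);
            ++transfers_;
        }
        std::memcpy(victim->data.data(), &backing_[tag], kLineSize);
        ++transfers_;
        victim->valid = true;
        victim->dirty = false;
        victim->tag = tag;
        victim->lastUse = ++clock_;
        return *victim;
    }

    std::vector<uint8_t> backing_;
    std::array<Line, kLineCount> lines_{};
    uint64_t clock_ = 0;
    unsigned transfers_ = 0;
};
```

In this model, repeated accesses within the same 64-byte line cost one transfer in total, which is the effect the cache is meant to have on operations such as memcpy inside VM RAM.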
Adapted changes to umm_malloc, Esp.cpp, StackThunk.cpp, WiFiClientSecureBearSSL.cpp, and virtualmem.ino to irammem.ino from @earlephilhower PR esp8266#6994. Reworked umm_malloc to use context pointers instead of copy context. umm_malloc now supports allocations from IRAM. Added class HeapSelectIram, ... to aid in selecting alternate heaps, modeled after class InterruptLock. Restrict alloc request from ISRs to DRAM. Never ending improvements to debug printing. Sec Heap option now pulls in free IRAM left over in the 1st 32K block. Managed through umm_malloc with HeapSelectIram. Updated examples.
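The idea of a heap-selection class modeled after a scoped lock can be sketched on a host machine as follows. Everything here is hypothetical and for illustration only: `ScopedHeapSelect`, `toyAlloc`, and the two fixed arenas are stand-ins, not the `HeapSelectIram` class or umm_malloc itself. The point is only that the constructor switches the active heap and the destructor restores the previous one, so an early return cannot leave the wrong heap selected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class HeapId { Dram = 0, Iram = 1 };

// Hypothetical stand-in for the allocator's "current heap" state.
static HeapId g_currentHeap = HeapId::Dram;

HeapId currentHeap() { return g_currentHeap; }

// RAII guard: selects a heap for the lifetime of the object.
class ScopedHeapSelect {
public:
    explicit ScopedHeapSelect(HeapId id) : previous_(g_currentHeap) {
        g_currentHeap = id;
    }
    ~ScopedHeapSelect() { g_currentHeap = previous_; }
    ScopedHeapSelect(const ScopedHeapSelect &) = delete;
    ScopedHeapSelect &operator=(const ScopedHeapSelect &) = delete;

private:
    HeapId previous_;
};

// Toy bump allocator with one small arena per heap (never frees).
struct Arena {
    alignas(8) uint8_t bytes[256];
    std::size_t used = 0;
};
static Arena g_arenas[2];

std::size_t usedBytes(HeapId id) { return g_arenas[static_cast<int>(id)].used; }

void *toyAlloc(std::size_t n) {
    assert(n > 0);
    Arena &a = g_arenas[static_cast<int>(g_currentHeap)];
    n = (n + 7) & ~std::size_t{7};             // round up to 8-byte blocks
    if (a.used + n > sizeof(a.bytes)) return nullptr;
    void *p = a.bytes + a.used;
    a.used += n;
    return p;
}
```

A caller would wrap only the allocations that should land in the alternate heap in a block containing a `ScopedHeapSelect` object; allocations outside that block go to the default heap again.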
Instead of requiring everyone to download and install another Arduino library to use this lib, include an optimized, self-contained one from the 8266 VM PR: esp8266/Arduino#6994 Does change the API as it now requires a HW chip select, so that parameter is removed from the SpiRAM buffer object.
* Replace external SPIRAM lib w/internal optimized
  Instead of requiring everyone to download and install another Arduino library to use this lib, include an optimized, self-contained one from the 8266 VM PR: esp8266/Arduino#6994
  Does change the API as it now requires a HW chip select, so that parameter is removed from the SpiRAM buffer object.
* Redo the FIFO to try and be faster
* Fix the SPI interface, buffering.
  SPI FIFO needs special care so do all work in a RAM copy and then copy it over manually w/32b writes. Ensure the SPI command is done before touching the FIFO.
  Fix buffering organization.
  Make the buffer report status as a #### bar.
* Fix infinite wait on EOF
* Add SW chip select support
* Update URL to one that's live today
* Clean up metadata printout
* Clean up readme refs to obsolete lib
* Slow down the DIO->SIO reset sequence
* Update readme to remove old SPIRam library ref
  Also add the resistor to the 1T circuit to avoid transistor overheating.
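The "RAM copy, then 32-bit writes" approach mentioned for the SPI FIFO can be sketched like this. It is a host-side illustration under assumptions: the function name `copyToFifo` is hypothetical, the FIFO is modeled as a plain `volatile uint32_t` array, and the 16-word (64-byte) size is an assumption about the data buffer. The real register addresses and the wait for command completion are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kFifoWords = 16;   // assumption: 64-byte data buffer

// Stage the payload in ordinary RAM, then copy it into a window that only
// tolerates whole 32-bit stores.
void copyToFifo(volatile uint32_t *fifo, const uint8_t *src, std::size_t len) {
    assert(len <= kFifoWords * sizeof(uint32_t));
    uint32_t staged[kFifoWords] = {0};
    std::memcpy(staged, src, len);               // byte-wise work happens in RAM
    const std::size_t words = (len + 3) / 4;     // round up to whole words
    for (std::size_t i = 0; i < words; ++i) {
        fifo[i] = staged[i];                     // 32-bit stores only
    }
}
```

Padding the last partial word with zeros in the staging buffer avoids a read-modify-write on the FIFO itself.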
Earle, I have a question loosely coupled to this PR. Would it be possible to use a similar trick to read data in Flash beyond the 1MB limit, without an explicit SPI command, or even better, run code beyond the 1MB limit? I understand the performance hit, but there may be some scenarios for rarely used code or data.
This method won't work for code in PMEM, but it's possible to use this idea to do data reads from outside of PMEM. This only catches load or store exceptions, not ifetch ones. I don't think ifetch exceptions are exposed.
* PoC cache configuration control
  Expanded boards.txt.py to allow new MMU options and create revised .ld's. Updated eboot to pass 48K IRAM segments. Added Cache_Read_Enable intercept to modify call for 16K ICACHE. Update platform.txt to pass new mmu options through to compiler and linker preprocessor. Added quick example: esp8266/MMU48K
* Style corrections
  Added MMU_ qualifier to new defines. Moved changes into their own file. Don't know how to fix platformio issue.
* Added detailed description for Cache_Read_Enable.
  Updated tools/sizes.py to report correct IRAM size and indicate ICACHE size. Merged in earlephilhower's work on unaligned exception. Refactored and added support for store operations and changed the name to be more closely aligned with its function. Improved crash reporting path.
* Style and MMU_SEC_HEAP corrections.
* Improved asm register usage.
  Added some inline functions to aid in byte and short access to iRAM.
* only byte read has been tested
  Updated .ld file to work better with platform.io; however, I am still missing some steps, so platformio will still fail.
* Interesting glitch in boards.txt after github merge.
  A new board in master was missing new additions added by boards.txt.py in the PR, which the CI flags when it rebuilds boards.txt.
* Support for 2nd Heap, excess IRAM, through umm_malloc.
  Adapted changes to umm_malloc, Esp.cpp, StackThunk.cpp, WiFiClientSecureBearSSL.cpp, and virtualmem.ino to irammem.ino from @earlephilhower PR #6994. Reworked umm_malloc to use context pointers instead of copy context. umm_malloc now supports allocations from IRAM. Added class HeapSelectIram, ... to aid in selecting alternate heaps, modeled after class InterruptLock. Restrict alloc request from ISRs to DRAM. Never ending improvements to debug printing. Sec Heap option now pulls in free IRAM left over in the 1st 32K block. Managed through umm_malloc with HeapSelectIram. Updated examples.
* Post push CI cleanup.
* Cleanup part II
* Cleanup part III
* Updates to support platformio, maybe.
* Added exception C wrapper replacement.
* CI Cleanup
* CI Cleanup II
  Don't know what to do with platformio; it doesn't like my .S file. ifdef out USE_ISR_SAFE_EXC_WRAPPER to block the new assembly module from building on platformio only.
* Changes to exc-c-wrapper-handler.S to assemble under platformio.
* For platformio, correction to toolchain-xtensa include path. @mcspr, thank you!
* Temporarily added --print-memory-usage to ld parameters for cross-checking IRAM size.
* undo change to platform.txt
* correct merge conflict. take 1
* Fixed #if... for building umm_get_oom_count. It was not building when UMM_STATS_FULL was used.
* Commented out XMC support. Compatibility issues with PoC when using 16K ICACHE.
* Corrected size.py, DRAM bracketing changed to not include ICACHE with DRAM total.
* Added additional _context for support of use of UMM_INLINE_METRICS. Corrected some UMM_POSION missed edits.
* Changes to clear errors and warnings from toolchain 10.1
  Several fixes and improvements to example MMU48K. With the improved optimization in toolchain 10.1, the example divide by 0 exception was failing with a HWDT event instead of its exception handler. The compiler saw the obscured divide by 0 and replaced it with a break point.
* Isolated incompatible definitions related to _xtos_set_exception_handler. GDBSTUB definitions are different from the BootROM's.
* Update tools/platformio-build.py
  Co-authored-by: Max Prokhorov <[email protected]>
* Requested changes
  Changed mmu related usages of ETS_... defines to DBG_MMU_... Cleanup in example MMU48K.ino. Removed stale memory reference macro and mmu_status print statement. Cleanup printf '\n' to be '\r\n'. Improved isolation of development debug prints from the rest of the debug prints.
* Corrected comment. And added missing include.
* Improve comment.
* style and comment correction
* Added draft mmu.rst file and updated index.
  Updated example HeapMetric.ino to also illustrate use of IRAM. Improved comments in exc-c-wrapper-handler.S. Added insurance IRQ disable.
* Updated mmu.rst
  Improved function name uniqueness for is_iram, is_dram, and is_icache by adding prefix mmu_. Also, made them available outside of a debug build. Made pointer precision width more specific. Made some of the static inline functions in mmu_iram.h safe for ISRs by setting them for always inline.
* Add a default MMU_IRAM_SIZE value for a new CI test to pass.
  Extended use of 'umm_heap_context_t *_context' argument in ..._core functions and expanded its usage to reduce unnecessary repeated calls to umm_info(NULL, false), also removed recursion from umm_info(NULL, true). Fixed stack buffer length in umm_info_safe_printf_P and heap.cpp. Added example for creating an IRAM reserve section. Updated mmu.rst. Grammar and spelling corrections.
* CI appeasement
* CI appeasement with comment correction.
* Ensure SYS always runs with DRAM Heap selected.
* Add/move heap stack overflow/underflow check to Esp.cpp where the event was discarded.
* Improved comment clarity of purpose for IramReserve.ino. Clean up MMU48K.ino
* Added missing #include
* Corrected usage of warning
* CI appeasement and use #message not #pragma message
* Updated git version of eboot.elf to match build version. Good test catch.
* Remove conditional build option USE_ISR_SAFE_EXC_WRAPPER, always install.
  Use the replacement wrapper on non32xfer_exception_handler install. Added comments to code describing some exception handling issues.
* Updated mmu.rst
* Expanded and clarified comments.
  Limited access to some detailed typedefs/prototypes to .cpp modules, to avoid future build conflicts. Completed TODO for verifying that the "C" structure struct __exception_frame matches the ASM version. Fixed some typos, code rot, and added some more cases in example irammem.ino. Refactored a little and reordered printing to ease comparison between methods. Corrected `#ifdef __cplusplus` coverage area. Cleaned up `extern "C" ...` usage. Fixes issues with including mmu_iram.h or esp8266_undocumented.h in .c files.
* Style fixes and more cleanup
* Style fix
* Remove unnecessary IRAM_ATTR from install_non32xfer_exception_handler
  Some comment tuning. In the context of _xtos_set_exception_handler and the functions it registers, changed to type int for exception cause type. This is also the type used by gdbstub and some other Xtensa files I found.
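The inline helpers for byte access to IRAM mentioned in the list above rely on the fact that such memory only tolerates aligned 32-bit accesses. A minimal sketch of the idea follows; the names `readByte32` and `writeByte32` are hypothetical and are not the functions in mmu_iram.h. The byte numbering assumes little-endian order, as on the ESP8266.

```cpp
#include <cassert>
#include <cstdint>

// Read one byte using only an aligned 32-bit load.
inline uint8_t readByte32(const uint32_t *base, uint32_t byteOffset) {
    assert(base != nullptr);
    const uint32_t word = base[byteOffset / 4];              // aligned word load
    return static_cast<uint8_t>(word >> ((byteOffset % 4) * 8));
}

// Write one byte with a read-modify-write of the containing 32-bit word.
inline void writeByte32(uint32_t *base, uint32_t byteOffset, uint8_t value) {
    assert(base != nullptr);
    const uint32_t shift = (byteOffset % 4) * 8;
    uint32_t word = base[byteOffset / 4];
    word = (word & ~(0xFFu << shift)) | (static_cast<uint32_t>(value) << shift);
    base[byteOffset / 4] = word;                             // aligned word store
}
```

The write path is not atomic, so on real hardware it would need protection if an interrupt could touch the same word.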
Ran sanity tests successfully on 23LC1024 128K SRAM. Adjusted defines to ensure UMM knows about our heap.
Use same ASM macro from @mhightower83's non32xfer handler in the VM handler, avoid potential GCC optimization issues.
Thanks for the reminder, @mhightower83 ! I've pulled out the common ASM block so your debugged one will be used in the VM subsystem as well as the non32xfer one.
Because UMM manages RAM in 8 byte chunks, attempting to manage the entire 1M available space on a 1M PSRAM causes the block IDs to overflow, crashing things at some point. Limit the UMM allocation to only 256K in this case. The remaining space can manually be assigned to buffers/etc. managed by the application, not malloc()/free().
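The 256K figure can be checked with a little arithmetic. The sketch below assumes that a block index has 15 usable bits (a 16-bit field with one bit reserved as a flag); that width is an assumption chosen because it is consistent with the limit stated above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBlockBytes = 8;   // UMM manages RAM in 8-byte chunks
constexpr unsigned kIndexBits = 15;      // assumption: 16-bit index, 1 flag bit

// Largest region addressable with the given number of block-index bits.
constexpr std::size_t maxManagedBytes(unsigned indexBits, std::size_t blockBytes) {
    return (std::size_t{1} << indexBits) * blockBytes;
}

static_assert(maxManagedBytes(kIndexBits, kBlockBytes) == 256 * 1024,
              "32768 blocks of 8 bytes give 256K");

// Portion of the chip handed to the allocator; the rest stays with the app.
constexpr std::size_t ummRegionBytes(std::size_t chipBytes) {
    return chipBytes < maxManagedBytes(kIndexBits, kBlockBytes)
               ? chipBytes
               : maxManagedBytes(kIndexBits, kBlockBytes);
}
```

Under this assumption a 128K part fits entirely, while a 1M part needs 131072 block IDs and is therefore capped.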
LGTM, nothing jumps out.
@earlephilhower There are |
Hello, |
--edit--
This is at a state where it's working well and there is a reasonable API for users with a 23LC1024 SRAM wired up. The `virtualmem` example shows all the calls that are required.
Theory of Operation:
The Xtensa core generates a hardware exception (unrelated to C++ exceptions) when a load or store accesses an address that's defined as invalid. The XTOS ROM routines capture the machine state and call a standard C exception handler routine (or the default one which resets the system).
We hook into this exception callback and decode the EXCVADDR (the address being accessed) and use the exception PC to read out the faulting instruction. We decode that instruction and simulate its behavior (i.e. either loading or storing some data to a register/external memory) and then return to the calling application.
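The decode step described above can be sketched as a pure function that runs on a host machine. This is not the PR's handler: the names `MemOp`, `Decoded`, `decodeLoadStore`, and `extendLoaded` are hypothetical, and the field layout (op0 in bits 0-3, the data register `t` in bits 4-7, the operation selector `r` in bits 12-15) reflects my understanding of the 24-bit Xtensa load/store encoding. The 16-bit narrow instruction forms are not handled here.

```cpp
#include <cassert>
#include <cstdint>

enum class MemOp { L8UI, L16UI, L32I, S8I, S16I, S32I, L16SI, Unknown };

struct Decoded {
    MemOp op;
    unsigned reg;    // "t" field: register that receives or supplies the data
    unsigned bytes;  // access width in bytes, 0 if not recognized
    bool isStore;
};

// Decode a 24-bit load/store instruction word fetched from the exception PC.
Decoded decodeLoadStore(uint32_t insn) {
    const unsigned op0 = insn & 0xFu;
    const unsigned t = (insn >> 4) & 0xFu;
    const unsigned r = (insn >> 12) & 0xFu;
    if (op0 != 0x2u) return {MemOp::Unknown, t, 0, false};
    switch (r) {
        case 0x0: return {MemOp::L8UI, t, 1, false};
        case 0x1: return {MemOp::L16UI, t, 2, false};
        case 0x2: return {MemOp::L32I, t, 4, false};
        case 0x4: return {MemOp::S8I, t, 1, true};
        case 0x5: return {MemOp::S16I, t, 2, true};
        case 0x6: return {MemOp::S32I, t, 4, true};
        case 0x9: return {MemOp::L16SI, t, 2, false};
        default:  return {MemOp::Unknown, t, 0, false};
    }
}

// Value a load leaves in the destination register, given the raw data read.
uint32_t extendLoaded(MemOp op, uint32_t raw) {
    switch (op) {
        case MemOp::L8UI:  return raw & 0xFFu;
        case MemOp::L16UI: return raw & 0xFFFFu;
        case MemOp::L16SI:  // sign-extend bit 15 into the upper half
            return (raw & 0x8000u) ? (raw | 0xFFFF0000u) : (raw & 0xFFFFu);
        case MemOp::L32I:  return raw;
        default:
            assert(false && "not a load");
            return 0;
    }
}
```

A handler built on this would read or write `bytes` bytes at EXCVADDR through the cache, update the saved copy of register `reg` for loads, and advance the saved PC past the instruction before returning.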
We use the hardware SPI interface to talk to an external SRAM/PSRAM, and implement a simple cache to minimize the number of times we actually need to go out over the (slow) SPI bus. The SPI is set up in a DIO mode which uses no more pins than normal SPI, but provides for ~2X faster transfers.
NOTE: This works fine for processor accesses, but cannot be used by any of the peripherals' DMA. For that, we'd need a real MMU.
Hardware Configuration (make sure you have 3.3V compatible SRAMs):
CS -> GPIO15
SCK -> GPIO14
MOSI -> GPIO13
MISO -> GPIO12
(note these are GPIO numbers, not the Arduino Dxx ones. Refer to your ESP8266 board schematic for the mapping of GPIO to pin.)
This is still a WIP, but the base handler is functional.
Using an exception handler that hooks into the invalid read/write address hardware exception, capture reads to 256MB starting at 0x1000_0000 and (eventually) map that into a SW managed cache in front of an external SPI SRAM.
This RAM will be slower than internal RAM since it is SW managed, but should be accessible for use by all apps and the OS without any special concerns. Don't expect to use this RAM in ISRs or time critical sections.
The code now captures the read and write exceptions and continues operations, but the SPI SRAM read/write and the cache management is not in there yet. I need to dig out my SPI SRAM and hook it up to give it a try.
Based off of the PROGMEM misaligned exception handler.
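The address check implied by the 256MB window at 0x1000_0000 can be sketched as below. The helper names are hypothetical, and the choice to reject offsets beyond the installed chip size (rather than wrap them) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kVmBase = 0x10000000u;   // start of the captured window
constexpr uint32_t kVmSize = 0x10000000u;   // 256MB

// True if the faulting address falls inside the virtual memory window.
inline bool inVmWindow(uint32_t addr) {
    return addr - kVmBase < kVmSize;        // unsigned wrap rejects addr < base
}

// Translate a window address to an offset in the external chip.
// Returns false if the address is outside the window or beyond the chip.
inline bool vmToChipOffset(uint32_t addr, uint32_t chipBytes, uint32_t &offset) {
    if (!inVmWindow(addr)) return false;
    const uint32_t off = addr - kVmBase;
    if (off >= chipBytes) return false;
    offset = off;
    return true;
}
```

For a 128K part, only the first 0x20000 bytes of the window translate successfully under this assumption.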