Enable 128K virtual memory via external SPI SRAM #6994

Merged
earlephilhower merged 15 commits into esp8266:master from the virtual-mem branch on Mar 15, 2021

earlephilhower (Collaborator) commented Jan 7, 2020

--edit--

This is at a state where it's working well and there is a reasonable API for users with a 23LC1024 SRAM wired up. The virtualmem example shows all the calls that are required.

Theory of Operation:

The Xtensa core generates a hardware exception (unrelated to C++ exceptions) when a load or store targets an address that is defined as invalid. The XTOS ROM routines capture the machine state and call a standard C exception handler routine (or the default one, which resets the system).

We hook into this exception callback, decode the EXCVADDR (the address being accessed), and use the exception PC to read out the faulting instruction. We decode that instruction, simulate its behavior (i.e. load data from external memory into a register, or store data from a register to external memory), and then return to the calling application.
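The following is a minimal host-side sketch of the decode step. It assumes the standard Xtensa RRI8 layout for the 24-bit load/store (LSAI) group:

- op0 in bits 3..0
- t in bits 7..4
- s in bits 11..8
- r in bits 15..12
- imm8 in bits 23..16

The names `decode_lsai`, `DecodedInsn` and `MemOp` are hypothetical. The 16-bit narrow forms (L32I.N/S32I.N) are deliberately left undecoded. The handler already has the target address from EXCVADDR, so the offset is returned only for completeness.

```cpp
#include <cassert>
#include <cstdint>

// Direction of the memory access performed by the faulting instruction.
enum class MemOp { Load, Store, Unknown };

struct DecodedInsn {
  MemOp op = MemOp::Unknown;
  unsigned size = 0;         // access width in bytes: 1, 2 or 4
  bool sign_extend = false;  // true only for L16SI
  unsigned t = 0;            // data register AR[t]
  unsigned s = 0;            // base address register AR[s]
  uint32_t offset = 0;       // byte offset added to AR[s]
};

// Hypothetical decoder for 24-bit RRI8 loads/stores (op0 == 2).
// Anything else, including narrow .N forms, is reported as Unknown.
DecodedInsn decode_lsai(uint32_t insn) {
  DecodedInsn d;
  if ((insn & 0xF) != 0x2) return d;
  d.t = (insn >> 4) & 0xF;
  d.s = (insn >> 8) & 0xF;
  const unsigned r = (insn >> 12) & 0xF;
  const uint32_t imm8 = (insn >> 16) & 0xFF;
  switch (r) {
    case 0x0: d.op = MemOp::Load;  d.size = 1; break;                        // L8UI
    case 0x1: d.op = MemOp::Load;  d.size = 2; break;                        // L16UI
    case 0x2: d.op = MemOp::Load;  d.size = 4; break;                        // L32I
    case 0x4: d.op = MemOp::Store; d.size = 1; break;                        // S8I
    case 0x5: d.op = MemOp::Store; d.size = 2; break;                        // S16I
    case 0x6: d.op = MemOp::Store; d.size = 4; break;                        // S32I
    case 0x9: d.op = MemOp::Load;  d.size = 2; d.sign_extend = true; break;  // L16SI
    default: return DecodedInsn{};
  }
  d.offset = imm8 * d.size;  // imm8 is scaled by the access width
  return d;
}
```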

We use the hardware SPI interface to talk to an external SRAM/PSRAM, and implement a simple cache to minimize the number of times we actually need to go out over the (slow) SPI bus. The SPI is set up in DIO mode, which uses no more pins than normal SPI but provides ~2X faster transfers.
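On the SPI side, each transfer to a 23LC1024 starts with an opcode and a 24-bit address. The sketch below builds that header using the READ (0x03) and WRITE (0x02) opcodes from the 23LC1024 datasheet for plain SPI mode. `sram_header` is a hypothetical helper. Any extra dummy cycles the chip may require in dual (DIO) mode are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Opcodes from the Microchip 23LC1024 datasheet (SPI mode).
constexpr uint8_t kSramRead = 0x03;
constexpr uint8_t kSramWrite = 0x02;
constexpr uint32_t kSramSize = 128u * 1024;  // 1 Mbit = 128 KiB

// Opcode followed by a 24-bit address, most significant byte first.
std::array<uint8_t, 4> sram_header(uint8_t opcode, uint32_t addr) {
  assert(addr < kSramSize);
  return {opcode,
          static_cast<uint8_t>(addr >> 16),
          static_cast<uint8_t>(addr >> 8),
          static_cast<uint8_t>(addr)};
}
```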

NOTE: This works fine for processor accesses, but cannot be used by any of the peripherals' DMA. For that, we'd need a real MMU.

Hardware Configuration (make sure you have 3.3V compatible SRAMs):

  • SPI-interfaced, byte-addressable SRAM/PSRAM: 23LC1024 or smaller
    CS -> GPIO15
    SCK -> GPIO14
    MOSI -> GPIO13
    MISO -> GPIO12
    (note these are GPIO numbers, not the Arduino Dxx pin names. Refer to your ESP8266 board schematic for the mapping of GPIO to pin.)
  • Higher density PSRAM (ESP-PSRAM64H/etc.) should work as well, but I'm still waiting on my chips so haven't done any testing. Biggest concern is their command set and functionality in DIO mode. If DIO mode isn't supported, then a fallback to SIO or moving to QIO is a possibility.

This is still a WIP, but the base handler is functional.
Using an exception handler that hooks into the invalid read/write address hardware exception, capture accesses to the 256MB region starting at 0x1000_0000 and (eventually) map them into a SW-managed cache in front of an external SPI SRAM.
This RAM will be slower than internal RAM since it is SW managed, but it should be accessible to all apps and the OS without any special concerns. Don't expect to use this RAM in ISRs or time-critical sections.
The code now captures the read and write exceptions and continues operations, but the SPI SRAM read/write and the cache management is not in there yet. I need to dig out my SPI SRAM and hook it up to give it a try.
Based off of the PROGMEM misaligned exception handler.
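The address check the handler has to make can be sketched as follows. It assumes a single 128K 23LC1024 mapped at the start of the 256MB window at 0x1000_0000. `in_vm_window` and `vm_to_sram` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kVmBase = 0x10000000u;     // start of the captured window
constexpr uint32_t kVmWindow = 0x10000000u;   // 256 MiB
constexpr uint32_t kSramBytes = 128u * 1024;  // assumption: one 23LC1024

// True if the faulting address lies inside the VM window.
// Unsigned wrap-around makes addresses below kVmBase fail the test too.
bool in_vm_window(uint32_t excvaddr) {
  return excvaddr - kVmBase < kVmWindow;
}

// Translate a VM address into an SRAM byte offset. Returns false if the
// address is outside the window or beyond the installed SRAM.
bool vm_to_sram(uint32_t excvaddr, uint32_t* offset) {
  if (!in_vm_window(excvaddr)) return false;
  const uint32_t off = excvaddr - kVmBase;
  if (off >= kSramBytes) return false;
  *offset = off;
  return true;
}
```

For example, the external buffer at 0x1000000c in the virtualmem.ino output maps to SRAM offset 12.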

@earlephilhower earlephilhower force-pushed the virtual-mem branch 4 times, most recently from cbea7f7 to 4a5a68f on January 15, 2020 16:52
@earlephilhower earlephilhower changed the title WIP - Enable virtual memory via external SPI SRAM Enable 128K virtual memory via external SPI SRAM Jan 18, 2020
earlephilhower (Collaborator, Author)

Example virtualmem.ino output. 126K free is nice. :)

Internal buffer: Address 0x3ffef5f4, free 47624
External buffer: Address 0x1000000c, free 126960
Virtual Memory Write:   453518 cycles for 4K
Physical Memory Write:  7170 cycles for 4K
Virtual Memory Read:   415573 cycles for 4K (sum 0aaaaa00)
Physical Memory Read:  7176 cycles for 4K (sum 0aaaaa00)
Virtual Memory Write:   355935 cycles for 2K by 16
Physical Memory Write:  7170 cycles for 2K by 16
Virtual Memory Read:   335255 cycles for 2K by 16 (sum 01baaa00)
Physical Memory Read:  7176 cycles for 2K by 16 (sum 01baaa00)
Virtual Memory Write:   294015 cycles for 1K by 8
Physical Memory Write:  7170 cycles for 1K by 8
Virtual Memory Read:   287921 cycles for 1K by 8 (sum 0001fe00)
Physical Memory Read:  7171 cycles for 1K by 8 (sum 0001fe00)
Internal free: 47624
External free: 126648
String: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
Internal free: 51728
External free: 130752
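Dividing the cycle counts above gives the cost of the VM path. In this run it was roughly 40x to 63x slower than internal RAM. The small sketch below only does that arithmetic on the logged numbers; it makes no new measurements.

```cpp
#include <cassert>
#include <cstdio>

// Cycle counts copied from the virtualmem.ino output above.
struct Sample {
  const char* name;
  long vm_cycles;
  long phys_cycles;
};

constexpr Sample kSamples[] = {
    {"write 4K", 453518, 7170},       {"read 4K", 415573, 7176},
    {"write 2K by 16", 355935, 7170}, {"read 2K by 16", 335255, 7176},
    {"write 1K by 8", 294015, 7170},  {"read 1K by 8", 287921, 7171},
};

// How many times slower the VM access was than internal RAM.
double slowdown(const Sample& s) {
  return static_cast<double>(s.vm_cycles) / s.phys_cycles;
}

void print_slowdowns() {
  for (const Sample& s : kSamples)
    std::printf("%-15s %5.1fx\n", s.name, slowdown(s));
}
```

The 4K write comes out at about 63x and the 1K-by-8 read at about 40x.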

@earlephilhower earlephilhower force-pushed the virtual-mem branch 2 times, most recently from 371d925 to fe27885 on January 19, 2020 21:38
Provides a transparently accessible additional block of RAM of 128K to
8MB by using an external SPI SRAM.  This memory is managed using the UMM
memory manager and can be used by the core as if it were internal RAM
(albeit much slower to read or write).

The use case would be for things which are quite large but not
particularly frequently used or compute intensive.  For example, the SSL
buffers of 16K+ are a good fit for this, as are the contents of Strings
(both to avoid main heap fragmentation as well as allowing Strings of
>30KB).

A fully associative LRU cache is used to limit the SPI bus bottleneck,
and background writeback is supported.

Requires `ESP.enableVM()` call to actually add the VM subsystem.  If
this routine is not called, then the entire VM routines should not be
linked in to user apps, so there should be no space penalty w/o it.

UMM `malloc` and `new` are modified to support internal and external
heap regions.  By default, everything comes from the standard heap, but
a call to `ESP.setExternalHeap()` before the allocation (followed by a
call to `ESP.resetHeap()` afterwards) will make the allocation come from
external RAM.  See the `virtualmem.ino` example for use.
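Every `ESP.setExternalHeap()` must be paired with an `ESP.resetHeap()`. A scope guard is one way to make that pairing automatic. This is a hypothetical sketch: `ExternalHeapScope` is not part of the core, and `FakeEsp` only stands in for the `ESP` object so the pattern compiles off-target.

```cpp
#include <cassert>

// Host-side stand-in for the ESP object's heap-selection methods.
struct FakeEsp {
  bool external = false;
  void setExternalHeap() { external = true; }
  void resetHeap() { external = false; }
};

// Hypothetical RAII guard: selects the external heap for its lifetime and
// restores the default heap on scope exit, including early returns.
template <typename Esp>
class ExternalHeapScope {
 public:
  explicit ExternalHeapScope(Esp& esp) : esp_(esp) { esp_.setExternalHeap(); }
  ~ExternalHeapScope() { esp_.resetHeap(); }
  ExternalHeapScope(const ExternalHeapScope&) = delete;
  ExternalHeapScope& operator=(const ExternalHeapScope&) = delete;

 private:
  Esp& esp_;
};
```

The guard assumes `ESP` exposes the two methods named above. `resetHeap()` returns to the default heap rather than to the previously selected one, so these guards should not be nested.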

If there is no external RAM installed, the `setExternalHeap` call is a
no-op.

The String and BearSSL libraries have been modified to use this external
RAM automatically.

Theory of Operation:

The Xtensa core generates a hardware exception (unrelated to C++
exceptions) when a load or store targets an address that is defined as
invalid.  The XTOS ROM routines capture the machine state and call a
standard C exception handler routine (or the default one, which resets
the system).

We hook into this exception callback and decode the EXCVADDR (the
address being accessed) and use the exception PC to read out the
faulting instruction. We decode that instruction and simulate its
behavior (i.e. either loading or storing some data to a
register/external memory) and then return to the calling application.
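Once the instruction is decoded, simulating it comes down to moving the right number of bytes and widening the result. The sketch below assumes the ESP8266's little-endian byte order and a `mem` pointer into an already-cached copy of the external SRAM. `emulate_load` and `emulate_store` are hypothetical, and writing the loaded value back into AR[t] is left out.

```cpp
#include <cassert>
#include <cstdint>

// Read `size` bytes little-endian and widen to 32 bits.
// L8UI and L16UI zero-extend; L16SI sign-extends.
uint32_t emulate_load(const uint8_t* mem, unsigned size, bool sign_extend) {
  uint32_t v = 0;
  for (unsigned i = 0; i < size; ++i)
    v |= static_cast<uint32_t>(mem[i]) << (8 * i);
  if (sign_extend && size == 2 && (v & 0x8000u)) v |= 0xFFFF0000u;
  return v;
}

// Write the low `size` bytes of the register value, little-endian.
void emulate_store(uint8_t* mem, unsigned size, uint32_t reg) {
  for (unsigned i = 0; i < size; ++i)
    mem[i] = static_cast<uint8_t>(reg >> (8 * i));
}
```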

We use the hardware SPI interface to talk to an external SRAM/PSRAM,
and implement a simple cache to minimize the number of times we need
to go out over the (slow) SPI bus. The SPI is set up in a DIO mode
which uses no more pins than normal SPI, but provides for ~2X faster
transfers.  SIO mode is also supported.
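The ~2X figure follows from counting clocks in the data phase: SIO moves 1 bit per clock, while DIO moves 2. The sketch below works this out for a 64-byte transfer, an arbitrary example size. `data_clocks` is hypothetical, and command, address and any dummy cycles are not counted.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSio = 1;  // bits per SPI clock, single line
constexpr uint32_t kDio = 2;  // bits per SPI clock, two lines

// SPI clocks spent moving `bytes` of payload.
constexpr uint32_t data_clocks(uint32_t bytes, uint32_t bits_per_clock) {
  return bytes * 8 / bits_per_clock;
}

static_assert(data_clocks(64, kSio) == 512, "SIO: 8 clocks per byte");
static_assert(data_clocks(64, kDio) == 256, "DIO: 4 clocks per byte");
```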

NOTE: This works fine for processor accesses, but cannot be used by
any of the peripherals' DMA. For that, we'd need a real MMU.

Hardware Configuration (only use 3.3V compatible SRAMs!):

  SPI byte-addressable SRAM/PSRAM: 23LC1024 or smaller
    CS   -> GPIO15
    SCK  -> GPIO14
    MOSI -> GPIO13
    MISO -> GPIO12
 (note these are GPIO numbers, not the Arduino Dxx pin names.  Refer
  to your ESP8266 board schematic for the mapping of GPIO to pin.)

Higher density PSRAM (ESP-PSRAM64H/etc.) should work as well, but
I'm still waiting on my chips so haven't done any testing.  Biggest
concern is their command set and functionality in DIO mode.  If DIO
mode isn't supported, then a fallback to SIO is possible.

This PR originated with code from @pvvx's esp8266web server at
https://github.com/pvvx/esp8266web (licensed in the public domain)
but doesn't resemble it much any more.  Thanks, @pvvx!

Keep a list of the last 8 lines in RAM (~.5KB of RAM) and use that to
speed up things like memcpys and other operations where the source and
destination addresses are inside VM RAM.
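The following is a host-side sketch of a fully associative, write-back LRU cache of this shape. The 8-line count comes from the text. A 64-byte line size is an assumption that matches the ~.5KB figure (8 x 64 B = 512 B). `LineCache` is hypothetical, and the sketch differs from the real subsystem in three ways:

- a `std::vector` stands in for the SPI SRAM;
- only byte accessors are shown;
- dirty lines are written back on eviction or `flush()`, not in the background.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

class LineCache {
 public:
  static constexpr uint32_t kLines = 8;      // from the text
  static constexpr uint32_t kLineSize = 64;  // assumption

  explicit LineCache(std::vector<uint8_t>& backing) : backing_(backing) {}

  uint8_t read8(uint32_t addr) { return line_for(addr).data[addr % kLineSize]; }

  void write8(uint32_t addr, uint8_t v) {
    Line& l = line_for(addr);
    l.data[addr % kLineSize] = v;
    l.dirty = true;  // reaches the backing store only on eviction/flush
  }

  void flush() {
    for (Line& l : lines_) writeback(l);
  }

  uint32_t misses() const { return misses_; }

 private:
  struct Line {
    bool valid = false;
    bool dirty = false;
    uint32_t base = 0;      // backing-store address of data[0]
    uint32_t last_use = 0;  // larger means more recently used
    std::array<uint8_t, kLineSize> data{};
  };

  void writeback(Line& l) {
    if (l.valid && l.dirty) {
      std::memcpy(&backing_[l.base], l.data.data(), kLineSize);
      l.dirty = false;
    }
  }

  Line& line_for(uint32_t addr) {
    const uint32_t base = addr - addr % kLineSize;
    assert(base + kLineSize <= backing_.size());
    ++clock_;
    for (Line& l : lines_) {
      if (l.valid && l.base == base) {  // hit: any line may hold any address
        l.last_use = clock_;
        return l;
      }
    }
    // Miss: take an empty line if there is one, else the least recently used.
    Line* victim = &lines_[0];
    for (Line& l : lines_) {
      if (!l.valid) { victim = &l; break; }
      if (l.last_use < victim->last_use) victim = &l;
    }
    writeback(*victim);
    std::memcpy(victim->data.data(), &backing_[base], kLineSize);
    victim->valid = true;
    victim->dirty = false;
    victim->base = base;
    victim->last_use = clock_;
    ++misses_;
    return *victim;
  }

  std::vector<uint8_t>& backing_;
  std::array<Line, kLines> lines_{};
  uint32_t clock_ = 0;
  uint32_t misses_ = 0;
};
```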

A custom set of SPI routines is used in the VM system for speed and code
size (and because the core cannot be dependent on a library).
mhightower83 added a commit to mhightower83/Arduino that referenced this pull request Mar 2, 2020
Adapted changes to umm_malloc, Esp.cpp, StackThunk.cpp,
WiFiClientSecureBearSSL.cpp, and virtualmem.ino to irammem.ino from
@earlephilhower PR esp8266#6994.

Reworked umm_malloc to use context pointers instead of copy context.
umm_malloc now supports allocations from IRAM. Added class
HeapSelectIram, ... to aid in selecting alternate heaps,
modeled after class InterruptLock.
Restrict alloc request from ISRs to DRAM.

Never ending improvements to debug printing.

Sec Heap option now pulls in free IRAM left over in the 1st 32K block.
Managed through umm_malloc with HeapSelectIram.

Updated examples.
earlephilhower added a commit to earlephilhower/ESP8266Audio that referenced this pull request Apr 13, 2020
Instead of requiring everyone to download and install another Arduino
library to use this lib, include an optimized, self-contained one from
the 8266 VM PR: esp8266/Arduino#6994

Does change the API as it now requires a HW chip select, so that
parameter is removed from the SpiRAM buffer object.
earlephilhower added a commit to earlephilhower/ESP8266Audio that referenced this pull request Apr 14, 2020
* Replace external SPIRAM lib w/internal optimized

Instead of requiring everyone to download and install another Arduino
library to use this lib, include an optimized, self-contained one from
the 8266 VM PR: esp8266/Arduino#6994

Does change the API as it now requires a HW chip select, so that
parameter is removed from the SpiRAM buffer object.

* Redo the FIFO to try and be faster

* Fix the SPI interface, buffering.

SPI FIFO needs special care, so do all work in a RAM copy and then copy it
over manually w/32b writes.

Ensure the SPI command is done before touching the FIFO.

Fix buffering organization.

Make the buffer report status as a #### bar.

* Fix infinite wait on EOF

* Add SW chip select support

* Update URL to one that's live today

* Clean up metadata printout

* Clean up readme refs to obsolete lib

* Slow down the DIO->SIO reset sequence

* Update readme to remove old SPIRam library ref

Also add the resistor to the 1T circuit to avoid transistor overheating.
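The FIFO fix in this commit builds the data in a RAM copy, then copies it into the SPI FIFO using only 32-bit writes. That step can be sketched as below. `copy_to_fifo32` is hypothetical. On the target, `fifo` would point at the SPI data registers, which is an assumption here. Bytes are packed little-endian and the final word is zero-padded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy `len` bytes into a word-addressed FIFO with one aligned 32-bit
// store per word, never touching the FIFO with byte writes.
void copy_to_fifo32(volatile uint32_t* fifo, size_t fifo_words,
                    const uint8_t* src, size_t len) {
  assert((len + 3) / 4 <= fifo_words);
  for (size_t i = 0; i < len; i += 4) {
    uint32_t w = 0;
    for (size_t b = 0; b < 4 && i + b < len; ++b)
      w |= static_cast<uint32_t>(src[i + b]) << (8 * b);
    fifo[i / 4] = w;
  }
}
```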
s-hadinger (Contributor)

Earle, I have a question loosely coupled to this PR.

Would it be possible to use a similar trick to read data in Flash beyond the 1MB limit, without explicit SPI command, or even better, run code beyond the 1MB limit. I understand the performance hit, but there may be some scenarios for rarely used code or data.

earlephilhower (Collaborator, Author)

This method won't work for code in PMEM, but it's possible to use this idea to do data reads from outside of PMEM. This only catches load or store exceptions, not ifetch ones. I don't think ifetch exceptions are exposed.

d-a-v pushed a commit that referenced this pull request Dec 6, 2020
* PoC cache configuration control

Expanded boards.txt.py to allow new MMU options and create revised .ld's
Updated eboot to pass 48K IRAM segments.
Added Cache_Read_Enable intercept to modify call for 16K ICACHE
Update platform.txt to pass new mmu options through to compiler and linker preprocessor.
Added quick example: esp8266/MMU48K

* Style corrections
Added MMU_ qualifier to new defines.
Moved changes into their own file.
Don't know how to fix platformio issue.

* Added detailed description for Cache_Read_Enable.
Updated tools/sizes.py to report correct IRAM size and indicate ICACHE size.
Merged in earlephilhower's work on unaligned exception. Refactored and added
support for store operations and changed the name to be more closely aligned
with its function. Improved crash reporting path.

* Style and MMU_SEC_HEAP corrections.

* Improved asm register usage.
Added some inline functions to aid in byte and short access to iRAM.
 * only byte read has been tested
Updated .ld file to work better with platform.io; however, I am still
missing some steps, so platformio will still fail.

* Interesting glitch in boards.txt after github merge. A new board in
master was missing new additions added by boards.txt.py in the PR.
Which the CI flags when it rebuilds boards.txt.

* Support for 2nd Heap, excess IRAM, through umm_malloc.

Adapted changes to umm_malloc, Esp.cpp, StackThunk.cpp,
WiFiClientSecureBearSSL.cpp, and virtualmem.ino to irammem.ino from
@earlephilhower PR #6994.

Reworked umm_malloc to use context pointers instead of copy context.
umm_malloc now supports allocations from IRAM. Added class
HeapSelectIram, ... to aid in selecting alternate heaps,
modeled after class InterruptLock.
Restrict alloc request from ISRs to DRAM.

Never ending improvements to debug printing.

Sec Heap option now pulls in free IRAM left over in the 1st 32K block.
Managed through umm_malloc with HeapSelectIram.

Updated examples.

* Post push CI cleanup.

* Cleanup part II

* Cleanup part III

* Updates to support platformio, maybe.

* Added exception C wrapper replacement.

* CI Cleanup

* CI Cleanup II

Don't know what to do with platformio; it doesn't like my .S file.
ifdef out USE_ISR_SAFE_EXC_WRAPPER to block the new assembly module
from building on platformio only.

* Changes to exc-c-wrapper-handler.S to assemble under platformio.

* For platformio, Correction to toolchain-xtensa include path.
@mcspr, Thank you!

* Temporarily added --print-memory-usage to ld parameters for cross-checking IRAM size.

* undo change to platform.txt

* correct merge conflict. take 1

* Fixed #if... for building umm_get_oom_count. It was not building when UMM_STATS_FULL was used.

* Commented out XMC support. Compatibility issues with PoC when using 16K ICACHE.

* Corrected size.py, DRAM bracketing changed to not include ICACHE with DRAM total.

* Added additional _context for support of use of UMM_INLINE_METRICS.
Corrected some UMM_POSION missed edits.

* Changes to clear errors and warnings from toolchain 10.1

Several fixes and improvements to example MMU48K.

With the improved optimization in toolchain 10.1 The example divide by 0
exception was failing with a HWDT event instead of its exception handler.
The compiler saw the obscured divide by 0 and replaced it with a break point.

* Isolated incompatible definitions related to _xtos_set_exception_handler.
GDBSTUB definitions are different from the BootROM's.

* Update tools/platformio-build.py

Co-authored-by: Max Prokhorov <[email protected]>

* Requested changes

Changed mmu related usages of ETS_... defines to DBG_MMU_...

Cleanup in example MMU48K.ino. Removed stale memory reference macro
and mmu_status print statement. Cleanup printf '\n' to be '\r\n'.

Improved isolation of development debug prints from the rest of the debug prints.

* Corrected comment. And added missing include.

* Improve comment.

* style and comment correction

* Added draft mmu.rst file and updated index.
Updated example HeapMetric.ino to also illustrate use of IRAM
Improved comments in exc-c-wrapper-handler.S. Added insurance IRQ disable.

* Updated mmu.rst

Improved function name uniqueness for is_iram, is_dram, and is_icache by
adding prefix mmu_. Also, made them available outside of a debug build.
Made pointer precision width more specific.

Made some of the static inline functions in mmu_iram.h safe for ISRs by
marking them always inline.

* Add a default MMU_IRAM_SIZE value for a new CI test to pass.

Extended use of the 'umm_heap_context_t *_context' argument in ..._core functions
and expanded its usage to reduce unnecessary repeated calls to
umm_info(NULL, false), also removed recursion from umm_info(NULL, true).

Fixed stack buffer length in umm_info_safe_printf_P and heap.cpp.

Added example for creating an IRAM reserve section.

Updated mmu.rst. Grammar and spelling corrections.

* CI appeasement

* CI appeasement with comment correction.

* Ensure SYS always runs with DRAM Heap selected.

* Add/move heap stack overflow/underflow check to Esp.cpp where the event was discarded.

* Improved comment clarity of purpose for IramReserve.ino. Clean up MMU48K.ino

* Added missing #include

* Corrected usage of warning

* CI appeasement and use #message not #pragma message

* Updated git version of eboot.elf to match build version.
Good test catch.

* Remove conditional build option USE_ISR_SAFE_EXC_WRAPPER, always install.

Use the replacement wrapper on non32xfer_exception_handler install.

Added comments to code describing some exception handling issues.

* Updated mmu.rst

* Expanded and clarified comments.

Limited access to some detailed typedefs/prototypes to .cpp
modules, to avoid future build conflicts.

Completed TODO for verifying that the "C" structure struct __exception_frame
matches the ASM version.

Fixed some typos, code rot, and added some more cases in example irammem.ino.
Refactored a little and reordered printing to ease comparison between methods.

Corrected `#ifdef __cplusplus` coverage area. Cleaned up `extern "C" ...` usage.
Fixes issues with including mmu_iram.h or esp8266_undocumented.h in .c files.

* Style fixes and more cleanup

* Style fix

* Remove unnecessary IRAM_ATTR from install_non32xfer_exception_handler

Some comment tuning.

In the context of _xtos_set_exception_handler and the functions it registers,
changed to type int for exception cause type. This is also the type used by gdbstub
and some other Xtensa files I found.
Use the same ASM macro from @mhightower83's non32xfer handler in the VM
handler, to avoid potential GCC optimization issues.
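The commit log above mentions inline helpers for byte and short access to IRAM. IRAM only tolerates aligned 32-bit loads, which is what the non32xfer exception handler works around. The usual technique is an aligned word load plus a shift. The sketch below uses hypothetical names and is not the set of helpers actually added in mmu_iram.h. Lane selection assumes little-endian byte order, as on the ESP8266.

```cpp
#include <cassert>
#include <cstdint>

// Read one byte using only an aligned 32-bit load.
uint8_t read_u8_word_only(const void* p) {
  const uintptr_t a = reinterpret_cast<uintptr_t>(p);
  const volatile uint32_t* w =
      reinterpret_cast<const volatile uint32_t*>(a & ~uintptr_t{3});
  return static_cast<uint8_t>(*w >> (8 * (a & 3)));  // select byte lane
}

// Read a halfword-aligned 16-bit value using only an aligned 32-bit load.
uint16_t read_u16_word_only(const void* p) {
  const uintptr_t a = reinterpret_cast<uintptr_t>(p);
  assert((a & 1) == 0);
  const volatile uint32_t* w =
      reinterpret_cast<const volatile uint32_t*>(a & ~uintptr_t{3});
  return static_cast<uint16_t>(*w >> (8 * (a & 3)));  // low or high half
}
```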
earlephilhower (Collaborator, Author)

Thanks for the reminder, @mhightower83! I've pulled out the common ASM block so your debugged one will be used in the VM subsystem as well as the non32xfer one.

Because UMM manages RAM in 8 byte chunks, attempting to manage the
entire 1M available space on a 1M PSRAM causes the block IDs to
overflow, crashing things at some point.  Limit the UMM allocation to
only 256K in this case.  The remaining space can manually be assigned to
buffers/etc. managed by the application, not malloc()/free().
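The 256K cap is consistent with the following arithmetic. It assumes umm_malloc's 16-bit block numbers reserve the top bit as the free-list flag (a block-number mask of 0x7FFF), which leaves 15 bits of block index. `blocks_for` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockSize = 8;         // bytes per UMM block (from the text)
constexpr uint32_t kMaxBlocks = 1u << 15;  // assumption: 15-bit block index
constexpr uint32_t kMaxHeapBytes = kMaxBlocks * kBlockSize;

static_assert(kMaxHeapBytes == 256u * 1024, "32768 blocks x 8 B = 256 KiB");

// Number of UMM blocks needed to cover `bytes` of RAM.
constexpr uint32_t blocks_for(uint32_t bytes) { return bytes / kBlockSize; }
```

A 1MB PSRAM would need 131072 block IDs, four times what a 15-bit index can address.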
@earlephilhower earlephilhower added this to the 3.0.0 milestone Feb 16, 2021
devyte (Collaborator) left a comment


LGTM, nothing jumps out.

@earlephilhower earlephilhower merged commit 8ffe41b into esp8266:master Mar 15, 2021
@earlephilhower earlephilhower deleted the virtual-mem branch March 15, 2021 01:44
dok-net (Contributor) commented Mar 15, 2021

@earlephilhower There are ICACHE_RAM_ATTRs in this PR.

david-vfortified

Hello,
Is it possible to store global variables in external SPI RAM?

6 participants