• Runtime library
• Instruction generators with tiling support for large matrices
• Improved hardware with smaller resource cost
• Hardware-accelerated parallel-to-serial conversion
• Support for PYNQ on the Avnet Ultra96 (PYNQU96)
• Experimental support for cache coherency (PYNQU96CC)
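
To illustrate what tiling for large matrices generally involves, here is a minimal sketch in plain Python. It is not the project's instruction generator: the function names (`tile_ranges`, `tiled_matmul`) and the tile sizes are hypothetical, chosen only to show how a product too large for one pass can be split into tile-sized pieces whose partial results are accumulated.

```python
def tile_ranges(extent, tile):
    """Yield (start, stop) pairs that cover range(extent) in chunks of at most `tile`."""
    for start in range(0, extent, tile):
        yield start, min(start + tile, extent)


def tiled_matmul(a, b, tile_m=2, tile_k=2, tile_n=2):
    """Multiply a (m x k) by b (k x n) one tile at a time.

    Tile sizes are illustrative assumptions; a real accelerator would
    derive them from its on-chip buffer and compute array dimensions.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")

    c = [[0] * n for _ in range(m)]
    for i0, i1 in tile_ranges(m, tile_m):          # row tiles of the result
        for j0, j1 in tile_ranges(n, tile_n):      # column tiles of the result
            for p0, p1 in tile_ranges(k, tile_k):  # tiles along the shared dimension
                # This innermost region is the unit of work that would be
                # issued per tile; partial sums accumulate into c.
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        acc = 0
                        for p in range(p0, p1):
                            acc += a[i][p] * b[p][j]
                        c[i][j] += acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[1, 0], [0, 1], [2, 3]]
    print(tiled_matmul(a, b))
```

Edge tiles are simply shorter than the nominal tile size here; padding the matrices up to a multiple of the tile size is the other common choice and may fit fixed-size hardware better.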