Optimised software rasterizer / ImGui on Arduino #1613
Comments
draw (subsection of) font bitmap directly when drawing characters. |
Treat the texture as bitmap - if there's a white pixel, draw a pixel to your framebuffer, else do nothing. No need for alpha blending there. |
Yes it is also worth noting the atlas texture can be output as Alpha8, so 1 byte per pixel without color information. |
Interesting you should mention that. It is currently using Alpha8, but I had to MemFree the pixel buffer it returns immediately, as it is far too big to fit in the ESP32's RAM. Luckily I managed to store it as a constant at the top of the .ino, so it is read from flash rather than RAM. |
If you're low on ram, convert it to 1bit per pixel format, that should reduce it to 4kB at the cost of a few bitops more per texture access. |
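A minimal sketch of the 1-bit-per-pixel idea, assuming the atlas is available once as an Alpha8 buffer (for example on a PC, before baking it into flash). The names `packAtlas1bpp` and `sampleAtlas1bpp` and the threshold of 128 are made up for illustration and are not part of Dear ImGui.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack an Alpha8 atlas (1 byte per pixel) into 1 bit per pixel.
// Pixels with alpha >= threshold become 1. Each row is padded to whole bytes,
// and the leftmost pixel of a byte is stored in its most significant bit.
std::vector<uint8_t> packAtlas1bpp(const uint8_t* alpha8, int width, int height,
                                   uint8_t threshold = 128) {
    const int stride = (width + 7) / 8;  // bytes per packed row
    std::vector<uint8_t> packed(static_cast<size_t>(stride) * height, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (alpha8[y * width + x] >= threshold) {
                packed[y * stride + (x >> 3)] |= static_cast<uint8_t>(0x80u >> (x & 7));
            }
        }
    }
    return packed;
}

// Return true if the texel at (x, y) is set: one shift and one mask per access.
inline bool sampleAtlas1bpp(const uint8_t* packed, int width, int x, int y) {
    const int stride = (width + 7) / 8;
    return (packed[y * stride + (x >> 3)] & (0x80u >> (x & 7))) != 0;
}
```

A W x H atlas packs into ceil(W / 8) * H bytes, an eighth of the Alpha8 size. Thresholding throws away the anti-aliased edges of the glyphs, which fits the "white pixel or nothing" approach described above.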
Reading flash is probably still faster than doing bitops, but I'd need to actually test that to be sure |
This thread is beautiful, I have some ESP32 here in the office getting dust ;) |
On one small texture, they won't even notice :) IIRC I used single npot screen-sized texture per frame on Radeon 9600 some 10 years ago (for a video player) and generally there was tiny perf difference (if any at all, and it was certainly faster than stretching the image to power of two dimensions before sending it to gpu). |
First lot of optimizations more than halved the raster time (180ms -> 80ms). Roughly 11FPS excluding screen updates. I also added support for 8bit, 16bit, 24bit and 32bit textures. Might be able to speed the raster time up further if you only use one type, but potentially at the cost of space (which the ESP32 doesn't have much of). |
It's a little curious how you are using SliderFloat to display times, instead of, say ImGui::Text("%f ms", time); |
Just removed rounding and added special cases for alpha blending (return if 0, don't blend if 255). Currently working on adding more |
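For anyone following along, the two alpha special cases could look roughly like this. It is a sketch that assumes a 16-bit RGB565 framebuffer and 8-bit source channels; `blendChannel`, `packRgb565` and `blendPixel` are illustrative names, not functions from the ImDuino code.

```cpp
#include <cstdint>

// Blend one 8-bit channel: round((src * a + dst * (255 - a)) / 255),
// using the shift-based divide-by-255 trick instead of a real division.
inline uint8_t blendChannel(uint8_t src, uint8_t dst, uint8_t a) {
    const uint32_t x = src * a + dst * (255u - a) + 128u;
    return static_cast<uint8_t>((x + (x >> 8)) >> 8);
}

// Pack 8-bit channels into RGB565.
inline uint16_t packRgb565(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Write one source pixel onto an RGB565 destination pixel.
inline void blendPixel(uint16_t& dst, uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    if (a == 0) return;  // fully transparent: leave the destination alone
    if (a == 255) {      // fully opaque: overwrite, no need to read dst
        dst = packRgb565(r, g, b);
        return;
    }
    // General case: unpack the destination, blend each channel, repack.
    const uint8_t dr = static_cast<uint8_t>((dst >> 8) & 0xF8);
    const uint8_t dg = static_cast<uint8_t>((dst >> 3) & 0xFC);
    const uint8_t db = static_cast<uint8_t>((dst << 3) & 0xF8);
    dst = packRgb565(blendChannel(r, dr, a), blendChannel(g, dg, a),
                     blendChannel(b, db, a));
}
```

Most of a typical UI frame is either fully opaque or fully transparent, so the two early-outs skip the unpack/blend/repack path for the bulk of the pixels.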
Alright, I think this is about as good as I'm gonna get it |
That needs to be at minimum 10 times faster to be usable, let's make it happen :) You still have WindowRounding and borders visible in the video. The rounding will cause your window background to use large thin triangles instead of one rectangle. You'll probably double your speed for that given code just by disabling WindowRounding. Have you got anti-aliasing enabled? Between rounding and borders, AA just cost you double the number of vertices in that shot. I'm not sure I understand why you have those 8/16/24/32 paths, especially for textures, as you know your texture is 1bpp or 8bpp? You detect rectangles by comparing vertex contents, whereas you could compare indices. The triangle rasterization could be done much faster; maybe look up state-of-the-art triangle rasterization. Not sure why you do all that extraction of colors when it's not necessary in the case where we don't blend? And you can switch to ProggyTiny (10 px) instead of ProggyClean (13 px) for that sort of screen. I think I'm going to run a little bounty challenge for that tonight! It would be useful to have a good specialized software rasterizer available for imgui. Someone specialized in that sort of thing (not me) could probably get us 100 times faster. I guess using a lot of floating point on ESP32 isn't exactly desirable? EDIT: Also added a link to my comment in the gallery thread: #1269 (comment) for people stumbling here. |
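A rough sketch of the "compare indices" suggestion. It assumes rectangles arrive as six consecutive indices in the pattern (a, a+1, a+2, a, a+2, a+3) with corners in the order top-left, top-right, bottom-right, bottom-left, which is my understanding of what Dear ImGui's rectangle primitives emit; check that against your version. `Vertex` is a simplified stand-in, not the real `ImDrawVert`.

```cpp
#include <cstdint>

// Simplified stand-in for a draw vertex (position, UV, packed colour).
struct Vertex {
    float x, y;
    float u, v;
    uint32_t col;
};

// Returns true if idx[0..5] describe two triangles sharing the diagonal of
// one quad, and the four vertices form an axis-aligned rectangle.
bool isAxisAlignedQuad(const uint16_t* idx, const Vertex* vtx) {
    const uint16_t a = idx[0];
    // Cheap reject on the index pattern before touching any vertex data.
    if (idx[1] != a + 1 || idx[2] != a + 2 ||
        idx[3] != a     || idx[4] != a + 2 || idx[5] != a + 3) {
        return false;
    }
    const Vertex& v0 = vtx[a];
    const Vertex& v1 = vtx[a + 1];
    const Vertex& v2 = vtx[a + 2];
    const Vertex& v3 = vtx[a + 3];
    // Exact comparison is intended: the corners of a rectangle primitive
    // share coordinates that were copied, not recomputed.
    return v0.y == v1.y && v1.x == v2.x && v2.y == v3.y && v3.x == v0.x;
}
```

Six integer comparisons reject most non-rectangle triangle pairs (fans from rounded corners, for example) before any vertex is read.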
Alright, that points me in the right direction for more optimisations at least. Currently the font atlas is 8bit, the screen is 16bit and ImGui seems to work in 24/32, and I didn't see any performance impact by having them all supported by |
As pointed out by Per on Twitter (I dumbly had overlooked the actual numbers), the raster cost is only a fifth of the cost, so while ultimately we can drive that down, it should probably be tackled along with the final blitting, which is currently the slowest part. Where is the drawBitmap() function you are calling in UpdateScreen? If I look here, there's no copy of drawBitmap() that matches your exact prototype. The good news is that this TFT_22_ILI9225 code seemingly has immense room for optimization. |
Check the PRs on that repo, my version is several times faster (4s vs 250ms) |
Thanks! OK, so you've already done a good job optimizing that part from the original version, which leaves us with less obvious prospects. |
I had built a software renderer for imgui too, but have no idea whether it will be faster on ESP32 (also, not interested in the bounty, attribution is more than enough) - https://github.com/AlgoTradingHub/imgui_rt |
@wizzard0 I'll see if I can get it running on my ESP tomorrow afternoon |
Here is an idea: instead of optimizing the rasterizer, ImGui should support terminal rendering (prototype is here: https://github.com/jonvaldes/tear_imgui, video https://www.youtube.com/watch?v=OEGb4HrMkDo). This way you don't have to optimize generalized polygon rendering; rather, you focus only on terminal text rendering. |
Haven't started optimising yet, but I did add a screen clip. Worst case it's 4 if/pixel slower, best case it doesn't draw to the screen at all. Current test case is 2x faster. This version requires the testing branch of my fork of the TFT library: https://github.com/LAK132/TFT_22_ILI9225/tree/testing |
None of the triangle render functions are working yet, but the new rectangle functions seem to be a heap faster (down to 2~3ms). EDIT: Current version no longer crashes on renderTri but it still isn't drawing correctly. Raster time is now at 13ms, a little over 10x faster than the first version. |
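One way to avoid paying four comparisons per pixel is to clip each primitive's bounding box against the screen once, before the inner loop. This is only a sketch of that idea, not the code in the fork; `Rect` and `clipToScreen` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Half-open pixel rectangle: covers [x0, x1) horizontally and [y0, y1) vertically.
struct Rect {
    int16_t x0, y0, x1, y1;
};

// Clip r against a screenW x screenH screen.
// Returns false if nothing is left to draw, so the caller can skip the primitive.
bool clipToScreen(Rect& r, int16_t screenW, int16_t screenH) {
    r.x0 = std::max<int16_t>(r.x0, 0);
    r.y0 = std::max<int16_t>(r.y0, 0);
    r.x1 = std::min<int16_t>(r.x1, screenW);
    r.y1 = std::min<int16_t>(r.y1, screenH);
    return r.x0 < r.x1 && r.y0 < r.y1;
}
```

The inner loop then runs over the clipped rectangle with no per-pixel bounds test, and fully off-screen primitives cost four min/max operations in total.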
The rewrite has been successful (as far as I can tell), it's well over 10x faster with WindowRounding disabled |
I made a software rasterizer for Dear ImGui which is NOT made for Arduino (it relies heavily on floating point math), but it could maybe be a useful reference: https://github.com/emilk/imgui_software_renderer/blob/master/src/imgui_sw.cpp |
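For comparison, an integer-only triangle fill using edge functions might look like the sketch below. It is not taken from any of the rasterizers linked in this thread. It fills with a flat colour, samples at integer pixel coordinates and treats every edge as inclusive, so triangles that share an edge will both draw it (which matters once alpha blending is involved).

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Point {
    int32_t x, y;
};

// Twice the signed area of triangle (a, b, c). Its sign tells which side of
// the edge a->b the point c lies on.
inline int32_t edge(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Fill a triangle with a flat colour into a w x h framebuffer of 16-bit pixels.
void fillTriangle(std::vector<uint16_t>& fb, int32_t w, int32_t h,
                  Point v0, Point v1, Point v2, uint16_t colour) {
    const int32_t area2 = edge(v0, v1, v2);
    if (area2 == 0) return;            // zero area: nothing to fill
    if (area2 < 0) std::swap(v1, v2);  // make the winding consistent

    // Bounding box of the triangle, clipped to the framebuffer.
    const int32_t minX = std::max<int32_t>(0, std::min({v0.x, v1.x, v2.x}));
    const int32_t minY = std::max<int32_t>(0, std::min({v0.y, v1.y, v2.y}));
    const int32_t maxX = std::min<int32_t>(w - 1, std::max({v0.x, v1.x, v2.x}));
    const int32_t maxY = std::min<int32_t>(h - 1, std::max({v0.y, v1.y, v2.y}));

    for (int32_t y = minY; y <= maxY; ++y) {
        for (int32_t x = minX; x <= maxX; ++x) {
            const Point p{x, y};
            // Inside (or on an edge) when all three edge functions are >= 0.
            if (edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0 && edge(v0, v1, p) >= 0) {
                fb[static_cast<size_t>(y) * w + x] = colour;
            }
        }
    }
}
```

Each edge function is linear in x and y, so a faster version would update the three values with additions while stepping across the bounding box instead of recomputing the products per pixel.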
I'm close to breaking that 10x faster threshold with some more modifications to the TFT library. My version now looks like this:

```cpp
void TFT_22_ILI9225::_spiWrite16(uint16_t s)
{
#ifdef HSPI_WRITE16
    if (_clk < 0) {
        HSPI_WRITE16(s);
        return;
    }
#endif
    // Fallback: send the high byte, then the low byte.
    _spiWrite((uint8_t)(s >> 8));
    _spiWrite((uint8_t)s);
}

void TFT_22_ILI9225::drawBitmap(uint16_t x1, uint16_t y1,
                                const uint16_t* bitmap, int16_t w, int16_t h) {
    // Restrict the address window to the bitmap's rectangle.
    _setWindow(x1, y1, x1+w-1, y1+h-1, L2R_TopDown);
    startWrite();
    SPI_DC_HIGH();
    SPI_CS_LOW();
#ifdef HSPI_WRITE_PIXELS
    if (_clk < 0) {
        // Send the whole bitmap in one bulk transfer.
        HSPI_WRITE_PIXELS(bitmap, w * h * sizeof(uint16_t));
    } else
#endif
    // Fallback: write one 16-bit pixel at a time.
    for (uint16_t i = 0; i < h * w; ++i) {
        _spiWrite16(bitmap[i]);
    }
    SPI_CS_HIGH();
    endWrite();
}
```

This is with the hardware SPI clocked at 20MHz. The ESP32 can handle 40MHz (and even 80MHz iirc), but the cables I'm using aren't good enough for that kind of speed. |
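To put the clock speeds in perspective, here is a back-of-the-envelope lower bound on full-frame transfer time. It assumes a 176x220 panel at 16 bits per pixel (my assumption for this display, not a figure from the thread) and ignores command overhead and gaps between transfers; `frameTransferMicros` is a made-up helper.

```cpp
#include <cstdint>

// Minimum time to clock one full frame out over SPI, in microseconds.
// One bit is sent per SPI clock cycle; setup and command bytes are ignored.
constexpr uint32_t frameTransferMicros(uint32_t width, uint32_t height,
                                       uint32_t bitsPerPixel, uint32_t spiHz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(width) * height * bitsPerPixel * 1000000ull) / spiHz);
}

// 176 * 220 * 16 = 619,520 bits per frame.
static_assert(frameTransferMicros(176, 220, 16, 20000000u) == 30976, "about 31 ms at 20MHz");
static_assert(frameTransferMicros(176, 220, 16, 40000000u) == 15488, "about 15.5 ms at 40MHz");
```

Under those assumptions the wire itself caps a full-screen update at roughly 32 frames per second at 20MHz, so anything slower than about 31 ms per frame is overhead that can still be removed in software.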
And there we have it, software rasteriser running the test code in under 10ms! Rasteriser is roughly 20x faster than in the original post, full loop roughly 10x faster! I have also moved some stuff around, softraster is now in the misc folder and there is an example impl for it: |
Great work! I’m excited to use this in a future project. |
Nice! Would you mind updating the wiki (root page and/or back-end page) with any useful applicable link? Thank you! |
Any thoughts on turning this into an ESP-IDF component? I hacked on it a little bit - Got something compiling, but crashing, and ran out of available time for the moment to really get into it. |
there's an IDF version (with Dual Shock 3 support!) here https://github.com/LAK132/IM-ESP32-PS3 |
Is there any interest in having the software rasteriser fork merged into this repo? |
I don't have enough info to make that judgment right now (there are 2-3 rasterizers and I don't know the pros and cons of each). |
The actual repository is https://github.com/LAK132/ImSoft; https://github.com/LAK132/IM-ESP32-PS3 and https://github.com/LAK132/ImDuino are just example projects. |
My bad, thanks! |
Closing this as it doesn't seem to need to be open anymore! Thanks for sharing that fun project! |
Creating a new thread for this so I don't clutter the other sharing threads
Current performance figures:
Slider 1: Loop time excluding raster and TFT draw time (4ms)
Slider 2: TFT draw time (248ms)
Slider 3: Raster time (187ms)
The main loop without rasterizing and drawing to the screen is running at ~240Hz which is crazy fast for such a small device (160MHz ESP32), no need for any more performance gains here.
The TFT library is pushed right to its limit, only way to get it any faster will be to crank up the SPI speed or somehow work out how to only draw to the parts of the screen that have actually changed between refreshes (this would be handy for other devices with less RAM).
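One low-RAM way to find what changed between refreshes is to keep a small hash per framebuffer row rather than a second framebuffer. The sketch below is an idea to try, not something from the ImDuino code; `hashRow` and `findDirtyRows` are hypothetical names, and a hash collision could in principle hide a changed row.

```cpp
#include <algorithm>
#include <cstdint>

// FNV-1a hash over one row of 16-bit pixels.
inline uint32_t hashRow(const uint16_t* row, int w) {
    uint32_t h = 2166136261u;
    for (int x = 0; x < w; ++x) {
        h = (h ^ (row[x] & 0xFFu)) * 16777619u;
        h = (h ^ (row[x] >> 8)) * 16777619u;
    }
    return h;
}

// Compare each row's hash with the one stored from the previous frame and
// update the stored hashes. On return, firstRow..lastRow (inclusive) spans
// all changed rows. Returns false if no row changed.
bool findDirtyRows(const uint16_t* fb, int w, int h, uint32_t* rowHashes,
                   int& firstRow, int& lastRow) {
    firstRow = h;
    lastRow = -1;
    for (int y = 0; y < h; ++y) {
        const uint32_t hash = hashRow(fb + y * w, w);
        if (hash != rowHashes[y]) {
            rowHashes[y] = hash;
            firstRow = std::min(firstRow, y);
            lastRow = y;
        }
    }
    return lastRow >= 0;
}
```

The extra state is 4 bytes per row. Because rows are contiguous in memory, the changed band can be sent with a single call that takes a position, a pixel pointer, a width and a height, starting at `fb + firstRow * w` with height `lastRow - firstRow + 1`.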
The rasterizer is still running a little slow, but as mentioned before it can be optimised with a heap of special cases, some of which have been tested but are currently disabled for debugging purposes.
Test if 2 triangles are actually a square
Test if triangle is a flat colour with basically no UV map, if it is don't bother interpolating values
Test if triangle is actually a line/single pixel
Faster lerp factor calculation for grid aligned triangles/rectangles
Remove rounding
Special cases for alpha blending
Less specific to the rasterizer but not rendering the window or background makes sense on a platform like this and should also give a performance boost
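A sketch of how two of the tests in that list could be expressed as a cheap classification step before rasterizing. The `Vtx` layout and the names `TriKind` and `classify` are placeholders, not the types used in ImDuino. It assumes untextured primitives give all three vertices the same UV, so equal UVs plus equal colours means there is nothing to interpolate.

```cpp
#include <cstdint>

// Placeholder vertex: integer screen position, UV, packed colour.
struct Vtx {
    int16_t x, y;
    float u, v;
    uint32_t col;
};

enum class TriKind {
    LineOrPoint,  // zero area: draw as a line or point, or skip
    FlatColour,   // one colour and one UV: fill without interpolation
    General       // needs per-pixel interpolation
};

TriKind classify(const Vtx& a, const Vtx& b, const Vtx& c) {
    // Twice the signed area; zero means the three points are collinear.
    const int32_t area2 =
        (static_cast<int32_t>(b.x) - a.x) * (static_cast<int32_t>(c.y) - a.y) -
        (static_cast<int32_t>(b.y) - a.y) * (static_cast<int32_t>(c.x) - a.x);
    if (area2 == 0) return TriKind::LineOrPoint;

    const bool sameColour = a.col == b.col && b.col == c.col;
    const bool sameUv = a.u == b.u && b.u == c.u && a.v == b.v && b.v == c.v;
    if (sameColour && sameUv) return TriKind::FlatColour;

    return TriKind::General;
}
```

In the flat case the texel can be sampled once and combined with the vertex colour up front, so the inner loop only needs the coverage test and a write (or a blend).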
I'm going to continue to try and get as much performance as I can out of the software rasterizer for use both on Arduino and for general PC use, perhaps it will be a good base for an sr example or the regression testing system mentioned in an earlier thread.
If anyone has any working examples for optimisations that would certainly help speed me up
Code: https://github.com/LAK132/ImDuino
Only necessary modification to the base ImGui library was the removal of
#include <memory.h>
from stb_truetype.h