BMP 4x loading speed improvement #31

Open
wants to merge 6 commits into
base: main

Conversation

DJDevon3

@DJDevon3 DJDevon3 commented Apr 29, 2024

Move the BMP class from the example to the library. Because this display is driven through its built-in registers and is not compatible with displayio, it needs its own graphics library. Integrating it with displayio would take considerable effort and is above my skill level. The library uses hardware acceleration native to the RA8875 chip for vector primitives and text (with fonts stored on the chip).

BMP loading is not hardware accelerated, but I managed to make BMPs load about 4x faster with the help of ChatGPT.

Video demo showing previous bmptest (approximately 17 seconds) vs updated bmptest (approximately 4 seconds).

Adafruit RA8875 driver board with Adafruit bare 40-pin 7" Touch TFT display running on UM FeatherS3 (N16R8).

RA8875_BMP_Loading.mp4

Move class from example to library.
@RetiredWizard

RetiredWizard commented Apr 29, 2024

I don't have the hardware but I was curious about the ChatGPT update so I looked over the code. The changes to the algorithm make sense to me. You're trading off memory usage for a more efficient file read method.

Do you think the display could be used on more memory-constrained boards, or to display larger bitmap files that wouldn't fit into memory? If so, I was thinking that an option to use the old line-by-line method might be of value. I thought it would be fun to try to add the option without adding two complete code branches. I'd be curious whether this works and whether you think it would be useful 😁

BMP Class
import struct  # needed by draw() for struct.pack


class BMP:
    """
    Optimized with ChatGPT by DJDevon3
    https://chat.openai.com/share/57ee2bb5-33ba-4538-a4b7-ec3dea8ea5c7
    Draw Bitmap Helper Class (not hardware accelerated)
    :param str filename: BMP filename
    :param int colors: BMP color data
    :param int data: BMP data
    :param int data_size: BMP data size
    :param int bpp: BMP bit depth data
    :param int width: BMP width
    :param int height: BMP height
    :param int read_header: BMP read header function
    """

    class _BmpParse(object):
        def __init__(self,file_name,fast):
            self.file_name = file_name
            self.fast = fast
            
        def __enter__(self):
            if not self.fast:
                self.file = open(self.file_name,'rb')
                return self.file
            else:
                return None
                
        def __exit__(self, *args):
            if not self.fast:
                self.file.close()

    def __init__(self, filename):
        self.filename = filename
        self.colors = None
        self.data = None
        self.data_size = 0
        self.bpp = 0
        self.width = 0
        self.height = 0
        self.read_header()

    def read_header(self):
        """Read file header data"""
        if self.colors:
            return
        with open(self.filename, "rb") as bmp_file:
            bmp_file.seek(10)
            self.data = int.from_bytes(bmp_file.read(4), "little")
            bmp_file.seek(18)
            self.width = int.from_bytes(bmp_file.read(4), "little")
            self.height = int.from_bytes(bmp_file.read(4), "little")
            bmp_file.seek(28)
            self.bpp = int.from_bytes(bmp_file.read(2), "little")
            bmp_file.seek(34)
            self.data_size = int.from_bytes(bmp_file.read(4), "little")
            bmp_file.seek(46)
            self.colors = int.from_bytes(bmp_file.read(4), "little")

    def draw(self, disp, x=0, y=0, fast=True, debug=False):
        """Draw BMP"""
        if debug:
            print("{:d}x{:d} image".format(self.width, self.height))
            print("{:d}-bit encoding detected".format(self.bpp))

        line_size = self.width * (self.bpp // 8)
        if line_size % 4 != 0:
            line_size += 4 - line_size % 4

        if fast:
            with open(self.filename, "rb") as bmp_file:
                bmp_file.seek(self.data)
                pixel_data = bmp_file.read()

        with self._BmpParse(self.filename,fast) as bmp_file:
            if not fast:
                bmp_file.seek(self.data)
            disp.set_window(x, y, self.width, self.height)
            line_start = 0
            line_end = line_size
            for line in range(self.height):
                current_line_data = b""
                if fast:
                    line_start = line * line_size
                    line_end = line_start + line_size
                else:
                    pixel_data = bmp_file.read(line_size)

                for i in range(line_start, line_end, self.bpp // 8):
                    if (line_end - i) < self.bpp // 8:
                        break
                    if self.bpp == 16:
                        color = self.convert_555_to_565(
                            pixel_data[i] | pixel_data[i + 1] << 8
                        )
                    if self.bpp in (24, 32):
                        color = self.color565(
                            pixel_data[i + 2], pixel_data[i + 1], pixel_data[i]
                        )
                    current_line_data = current_line_data + struct.pack(">H", color)
                disp.setxy(x, self.height - line + y)
                disp.push_pixels(current_line_data)
            disp.set_window(0, 0, disp.width, disp.height)

    @staticmethod
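    def convert_555_to_565(color_555):
        """Convert 16-bit color from 5-5-5 to 5-6-5 format"""
        # BMP 5-5-5 pixels store red in bits 10-14, green in 5-9, blue in 0-4
        r = (color_555 >> 10) & 0x1F
        g = (color_555 >> 5) & 0x1F
        b = color_555 & 0x1F
        # Widen green to 6 bits; the result must fit in 16 bits for struct.pack(">H")
        return (r << 11) | (g << 6) | b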

    @staticmethod
    def color565(r, g, b):
        """Convert 24-bit RGB color to 16-bit color (5-6-5 format)"""
        return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)

Edit: I was scanning the code and realized that the internal class _BmpParse was going to have a problem in "Fast" mode. I've edited the exit so I think it will work now.
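As a rough alternative to the helper context manager, the two read strategies can also sit behind one generator, so the drawing loop never needs to know which one is in use. This is only a sketch: `padded_row_size` and `iter_rows` are hypothetical names that are not part of the library, and it has not been run on RA8875 hardware.

```python
def padded_row_size(width, bpp):
    """Bytes per stored BMP row; rows are padded to a multiple of 4 bytes."""
    return (width * (bpp // 8) + 3) & ~3


def iter_rows(filename, data_offset, row_size, height, fast=True):
    """Yield each stored row as bytes.

    fast=True reads all pixel data with a single read() call (more RAM);
    fast=False reads one row at a time (less RAM).
    """
    with open(filename, "rb") as bmp_file:
        bmp_file.seek(data_offset)
        if fast:
            pixel_data = bmp_file.read(row_size * height)
            for row in range(height):
                start = row * row_size
                yield pixel_data[start:start + row_size]
        else:
            for _ in range(height):
                yield bmp_file.read(row_size)
```

With this shape, the per-pixel conversion loop can always index from 0 to `row_size` within the yielded row, regardless of the mode.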

@DJDevon3
Author

DJDevon3 commented Apr 30, 2024

@RetiredWizard The original demo was designed for a Feather M4, so yes, it can work. The M0 will struggle but it's doable. At the time this library was coded the M4 was probably top of the line with CircuitPython (other than Teensy). The S2 and S3 didn't even exist yet.

Now that we have more powerful microcontrollers my thought is that it's time to revisit these larger displays. They are definitely usable on an S3. If I can find a way to combine the native RA8875 register methods with images and layers it's possible it could be faster than displays half its size due to the hardware acceleration.

The hardware acceleration is almost instantaneously updated for vector graphics or text using the native methods... easily 60fps. Under the right conditions, hooking into hardware acceleration, the S3 can drive vectors on this display faster than displayio can on a 128x64 OLED.

Melissa laid the groundwork on this display driver but it hasn't been revisited in years. One of the things that made me decide to try it now is the port of the 7" Sunton ESP32-S3 board, which uses a similar display and was recently added to the supported boards.

I like the changes you made and will likely do another commit to fold that in so that the M4 can still work with it. I didn't think about the line chunks and ram use, you're absolutely right and the old method should be left in. Thank you for doing that!

@RetiredWizard

Thanks! I'm curious whether the private class I created to handle the with context manager is going to work as I expect. I did try to do some testing (although without the hardware I don't get far), and the testing showed that I had left off the "self." in the with self._BmpParse(self.filename,fast) as bmp_file: line, so I edited the code above again to fix that as well.

@DJDevon3
Author

DJDevon3 commented May 1, 2024

After playing around with it for a bit, I figured out that it comes down to moving the function into the library and giving it a method for self.color565. Without the self it reaches outside to the top of the script for color conversion. I don't think there is a RAM difference, though I have no idea how I'd test that.

The one line for color = color565 vs color = self.color565 makes all the difference in the world. I tried going back and making it work in the example with the embedded BMP class but it was throwing fits. This is better and I bet it will work better even on the M4 too. So this should be a real overall speed improvement regardless of the board though I haven't tested it on anything other than an S3.
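The name-resolution difference between the two spellings can be shown in isolation. In the sketch below the module-level `color565` is a hypothetical stand-in for a helper defined at the top of an example script; the return strings exist only to show which function was called. It demonstrates where each name resolves and makes no claim about speed.

```python
def color565(r, g, b):
    """Stand-in for a helper defined at the top of a script."""
    return "module-level function"


class BMP:
    @staticmethod
    def color565(r, g, b):
        """Stand-in for the class's own conversion method."""
        return "class method"

    def bare_call(self):
        # A bare name inside a method skips the class body and is
        # looked up in the enclosing module's globals.
        return color565(0, 0, 0)

    def qualified_call(self):
        # self.color565 is looked up on the instance, then the class.
        return self.color565(0, 0, 0)


bmp = BMP()
print(bmp.bare_call())
print(bmp.qualified_call())
```

If no module-level `color565` exists, the bare call raises `NameError` at run time, which is the kind of issue a linter would normally flag.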

Unfortunately the speed improvement only really works with the BMP test (rasterized). All of the native register methods with vector are already as fast as they can be.

@RetiredWizard I tried getting your code to work but couldn't. So I went the long way and added an argument to the function with an if/else to test the new method and old method. When adding self as described above the old method instantly becomes 4 times faster. There's almost no difference in using the for loop vs the with statement. It's the way it was reaching farther for color conversion that was keeping it slow. The way you wrote your context manager is pretty close to the way I was also testing it.

It's worth noting that the sheer number of disabled pylint checks in this library is also what hid the issue from being discovered, specifically the ignored invalid names. They were ignored because there are too many variables named x1, x2, x3, y1, y2, y3, etc., and it would be hours of work to rename them all properly. But that also hid an actual error for color565 that would have led to the optimization as a natural course of making pylint happy.

@DJDevon3
Author

DJDevon3 commented May 1, 2024

No idea how that went unnoticed for so long. The blue and green RGB values were incorrect in the simpletest.

@RetiredWizard

When adding self as described above the old method instantly becomes 4 times faster. There's almost no difference in using the for loop vs the with statement. It's the way it was reaching farther for color conversion that was keeping it slow.

Excellent sleuthing 😁 I'm surprised that there's apparently no speed advantage to reading the BMP file with a single read statement rather than one line at a time. I still think reading the entire BMP file into a variable could be a memory problem depending on the size of the image and/or the available resources on the board being used. But maybe that's a problem to be resolved when it's encountered.

I don't think there is a RAM difference though I have no idea how I'd test that.

Have you tried printing gc.mem_free() at the start and end of your test programs? I don't really understand the under the hood workings of CircuitPython memory but mem_free() is what I use to get a general feel for how much memory my programs are using.
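A minimal sketch of that measurement, assuming the test can be wrapped in a function. `gc.mem_free()` is available on CircuitPython and MicroPython but not on desktop CPython, so the helper returns None there; `free_heap` and `measure_heap_use` are hypothetical names. Treat the figure as a rough indication, since garbage collection timing affects it.

```python
import gc


def free_heap():
    """Free heap bytes on CircuitPython, or None where gc.mem_free is missing."""
    gc.collect()  # collect first so garbage does not skew the reading
    mem_free = getattr(gc, "mem_free", None)
    return mem_free() if mem_free else None


def measure_heap_use(func, *args):
    """Call func(*args) and return (result, approximate bytes of heap consumed)."""
    before = free_heap()
    result = func(*args)  # keep a reference so the allocation stays alive
    after = free_heap()
    used = None if before is None else before - after
    return result, used


# Example: how much heap does a 1 KiB buffer cost?
buffer, used = measure_heap_use(bytearray, 1024)
print("heap used:", used)
```

Wrapping the single-read and line-by-line paths in turn would give a first comparison of their RAM cost.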

@DJDevon3
Author

DJDevon3 commented May 2, 2024

@RetiredWizard I did play around with attempting to feed it more bmp read lines in chunks and it was actually slower so there's no additional chunking improvements that can be done. The speed of the color conversion is the limiting factor. I have an idea to attempt to use the native vector pixel/line to read a bmp pixel by pixel and display it. No clue if that will work but it might be a way to tie into the native method for hardware accelerated vector to convert a rasterized image to vector... in theory.

The hardware accelerators are all vector based including the built-in fonts. They work about 100x faster so if I can fool it into building a rasterized image using its native vector methods... we'll see.
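If the color conversion really is the limiting factor, one thing that might be worth trying is converting a whole row into a preallocated bytearray instead of growing a bytes object with one struct.pack call per pixel. This is an untested idea, not a measured improvement; `convert_row_24bpp` is a hypothetical helper and it assumes 24-bit rows stored in BGR order, as the draw method above reads them.

```python
def color565(r, g, b):
    """Convert 24-bit RGB color to 16-bit color (5-6-5 format)."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def convert_row_24bpp(row, width):
    """Convert one BGR888 row to big-endian RGB565 bytes."""
    out = bytearray(width * 2)  # allocated once per row
    for px in range(width):
        src = px * 3
        color = color565(row[src + 2], row[src + 1], row[src])
        out[px * 2] = color >> 8        # high byte first, like ">H"
        out[px * 2 + 1] = color & 0xFF
    return out


# One red, one green, and one blue pixel stored as B, G, R bytes.
row = bytes([0, 0, 255, 0, 255, 0, 255, 0, 0])
print(convert_row_24bpp(row, 3).hex())
```

Any padding bytes at the end of the stored row are ignored because only `width` pixels are read.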

The startup time isn't as fast but the draw time is. Allows BMP class and other file types in the future to be expanded upon more easily.  Updated BMPtest example with the new imports and function calls.
@DJDevon3
Author

DJDevon3 commented May 2, 2024

Since pylint doesn't like any file over 1000 lines, we'll just move the entire BMP class to its own file and import that. I have confirmed it works as intended with the new changes. It also makes it easier to expand upon the class and add classes for other file types in the future, like jpeg, gif, etc.

@DJDevon3
Author

DJDevon3 commented May 2, 2024

Figured out how to read a single pixel and will push a PR for it shortly. I wrote a playground note on it here.

ability to read a pixel color from the display at x,y coordinates and return as RGB values.  Read data16 function makes it easier to chunk 16-bit color to/from register.  Added some missing registers and register descriptions.
set debug default to false
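Turning a 16-bit value read back from the display into RGB values could look roughly like the sketch below. It assumes the pixel comes back as a 5-6-5 value and uses a hypothetical helper name, `rgb565_to_rgb888`; it is not the code from the commit.

```python
def rgb565_to_rgb888(color):
    """Split a 16-bit 5-6-5 color into 8-bit (r, g, b) values."""
    r5 = (color >> 11) & 0x1F
    g6 = (color >> 5) & 0x3F
    b5 = color & 0x1F
    # Replicate the top bits into the low bits so full scale maps to 255.
    r = (r5 << 3) | (r5 >> 2)
    g = (g6 << 2) | (g6 >> 4)
    b = (b5 << 3) | (b5 >> 2)
    return r, g, b


print(rgb565_to_rgb888(0xF800))  # pure red
```

Because 5-6-5 drops the low bits of each channel, a color written as 24-bit RGB will generally not read back as exactly the same values.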
@DJDevon3
Author

DJDevon3 commented Jun 2, 2024

Added read single pixel test. Reads 12 pixels from display and replicates the colors with filled rectangles. This is an example of a good test.

IMG_0009

2 participants