Document performance caveats of RGB image formats versus RGBA #79771

Calinou · 2023-07-21T23:19:40Z

For context, @lyuma ran this benchmark on a 4096×4096 texture:

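func _process(_delta):
	# `image_texture` (an ImageTexture) and `image` (a 4096×4096 Image) are assumed to be
	# member variables set up elsewhere in the script; they are not shown in the original snippet.
	var time_start: int = Time.get_ticks_usec()
	image_texture.update(image)
	var time_end: int = Time.get_ticks_usec()
	# Convert microseconds to milliseconds.
	print(str(0.001 * (time_end - time_start)) + " ms")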

These are the results they got depending on the Image format used:

Format	Update time (ms)
RGBAF	100
RGBF	650
RGF	48
RF	21

RGBAH	48
RGBH	800
RGH	20
RH	10

RGBA8	21
RGB8	51
RG8	12
R8	6.0

Counterintuitively, the RGBA formats, despite using more memory, consistently outperform their RGB counterparts by a significant margin in this test (from about 2.4× for 8-bit channels to about 17× for half-float channels). This is due to memory alignment: RGB formats use 3 components, whereas the other formats use 1, 2 or 4 components (powers of 2).
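To make the alignment argument concrete, here is a small sketch that computes bytes per pixel for the formats in the table, assuming 1 byte per 8-bit channel, 2 bytes per half-float (H) channel and 4 bytes per float (F) channel. The `Fmt` enum and the helper names are hypothetical and are not Godot's `Image::Format` API. It only shows that the RGB variants land on 3, 6 and 12 bytes per pixel; it does not by itself prove that alignment is what makes them slow.

```cpp
#include <cstddef>

// Hypothetical stand-in for the uncompressed formats benchmarked above
// (NOT Godot's Image::Format enum).
enum class Fmt { R8, RG8, RGB8, RGBA8, RH, RGH, RGBH, RGBAH, RF, RGF, RGBF, RGBAF };

// Number of channels stored per pixel.
constexpr std::size_t channel_count(Fmt f) {
    switch (f) {
        case Fmt::R8: case Fmt::RH: case Fmt::RF: return 1;
        case Fmt::RG8: case Fmt::RGH: case Fmt::RGF: return 2;
        case Fmt::RGB8: case Fmt::RGBH: case Fmt::RGBF: return 3;
        default: return 4;  // the RGBA variants
    }
}

// Assumed storage per channel: 8-bit = 1 byte, half float = 2, float = 4.
constexpr std::size_t bytes_per_channel(Fmt f) {
    switch (f) {
        case Fmt::R8: case Fmt::RG8: case Fmt::RGB8: case Fmt::RGBA8: return 1;
        case Fmt::RH: case Fmt::RGH: case Fmt::RGBH: case Fmt::RGBAH: return 2;
        default: return 4;  // the F (32-bit float) variants
    }
}

constexpr std::size_t bytes_per_pixel(Fmt f) {
    return channel_count(f) * bytes_per_channel(f);
}

// True for 1, 2, 4, 8, 16, ...: the "aligned" pixel sizes.
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Total uncompressed size of a single mip level.
constexpr std::size_t image_bytes(Fmt f, std::size_t width, std::size_t height) {
    return width * height * bytes_per_pixel(f);
}
```

For a 4096×4096 image, `image_bytes(Fmt::RGB8, 4096, 4096)` is 50,331,648 bytes (48 MiB) versus 67,108,864 bytes (64 MiB) for `Fmt::RGBA8`, so the RGBA8 upload moves more data even though it was measured as faster.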

Considering how important the difference is, I think it's better to add notes everywhere it's relevant, even if it's a bit redundant.

See GDExtension: passing Image to ImageTexture with ImageTexture::update is too slow #76994.

torcado194 · 2023-07-22T07:35:31Z

Most of these changes say "Converting to [constant FORMAT_RGBA8] is slow, ... consider using [constant FORMAT_RGBA8]", instead of "[constant FORMAT_RGB8] is slow". Is this correct or am I misunderstanding?

aaronfranke · 2023-07-22T17:41:41Z

doc/classes/Image.xml

@@ -107,6 +107,7 @@
 			<param index="0" name="format" type="int" enum="Image.Format" />
 			<description>
 				Converts the image's format. See [enum Format] constants.
+				[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.


Suggested change

[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.

[b]Note:[/b] Converting to [constant FORMAT_RGB8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or for images without a blue channel, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBH] and [constant FORMAT_RGBF] versus their RGBA/RG/R counterparts.

clayjohn · 2023-07-22T18:10:36Z

Before documenting this, I'd really like to see a performance analysis of what exactly is slow. I don't see an image conversion in the texture update codepath. We need to understand if the slowness is in the GPU driver (in which case we need to document it) or if the slowness is from something we are doing (in which case we may be able to fix the underlying issue).

bitsawer · 2023-07-22T19:00:52Z

@clayjohn I did some digging during #74238 when we noticed that custom image mipmaps were sometimes wiped out in the renderer/storage: #66848 (comment). There are a few places in texture storage where images are converted, for example this one (the same also happens for FORMAT_RGBF and FORMAT_RGBH):

godot/servers/rendering/renderer_rd/storage_rd/texture_storage.cpp

Lines 1467 to 1481 in 1c1524a

    
	case Image::FORMAT_RGB8: {
		//this format is not mandatory for specification, check if supported first
		if (false && RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_R8G8B8_UNORM, RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_CAN_UPDATE_BIT) && RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_R8G8B8_SRGB, RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_CAN_UPDATE_BIT)) {
			r_format.format = RD::DATA_FORMAT_R8G8B8_UNORM;
			r_format.format_srgb = RD::DATA_FORMAT_R8G8B8_SRGB;
		} else {
			//not supported, reconvert
			r_format.format = RD::DATA_FORMAT_R8G8B8A8_UNORM;
			r_format.format_srgb = RD::DATA_FORMAT_R8G8B8A8_SRGB;
			image->convert(Image::FORMAT_RGBA8);
		}
		r_format.swizzle_r = RD::TEXTURE_SWIZZLE_R;
		r_format.swizzle_g = RD::TEXTURE_SWIZZLE_G;
		r_format.swizzle_b = RD::TEXTURE_SWIZZLE_B;
		r_format.swizzle_a = RD::TEXTURE_SWIZZLE_ONE;
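The branch above boils down to a small decision: upload RGB8 as a 3-channel GPU format if the device supports it, otherwise pad to RGBA8 on the CPU and force alpha to 1 through the swizzle. Below is a minimal sketch of that decision only. The enums, `UploadPlan` and `plan_upload` are hypothetical stand-ins for `Image::Format`, `RD::DataFormat` and the texture storage code; they model the logic, not the rendering server's API.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT Godot's Image::Format / RD::DataFormat enums.
enum class ImageFormat { RGB8, RGBA8 };
enum class GpuFormat { R8G8B8_UNORM, R8G8B8A8_UNORM };
enum class Swizzle { Identity, One };

struct UploadPlan {
    GpuFormat gpu_format;
    bool convert_to_rgba8;  // true when the CPU must expand every pixel before upload
    Swizzle alpha_swizzle;  // One = alpha always reads as 1.0, hiding the padding byte
};

// Mirrors the shape of the quoted RGB8 branch: use a 3-channel GPU format only if
// the device supports it, otherwise fall back to RGBA8 and convert on the CPU.
UploadPlan plan_upload(ImageFormat src, bool rgb8_supported) {
    assert(src == ImageFormat::RGB8 || src == ImageFormat::RGBA8);
    if (src == ImageFormat::RGBA8) {
        return {GpuFormat::R8G8B8A8_UNORM, false, Swizzle::Identity};
    }
    if (rgb8_supported) {
        return {GpuFormat::R8G8B8_UNORM, false, Swizzle::One};
    }
    return {GpuFormat::R8G8B8A8_UNORM, true, Swizzle::One};
}
```

Because the quoted condition starts with `false &&`, the engine currently behaves like `plan_upload(ImageFormat::RGB8, false)` in every case, so an RGB8 image that goes through this path is always converted to RGBA8 on the CPU first.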

And a few other places:

godot/servers/rendering/renderer_rd/storage_rd/texture_storage.cpp

Lines 1267 to 1271 in 1c1524a

    
	Ref<Image> image = Image::create_from_data(tex->width, tex->height, tex->mipmaps > 1, tex->validated_format, data);
	ERR_FAIL_COND_V(image->is_empty(), Ref<Image>());
	if (tex->format != tex->validated_format) {
		image->convert(tex->format);
	}

clayjohn · 2023-07-22T19:31:00Z

@bitsawer Thanks for pointing that out. Looks like the condition was set to always be false in 42b44f4. That said, it probably doesn't matter much, as actual GPU support for this format is really poor (linear tiling).

I think we still need to check whether the call to image->convert is to blame, though. If so, we can likely optimize that code.

sakrel · 2023-07-22T21:13:45Z

This is a ~~frame~~ flame graph for RGB8 using the benchmark from the OP with a custom build:

6588a4a - Merge pull request #79661 from sepTN/fix-typo-batch
scons platform=windows arch=x86_64 verbose=yes warnings=no progress=no production=yes debug_symbols=yes target=editor
Found MSVC version 14.3, arch x86_64

RenderingDeviceVulkan::_texture_update spends most of its time in _copy_region:

godot/drivers/vulkan/rendering_device_vulkan.cpp

Line 2598 in 6588a4a

_copy_region(read_ptr, write_ptr, x, y, region_w, region_h, width, pixel_size);
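The call above copies a `region_w`×`region_h` sub-rectangle out of a buffer that is `width` pixels wide, with `pixel_size` bytes per pixel. A minimal sketch of such a strided copy follows. The function name and the row-by-row `memcpy` are assumptions, since the body of Godot's `_copy_region` is not quoted in this thread. The work is proportional to `region_w * region_h * pixel_size`, so the pixel size of whatever format actually reaches this function matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical strided sub-rectangle copy (NOT Godot's _copy_region).
// Copies region_w x region_h pixels starting at (x, y) from a source buffer that is
// src_width pixels wide into a tightly packed destination buffer.
void copy_region(const std::uint8_t* src, std::uint8_t* dst,
                 std::size_t x, std::size_t y,
                 std::size_t region_w, std::size_t region_h,
                 std::size_t src_width, std::size_t pixel_size) {
    assert(src != nullptr && dst != nullptr);
    assert(x + region_w <= src_width);
    const std::size_t src_row_bytes = src_width * pixel_size;
    const std::size_t dst_row_bytes = region_w * pixel_size;
    for (std::size_t row = 0; row < region_h; ++row) {
        // Start of this row inside the requested rectangle.
        const std::uint8_t* src_row = src + (y + row) * src_row_bytes + x * pixel_size;
        std::memcpy(dst + row * dst_row_bytes, src_row, dst_row_bytes);
    }
}
```

For example, copying the 2×2 block at (1, 1) out of a 4-pixel-wide RGB8 buffer (`pixel_size` = 3) reads 6 bytes from each of two source rows and writes them back to back.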

Image::convert

godot/core/io/image.cpp

Line 552 in 6588a4a

Image new_img(width, height, mipmaps, p_new_format);

godot/core/io/image.cpp

Line 640 in 6588a4a

_convert<3, false, 3, true, false, false>(mip_width, mip_height, rptr, wptr);
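Reading the quoted `_convert<3, false, 3, true, false, false>` call as "3 color bytes in without alpha, 3 color bytes out with alpha, no grayscale on either side" (our interpretation of the template arguments, not something confirmed in this thread), this is the RGB8 to RGBA8 expansion. Below is a simplified sketch of such a templated per-pixel converter; the name `convert_pixels` is hypothetical, the grayscale flags are left out, and it is not a copy of `image.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-pixel converter loosely modelled on the quoted template arguments:
// each source pixel has read_bytes color channels (+1 alpha byte if read_alpha),
// each destination pixel has write_bytes color channels (+1 alpha byte if write_alpha).
template <std::size_t read_bytes, bool read_alpha, std::size_t write_bytes, bool write_alpha>
void convert_pixels(std::size_t width, std::size_t height,
                    const std::uint8_t* src, std::uint8_t* dst) {
    static_assert(read_bytes >= 1 && read_bytes <= 3 && write_bytes >= 1 && write_bytes <= 3,
                  "color channel counts must be 1 to 3");
    assert(src != nullptr && dst != nullptr);
    constexpr std::size_t src_stride = read_bytes + (read_alpha ? 1 : 0);
    constexpr std::size_t dst_stride = write_bytes + (write_alpha ? 1 : 0);
    for (std::size_t i = 0; i < width * height; ++i) {
        const std::uint8_t* in = src + i * src_stride;
        std::uint8_t* out = dst + i * dst_stride;
        // Missing color channels default to 0, missing alpha defaults to opaque.
        std::uint8_t rgba[4] = {0, 0, 0, 255};
        for (std::size_t c = 0; c < read_bytes; ++c) rgba[c] = in[c];
        if (read_alpha) rgba[3] = in[read_bytes];
        for (std::size_t c = 0; c < write_bytes; ++c) out[c] = rgba[c];
        if (write_alpha) out[write_bytes] = rgba[3];
    }
}
```

`convert_pixels<3, false, 3, true>` performs the RGB8 to RGBA8 expansion, touching every pixel of every mip level on the CPU. `convert_pixels<3, true, 3, false>` goes the other way and drops alpha, which is what converting an RGBA8-validated texture back to RGB8 would need.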

Something to note: When I run the benchmark with an official 4.1.1 build, a single iteration takes about 70 ms. With my custom build I get around 70 to 80 ms. I wonder if the difference can be explained by me using MSVC to compile the build? I copied the scons arguments from https://github.com/godotengine/godot-build-scripts/blob/main/build-windows/build.sh
Are there debug symbols for the official builds I can download? I want to rerun the Visual Studio diagnostic tools with the official build.

Calinou · 2023-07-23T16:56:50Z

I wonder if the difference can be explained by me using MSVC to compile the build?

Official builds are compiled with a recent MinGW-GCC using optimize=speed and full LTO (lto=full). These options are aliased as production=yes. However, production=yes does not enable LTCG¹ on MSVC because it has known issues.

Are there debug symbols for the official builds I can download? I want to rerun the Visual Studio diagnostic tools with the official build.

Unfortunately, no. This is planned for a future release, but since we use MinGW, you'd have to convert the DWARF debug symbols to PDB format (there's a tool for that out there).

¹ Link-time code generation (MSVC's LTO equivalent).

Document performance caveats of RGB image formats versus RGBA

73c6b8d

Calinou requested a review from a team as a code owner July 21, 2023 23:19

Calinou added the enhancement, documentation, cherrypick:4.0 and cherrypick:4.1 labels Jul 21, 2023

Calinou added this to the 4.2 milestone Jul 21, 2023

aaronfranke reviewed Jul 22, 2023


YuriSizov added the discussion and performance labels and removed the cherrypick:4.0 label Oct 30, 2023

YuriSizov modified the milestones: 4.2, 4.x Oct 30, 2023

YuriSizov removed the cherrypick:4.1 label Jan 23, 2024
