Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document performance caveats of RGB image formats versus RGBA #79771

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Calinou
Copy link
Member

@Calinou Calinou commented Jul 21, 2023

For context, @lyuma ran this benchmark on a 4096×4096 texture:

func _process(delta):
	var time_start: int = Time.get_ticks_usec()
	image_texture.update(image)
	var time_end: int = Time.get_ticks_usec()
	print(str(0.001 * (time_end - time_start)) + " ms")

These are the results they got depending on the Image format used:

Format   Time taken to update
RGBAF 100ms
RGBF 650ms
RGF 48ms
RF 21ms
RGBAH 48ms
RGBH 800ms
RGH 20ms
RH 10ms
RGBA8 21ms
RGB8 51ms
RG8 12ms
R8 6.0ms

Counterintuitively, the more memory-intensive RGBA formats always outperform the RGB format by a significant margin (by a 5× factor on average on this test). This is due to memory alignment, as RGB uses 3 components whereas other formats use 1, 2 or 4 components (powers of 2).

Considering how important the difference is, I think it's better to add notes everywhere it's relevant, even if it's a bit redundant.

@Calinou Calinou requested a review from a team as a code owner July 21, 2023 23:19
@Calinou Calinou added enhancement documentation cherrypick:4.0 cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release labels Jul 21, 2023
@Calinou Calinou added this to the 4.2 milestone Jul 21, 2023
@torcado194
Copy link

torcado194 commented Jul 22, 2023

Most of these changes say "Converting to [constant FORMAT_RGBA8] is slow, ... consider using [constant FORMAT_RGBA8]", instead of "[constant FORMAT_RGB8] is slow". Is this correct or am I misunderstanding?

@@ -107,6 +107,7 @@
<param index="0" name="format" type="int" enum="Image.Format" />
<description>
Converts the image's format. See [enum Format] constants.
[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.
[b]Note:[/b] Converting to [constant FORMAT_RGB8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or for images without a blue channel, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBH] and [constant FORMAT_RGBF] versus their RGBA/RG/R counterparts.

@clayjohn
Copy link
Member

Before documenting this, I'd really like to see a performance analysis of what exactly is slow. I don't see an image conversion in the texture update codepath. We need to understand if the slowness is in the GPU driver (in which case we need to document it) or if the slowness is from something we are doing (in which case we may be able to fix the underlying issue)

@bitsawer
Copy link
Member

bitsawer commented Jul 22, 2023

@clayjohn I did some digging during #74238 when we noticed that sometimes custom image mipmaps were wiped out in the renderer/storage: #66848 (comment) There are a few places in texture storage where images are converted, for example this one (same also for FORMAT_RGBF and FORMAT_RGBH):

case Image::FORMAT_RGB8: {
//this format is not mandatory for specification, check if supported first
if (false && RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_R8G8B8_UNORM, RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_CAN_UPDATE_BIT) && RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_R8G8B8_SRGB, RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_CAN_UPDATE_BIT)) {
r_format.format = RD::DATA_FORMAT_R8G8B8_UNORM;
r_format.format_srgb = RD::DATA_FORMAT_R8G8B8_SRGB;
} else {
//not supported, reconvert
r_format.format = RD::DATA_FORMAT_R8G8B8A8_UNORM;
r_format.format_srgb = RD::DATA_FORMAT_R8G8B8A8_SRGB;
image->convert(Image::FORMAT_RGBA8);
}
r_format.swizzle_r = RD::TEXTURE_SWIZZLE_R;
r_format.swizzle_g = RD::TEXTURE_SWIZZLE_G;
r_format.swizzle_b = RD::TEXTURE_SWIZZLE_B;
r_format.swizzle_a = RD::TEXTURE_SWIZZLE_ONE;

And a few other places:

Ref<Image> image = Image::create_from_data(tex->width, tex->height, tex->mipmaps > 1, tex->validated_format, data);
ERR_FAIL_COND_V(image->is_empty(), Ref<Image>());
if (tex->format != tex->validated_format) {
image->convert(tex->format);
}

@clayjohn
Copy link
Member

@bitsawer Thanks for pointing that out. Looks like the condition was set to always be false in 42b44f4. Although, it probably doesn't matter much as actual GPU support is really poor
image (linear tiling)

I think we still need to check whether the call to image->convert is to blame though. If so, we can likely optimize that code

@sakrel
Copy link
Contributor

sakrel commented Jul 22, 2023

This is a frame flame graph for RGB8 using the benchmark from the OP with a custom build:

6588a4a - Merge pull request #79661 from sepTN/fix-typo-batch
scons platform=windows arch=x86_64 verbose=yes warnings=no progress=no production=yes debug_symbols=yes target=editor
Found MSVC version 14.3, arch x86_64

image
image

RenderingDeviceVulkan:_texture_update spends most time in _copy_region
image

_copy_region(read_ptr, write_ptr, x, y, region_w, region_h, width, pixel_size);

Image::convert
image
Image new_img(width, height, mipmaps, p_new_format);

image
_convert<3, false, 3, true, false, false>(mip_width, mip_height, rptr, wptr);

Something to note: When I run the benchmark with an official 4.1.1 build, a single iteration takes about 70 ms. With my custom build I get around 70 to 80 ms. I wonder if the difference can be explained by me using MSVC to compile the build? I copied the scons arguments from https://github.com/godotengine/godot-build-scripts/blob/main/build-windows/build.sh
Are there debug symbols for the official builds I can download? I want to rerun the Visual Studio diagnostic tools with the official build.

@Calinou
Copy link
Member Author

Calinou commented Jul 23, 2023

I wonder if the difference can be explained by me using MSVC to compile the build? I

Official builds are compiled with a recent MinGW-GCC using optimize=speed and full LTO (lto=full). These options are aliased as production=yes. However, production=yes does not enable LTCG1 on MSVC because it has known issues.

Are there debug symbols for the official builds I can download? I want to rerun the Visual Studio diagnostic tools with the official build.

Unfortunately, no. This is planned for a future release, but since we use MinGW, you'd have to convert the DWARF debug symbols to PDB format (there's a tool for that out there).

Footnotes

  1. Link-time code generation (MSVC's LTO equivalent).

@YuriSizov YuriSizov modified the milestones: 4.2, 4.x Oct 30, 2023
@YuriSizov YuriSizov removed the cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release label Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants