Skeleton performance is low on GLES2 Android #37696

Closed
volzhs opened this issue Apr 8, 2020 · 33 comments

Comments

@volzhs
Contributor

volzhs commented Apr 8, 2020

Godot version:
3.2.2 02ed72c

OS/device including version:
Linux mint 19.3 / Galaxy S8+

Issue description:
FPS drops to around 40 when using a skeleton on GLES2.
It's a steady 60 fps on GLES3 with the same scene.
I tested it on GLES2 first and then on GLES3, for the same amount of time.

[Screenshots: GLES2]

[Screenshots: GLES3]

(ignore the distorted mesh with bones)

Steps to reproduce:

  1. Download the attached project
  2. Run on Android with GLES2
  3. Wait about 1 minute or a little more and watch the FPS start to drop
  4. Run on Android with GLES3
  5. See a pretty steady 60 fps even after several minutes

Minimal reproduction project:
skeleton_performance_gles2.zip

@akien-mga
Member

I think GLES2 uses a software path for skeletons (see define USE_SKELETON_SOFTWARE). I don't remember why but I guess it probably means that the GPU implementation requires features that are not available on all GLES2 devices.

@akien-mga
Member

akien-mga commented Apr 8, 2020

You should check if this is true or false on your hardware:

// the use skeleton software path should be used if either float texture is not supported,
// OR max_vertex_texture_image_units is zero
config.use_skeleton_software = (config.float_texture_supported == false) || (config.max_vertex_texture_image_units == 0);
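For readers who want to reason about the check outside the engine, here is a minimal standalone sketch of the same decision. The struct `GLES2Caps` and the function `needs_skeleton_software` are hypothetical names made up for illustration; only the two field names come from the snippet above.

```cpp
#include <cassert>

// Hypothetical stand-in for the renderer's capability flags.
// Only the two field names are taken from the engine snippet.
struct GLES2Caps {
	bool float_texture_supported;
	int max_vertex_texture_image_units;
};

// Returns true when the texture-based GPU skeleton path cannot be used,
// i.e. when float textures are missing OR the vertex stage cannot sample textures.
bool needs_skeleton_software(const GLES2Caps &caps) {
	assert(caps.max_vertex_texture_image_units >= 0);
	return !caps.float_texture_supported || caps.max_vertex_texture_image_units == 0;
}
```

Either missing capability alone is enough to trigger the fallback, so a device has to report both float texture support and at least one vertex texture unit to stay on the GPU path.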

@volzhs
Contributor Author

volzhs commented Apr 8, 2020

This issue puts me in an ironic position.
I need GLES2 for my Android game, for device coverage, stability and performance.
GLES3 crashes on many devices, but performs better with Skeleton.
I can't choose either... 😢

@akien-mga
Member

Well the thing is that if your device supports GLES3, it should also support the GPU skeleton path on GLES2, unless for some weird reason the driver vendors decided not to implement the float texture support and vertex textures on GLES2 even though their GLES3 (and thus hardware) supports them.

If it's not taking the software path though, that would be a bug as the GLES2 GPU path shouldn't be drastically slower than the GLES3 (I don't have specific knowledge about this though, this is just an expectation).

If you are using the software path, we could still look into possible ways to optimize it for speed.

@volzhs
Contributor Author

volzhs commented Apr 8, 2020

the driver vendors decided not to implement the float texture support and vertex textures on GLES2 even though their GLES3 (and thus hardware) supports them

I guess this is the case.
According to the test results in the OP, it runs well on GLES3.

@akien-mga
Member

Well it's not the same code on GLES2 and GLES3, so that doesn't tell us which code path it uses on GLES2.

Please check this as I mentioned: #37696 (comment)

You can add a print_line, or check your device specs with http://opengles.gpuinfo.org/ (there's an Android app to generate the report if your device is not already in the list).

@volzhs
Contributor Author

volzhs commented Apr 8, 2020

Here are my phone's specs:
http://opengles.gpuinfo.org/displayreport.php?id=4476

I also confirmed that it uses the software skeleton path.

In godot/drivers/gles2/rasterizer_storage_gles2.cpp:

config.use_skeleton_software = (config.float_texture_supported == false) || (config.max_vertex_texture_image_units == 0);
if (config.use_skeleton_software) print_line("use_skeleton_software = true");
else print_line("use_skeleton_software = false");

Output:

--- Debugging process started ---
Godot Engine v3.2.2.rc.custom_build.97fe589ff - https://godotengine.org
OpenGL ES 2.0 Renderer: Mali-G71
use_skeleton_software = true

@lawnjelly
Member

The logic is correct, as without either float texture or vertex texture read the hardware path won't work. Looking at the phone specs it looks like it doesn't support float texture in GLES2.

If I remember correctly, the skeleton software path may be horribly inefficient. I'd not seen that approach before, and I suspect it was chosen because it was easier to retrofit to the existing pipeline, rather than for efficiency.

Probably more standard hardware skinning (passing the matrices in array or uniform), or even software skinning would be faster. But they might be a bit more involved to fit into the existing framework. It might be worth adding both because some hardware has bugs about what hardware methods are supported, and software skinning will always be supported.

@Calinou Calinou added this to the 4.0 milestone Apr 9, 2020
@Host32

Host32 commented Jun 16, 2020

Same problem here. I'm working on a scene with 10 characters with skeletons, and the FPS stays at 2 or 3 on GLES2, but it runs normally on GLES3 at 60 FPS. This frame rate makes the game unplayable.

I would really like this problem to be solved ASAP because I intended to publish this game soon, but this incompatibility is becoming a huge obstacle for us, and I don't want to publish it with GLES3 because of a lot of other incompatibilities.

This problem occurs on this device (Samsung Galaxy A8+):

IMG-20200616-WA0020

The problem occurs on the Samsung Galaxy S8+ too, and did not occur on the Zenfone 5, Zenfone Selfie and Moto Z 2 Play that I tried.

@lawnjelly
Member

lawnjelly commented Jun 17, 2020

Same problem here. I'm working on a scene with 10 characters with skeletons, and the FPS stays at 2 or 3 on GLES2

How many vertices per model, out of interest? You may need to drop your vertex count: skinned models with a high vertex count are unlikely to work well with fallback methods. You could, for example, ship 2 variations of the skinned mesh and switch at runtime depending on your frame rate.

Or perhaps there is something else going on, depending on your models. 2-3 fps is quite low.
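The runtime switch between two mesh variations suggested above could look roughly like the sketch below. It is not Godot API: `MeshVariantSelector`, the thresholds and the smoothing factor are all hypothetical choices for illustration. The frame rate is smoothed and two thresholds are used, so that one slow frame (or one fast frame) does not make the mesh flip back and forth.

```cpp
#include <cassert>

enum class MeshVariant { HighDetail, LowDetail };

// Hypothetical helper: picks a mesh variant from a smoothed frame rate.
class MeshVariantSelector {
public:
	MeshVariantSelector(float low_fps, float high_fps) :
			low_fps_(low_fps), high_fps_(high_fps) {
		assert(low_fps < high_fps);
	}

	// Feed one frame time in seconds; returns the variant to draw this frame.
	MeshVariant update(float frame_seconds) {
		assert(frame_seconds > 0.0f);
		const float fps = 1.0f / frame_seconds;
		// Exponential moving average; the first sample seeds the average.
		smoothed_fps_ = smoothed_fps_ < 0.0f ? fps : 0.9f * smoothed_fps_ + 0.1f * fps;
		if (smoothed_fps_ < low_fps_) {
			variant_ = MeshVariant::LowDetail;
		} else if (smoothed_fps_ > high_fps_) {
			variant_ = MeshVariant::HighDetail;
		}
		// Between the two thresholds the previous choice is kept (hysteresis).
		return variant_;
	}

private:
	float low_fps_;
	float high_fps_;
	float smoothed_fps_ = -1.0f; // negative means "no sample yet"
	MeshVariant variant_ = MeshVariant::HighDetail;
};
```

In a game, `update()` would be called once per frame with the frame delta, and the returned variant would decide which of the two shipped meshes is visible.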

@pouleyKetchoupp actually already wrote a software skinning implementation recently as part of:
godotengine/godot-proposals#784

We suggested to reduz at the time that it might be of use as a software skinning fallback, but he was against it (I can't find the IRC logs). He may have believed it wouldn't be faster than the existing fallback. Regardless, I'm aiming to try this out experimentally for 4.x, as well as some alternative hardware skinning implementations. @endragor also did some earlier research in this area, I believe.

@pouleyKetchoupp
Contributor

@pouleyKetchoupp actually already wrote a software skinning implementation recently as part of:
godotengine/godot-proposals#784

The software skinning I've implemented is currently limited to GLES3. It crashes with GLES2 on exported games, because RasterizerStorageGLES2::mesh_surface_get_array is only allowed in tools builds.

#ifndef TOOLS_ENABLED
ERR_PRINT("OpenGL ES 2.0 does not allow retrieving mesh array data");
#endif

I haven't investigated this problem yet, so I'm not sure how it works in the editor and if it would be possible to either make this functionality available for non-tool, or change the skinning implementation to update the vertex buffer in a different way.

@lawnjelly
Member

Ah, I was hoping you'd got around that. 😄 That was the bit I was going to copy lol. Yeah, if there's no support for dynamic VBs in GLES2 we will have to write it for 4.x.

@Host32

Host32 commented Jun 17, 2020

How many vertices per model, out of interest? You may need to drop your vertex count: skinned models with a high vertex count are unlikely to work well with fallback methods. You could, for example, ship 2 variations of the skinned mesh and switch at runtime depending on your frame rate.

@lawnjelly In Blender: 2,155 vertices; in Godot: 14,850 on GLES2 and 2,358 on GLES3. I don't believe that is a high count. The project is low poly by nature and I had a lot of difficulty getting down even to that count; I don't know if it's possible to reduce it further without losing a considerable amount of quality.

If I run the project with only one character in the scene, the FPS is around 20, but with 2 characters it drops to 3. The drop occurs as soon as the scene begins, not after some minutes as in @volzhs's original report, and if I put the object in the scene without a skeleton it runs at 60 FPS normally.

After posting, I tried a Samsung Galaxy S9 with a Snapdragon and the problem does not occur there; apparently it only happens with Samsung's Exynos chipset.

Prints of my animated model:

image

image

image

@pouleyKetchoupp
Contributor

@lawnjelly As an update, after checking again on 3.2 branch, dynamic VB is supported after #34794. I had made my original tests on a custom branch based on 3.1. So my code for software skinning can be used with GLES2.

The error in non-tool builds is still there but it can be removed since retrieving mesh array data is actually supported.

@lawnjelly
Member

@Host32 Sorry I only just saw this.

14K skinned verts will indeed toast a lot of GLES2 devices, even best case. The discrepancy between GLES2 and GLES3 is interesting (not sure how this is measured, debug monitor?). Are you using shadows? That could be causing problems too, as each shadow might be causing another skinning pass (I haven't really examined this stuff yet, @clayjohn will know more). I would try turning shadows off to confirm this.

One extra advantage of software skinning is that you can reuse the same skinned mesh for shadow passes.
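To make the reuse argument concrete, here is a rough back-of-the-envelope model. It assumes that the GPU path skins the mesh again in every render pass (one colour pass plus each shadow pass), which the comment above only suspects and has not verified; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Vertices skinned per frame if every render pass (one colour pass plus
// N shadow passes) runs skinning again. One run per pass is an assumption.
std::uint64_t skinned_per_frame_per_pass(std::uint64_t vertices, std::uint32_t shadow_passes) {
	const std::uint64_t passes = 1u + static_cast<std::uint64_t>(shadow_passes);
	assert(vertices <= std::numeric_limits<std::uint64_t>::max() / passes); // no overflow
	return vertices * passes;
}

// Vertices skinned per frame if the mesh is skinned once on the CPU
// and every pass reuses the already-skinned vertex buffer.
std::uint64_t skinned_per_frame_once(std::uint64_t vertices) {
	return vertices;
}
```

With the 14,850 vertices reported earlier in the thread and four shadow passes, the per-pass model gives 74,250 skinned vertices per frame against 14,850 for the skin-once model. Whether that translates into frame time depends on the relative cost of CPU and GPU skinning on the device.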

@lawnjelly
Member

@lawnjelly As an update, after checking again on 3.2 branch, dynamic VB is supported after #34794. I had made my original tests on a custom branch based on 3.1. So my code for software skinning can be used with GLES2.

The error in non-tool builds is still there but it can be removed since retrieving mesh array data is actually supported.

Retrieving mesh array data shouldn't be necessary for skinning. Maybe it is for historical reasons in the functions that are currently available for dynamic use. The relationship only needs to be one way.

@pouleyKetchoupp
Contributor

Retrieving mesh array data shouldn't be necessary for skinning. Maybe it is for historical reasons in the functions that are currently available for dynamic use. The relationship only needs to be one way.

In my case with subdivision, I need to retrieve weights from the array data so I can apply skinning from the mesh directly without storing any extra information. But yeah, there are probably better ways to implement software skinning within the GLES2 code.

@Host32

Host32 commented Jul 9, 2020

@lawnjelly

14K skinned verts will indeed toast a lot of GLES2 devices, even best case.

I didn't see that in practice. All the devices I tested, with 1GB of RAM and weak CPUs, could run the project at 60 FPS; only this specific chipset has problems with the animations.

not sure how this is measured, debug monitor?

I have no idea. I see it by enabling "View Information" in the editor.

Are you using shadows?

No. I tested it in every possible way and got the same results. The only difference when enabling shadows is that performance also drops a little on GLES3, but nothing really significant. I did extensive work to optimize the shaders and use as few passes as possible, so the rendering cost is very low. The only factor reducing performance to the point of making the game unplayable is the use of skeletal animations on the characters.

@Host32

Host32 commented Jul 9, 2020

@lawnjelly As an update, after checking again on 3.2 branch, dynamic VB is supported after #34794. I had made my original tests on a custom branch based on 3.1. So my code for software skinning can be used with GLES2.

The error in non-tool builds is still there but it can be removed since retrieving mesh array data is actually supported.

@pouleyKetchoupp what do I need to do to test your algorithm? Can I compile the project from your fork, or do I need to wait for these changes to be merged into the official branch? I'm in a bit of a hurry for this solution, as the issue could compromise our launch schedule.

@volzhs
Contributor Author

volzhs commented Jul 9, 2020

906b5e7 has been merged since 3.2-stable, so it's already there if you are using 3.2.x.
I also tested in various ways, but the result is always the same when using skeleton animation, as @Host32 found.
GLES3 performs well but crashes on many devices; GLES2 performs badly on some devices but runs on every device.
I'm eager to see this solved soon.

@pouleyKetchoupp
Contributor

@Host32 @volzhs I've just made a quick implementation of the software skinning I'm using for subdivision in MeshInstance directly:
nekomatata@4bc6923

If you want to test it, you need to compile a custom version of the 3.2 branch including this commit. Make sure it's the latest 3.2 branch to get #40235 otherwise you'll be spammed with errors at runtime.
In order to test on Android, you also need to compile custom templates using the same changes.

Then you can just check "Software Skinning" property in a mesh instance to test it.

@lawnjelly
Member

lawnjelly commented Jul 10, 2020

@Host32 @volzhs I've just made a quick implementation of the software skinning I'm using for subdivision in MeshInstance directly:
nekomatata@4bc6923

If you want to test it, you need to compile a custom version of the 3.2 branch including this commit. Make sure it's the latest 3.2 branch to get #40235 otherwise you'll be spammed with errors at runtime.
In order to test on Android, you also need to compile custom templates using the same changes.

Then you can just check "Software Skinning" property in a mesh instance to test it.

I suspect that implementation could be sped up quite a bit too, but it would be interesting to see the comparison with the hardware method, I will try your version later. 👍

EDIT - Updated figures are in post below. Using OP's test project on desktop (Intel integrated GPU).

Things get interesting once you start adding lights, presumably because the software skinning is a one off cost and can be reused for shadow passes.

@lawnjelly
Member

lawnjelly commented Jul 10, 2020

Ok I've now forced SKELETON_SOFTWARE path (which as I said is very inefficient). The results are very enlightening:

RELEASE BUILDS

  • Using the OP's project, with joints hidden, just showing Beta_Surface2.
  • Screen is 320x240, shadow map 256x256 (in order to decrease fill rate effects as we are interested in vertex processing)
  • Intel Core i5-7500T 2.7 GHz
  • Intel HD graphics 630
  • Software skinning has the modification I mentioned a few posts down.

4 directional lights:

Software Skinning 477fps
Hardware (SKELETON_SOFTWARE) 18fps
Hardware (Main method) 387fps

1 directional light

Software Skinning 550fps
Hardware (SKELETON_SOFTWARE) 62fps
Hardware (Main method) 1170fps

Lights off

Software Skinning 580fps
Hardware (SKELETON_SOFTWARE) 283fps
Hardware (Main method) 3055fps

Software skinning beats the current hardware fallback path by 2x with lights off, and by an increasing margin as lights are added. These lights are directional, so they may use shadow splits, which would increase the advantage of the one-off skinning.
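As a quick check of the "2x and increasing" claim, the ratios can be worked out from the release-build figures listed above (software skinning versus the SKELETON_SOFTWARE fallback). The helper `speedup` and the constant names are made up for this sketch; only the fps numbers come from the results.

```cpp
#include <cassert>

// Ratio of two frame rates; a value above 1 means `fps_a` is the faster one.
double speedup(double fps_a, double fps_b) {
	assert(fps_b > 0.0);
	return fps_a / fps_b;
}

// Software skinning fps divided by SKELETON_SOFTWARE fallback fps.
const double kLightsOff = speedup(580.0, 283.0); // roughly 2.0x
const double kOneLight = speedup(550.0, 62.0); // roughly 8.9x
const double kFourLights = speedup(477.0, 18.0); // 26.5x
```

These are frame-rate ratios on one desktop Intel GPU at a very low resolution, so they indicate the trend in vertex processing cost rather than what a given phone will show.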

P.S. For anyone wanting to force USE_SKELETON_SOFTWARE to compare, add the test line here to rasterizer_storage_gles2.cpp, line 6061, in order to hard code it.

	// the use skeleton software path should be used if either float texture is not supported,
	// OR max_vertex_texture_image_units is zero
	config.use_skeleton_software = (config.float_texture_supported == false) || (config.max_vertex_texture_image_units == 0);
	// test
	config.use_skeleton_software = true;

@volzhs
Contributor Author

volzhs commented Jul 11, 2020

@pouleyKetchoupp wow. I just tested it with my project.
without software skinning, it runs 35~40 fps on Galaxy s8+
with software skinning, it's pretty stable 60 fps! yes!

@volzhs
Contributor Author

volzhs commented Jul 11, 2020

@pouleyKetchoupp I found that, with duplicated instances, all skeleton nodes play the same animation at the same time when using software skinning.

@lawnjelly
Member

@pouleyKetchoupp wow. I just tested it with my project.
without software skinning, it runs 35~40 fps on Galaxy s8+
with software skinning, it's pretty stable 60 fps! yes!

This is all adding up to a convincing case for having this available in addition to (and possibly, in the long run, replacing) the USE_SKELETON_SOFTWARE path. I spoke to @clayjohn yesterday and he agreed it seemed convincing.

We can try to bring this to @reduz's attention on IRC so we can all discuss it.

@endragor
Contributor

I'm not sure about the history of USE_SKELETON_SOFTWARE, but indeed it seems somewhat pointless. A faster implementation (and the one we use) would be to store bone transforms in a uniform vector. The downside is that size of uniforms is limited, but even on older devices it allows about 75 bones per mesh, which is more than enough for most use cases on mobile. The uniform bone limit could be set as a project setting.
This is also significantly faster than the "main method" where transforms are provided through a texture.
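A rough way to see where a bone limit of that order comes from is sketched below. It assumes each bone is uploaded as a 3x4 affine matrix (three vec4 uniforms) and that some uniform vectors are reserved for the rest of the shader. The reserved count used in the usage note is a hypothetical allowance, and `max_uniform_bones` is not an engine function. The device's real budget is what `glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, ...)` reports.

```cpp
#include <cassert>

// Assumption: each affine bone transform is stored as three vec4 rows (a 3x4 matrix).
const int kVectorsPerBone = 3;

// How many bones fit into the vertex shader's uniform budget.
// `max_vertex_uniform_vectors`: value reported for GL_MAX_VERTEX_UNIFORM_VECTORS.
// `reserved_vectors`: hypothetical allowance for the shader's other uniforms.
int max_uniform_bones(int max_vertex_uniform_vectors, int reserved_vectors) {
	assert(max_vertex_uniform_vectors >= 0 && reserved_vectors >= 0);
	const int available = max_vertex_uniform_vectors - reserved_vectors;
	return available > 0 ? available / kVectorsPerBone : 0;
}
```

For example, a device reporting 256 vectors with 31 reserved would fit 75 bones under these assumptions, while a device at the OpenGL ES 2.0 minimum of 128 vectors with 32 reserved would fit 32. That spread is one reason a configurable bone limit, as proposed above, seems useful.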

@lawnjelly
Member

I'm not sure about the history of USE_SKELETON_SOFTWARE, but indeed it seems somewhat pointless. A faster implementation (and the one we use) would be to store bone transforms in a uniform vector. The downside is that size of uniforms is limited, but even on older devices it allows about 75 bones per mesh, which is more than enough for most use cases on mobile. The uniform bone limit could be set as a project setting.
This is also significantly faster than the "main method" where transforms are provided through a texture.

Yup indeed I also understood this to be the most common method of skinning for GLES2 (and was thinking in terms of using this for the rewrite of GLES2 3d in 4.x, with software skinning fallback).

@pouleyKetchoupp
Contributor

@pouleyKetchoupp I found that, with duplicated instances, all skeleton nodes play the same animation at the same time when using software skinning.

Could you please share a minimal repro for this case?

@lawnjelly
Member

lawnjelly commented Jul 12, 2020

@pouleyKetchoupp - I've made a minor modification to the skinning code:

No lights:

Software skinning 199fps -> 580fps
USE_SKELETON_SOFTWARE 285fps

So now software skinning is twice as fast as the old fallback path, even with no lights, and should be even faster with lights.

The modification was to add this at the start:

	// Fetch all bone transforms once, up front.
	// Note: nothing here guards against num_bones exceeding SKIN_MAX_BONES.
	int num_bones = visual_server->skeleton_get_bone_count(skeleton);
	const int SKIN_MAX_BONES = 128;
	Transform bone_transform[SKIN_MAX_BONES];
	for (int n = 0; n < num_bones; n++) {
		bone_transform[n] = visual_server->skeleton_bone_get_transform(skeleton, n);
	}

and change the per vertex to this:

			int b0 = bone_id[0];
			int b1 = bone_id[1];
			int b2 = bone_id[2];
			int b3 = bone_id[3];

			Transform transform;
			transform.origin =
					bone_weight[0] * bone_transform[b0].origin +
					bone_weight[1] * bone_transform[b1].origin +
					bone_weight[2] * bone_transform[b2].origin +
					bone_weight[3] * bone_transform[b3].origin;

			transform.basis =
					bone_transform[b0].basis * bone_weight[0] +
					bone_transform[b1].basis * bone_weight[1] +
					bone_transform[b2].basis * bone_weight[2] +
					bone_transform[b3].basis * bone_weight[3];

(The aliasing isn't necessary of course).
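For anyone who wants to experiment with the blending step outside the engine, here is a self-contained sketch of the same idea: blend the four bone transforms by weight, then transform the vertex once. `Vec3`, `Affine` and `SkinnedVertex` are simplified stand-ins invented for this example, not Godot's `Vector3` or `Transform`, and unlike the snippet above it bounds-checks the bone indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
	float x, y, z;
};

// Minimal affine transform: row-major 3x3 basis plus origin.
struct Affine {
	float basis[3][3];
	Vec3 origin;

	// Apply basis and translation to a point.
	Vec3 xform(const Vec3 &v) const {
		return {
			basis[0][0] * v.x + basis[0][1] * v.y + basis[0][2] * v.z + origin.x,
			basis[1][0] * v.x + basis[1][1] * v.y + basis[1][2] * v.z + origin.y,
			basis[2][0] * v.x + basis[2][1] * v.y + basis[2][2] * v.z + origin.z
		};
	}
};

struct SkinnedVertex {
	Vec3 position;
	std::array<int, 4> bone_id;
	std::array<float, 4> bone_weight; // expected to sum to 1
};

// Linear blend skinning for one vertex.
Vec3 skin_vertex(const SkinnedVertex &v, const std::vector<Affine> &bones) {
	Affine blended = {}; // all zeros; weighted transforms are accumulated into it
	for (std::size_t i = 0; i < 4; i++) {
		assert(v.bone_id[i] >= 0);
		const std::size_t id = static_cast<std::size_t>(v.bone_id[i]);
		assert(id < bones.size());
		const Affine &b = bones[id];
		const float w = v.bone_weight[i];
		for (int r = 0; r < 3; r++) {
			for (int c = 0; c < 3; c++) {
				blended.basis[r][c] += w * b.basis[r][c];
			}
		}
		blended.origin.x += w * b.origin.x;
		blended.origin.y += w * b.origin.y;
		blended.origin.z += w * b.origin.z;
	}
	return blended.xform(v.position);
}
```

A vertex weighted half-and-half between an identity bone and a bone translated by 2 units along X ends up moved by 1 unit along X, which is the expected linear blend.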

There are probably still quite a few gains to be had.

I've also worked out the reason for the performance cliff with lights: the shadow maps default to a size of 4096. I'm going to rerun the tests with a 256 size shadow map and the improved skinning code, and will update the earlier post - DONE.

If anyone wants to test with these modifications, my branch is at:
https://github.com/lawnjelly/godot/tree/soft_skin

@lawnjelly
Member

@pouleyKetchoupp Something I just noticed: I don't think we are transforming normals? So it is not exactly like for like at the moment.

For performance reasons in software skinning it can be nice to have the option to not transform normals, but it should be optional. This might be something that could benefit from a per mesh setting, as well as a global setting - you could e.g. not transform normals on enemies, but do transform on the main player.
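An optional normal transform along those lines might look like the following sketch. The function name and the `transform_normals` flag are hypothetical, not the property added in the PR. The normal is multiplied by the blended basis only (no translation) and renormalised. With non-uniform scale the inverse transpose of the basis would strictly be needed, which this sketch ignores.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
	float x, y, z;
};

// Transform a normal by a row-major 3x3 basis and renormalise it.
// When `transform_normals` is false the input is returned untouched,
// which is the cheaper option discussed above.
Vec3 skin_normal(const float basis[3][3], const Vec3 &n, bool transform_normals) {
	if (!transform_normals) {
		return n;
	}
	const Vec3 out = {
		basis[0][0] * n.x + basis[0][1] * n.y + basis[0][2] * n.z,
		basis[1][0] * n.x + basis[1][1] * n.y + basis[1][2] * n.z,
		basis[2][0] * n.x + basis[2][1] * n.y + basis[2][2] * n.z
	};
	const float len = std::sqrt(out.x * out.x + out.y * out.y + out.z * out.z);
	assert(len > 0.0f); // a degenerate basis would collapse the normal
	return { out.x / len, out.y / len, out.z / len };
}
```

The flag could be driven by a per-mesh setting, a global setting, or both, as suggested above.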

@pouleyKetchoupp
Contributor

@lawnjelly I've just pushed a new version with prepared bone transforms and I've made it a draft PR to make it easier to test and make more changes if needed: #40313

Good point for the normals, and it sounds good to have it as an option.

@akien-mga
Member

Fixed by #40313.
