Skeleton performance is low on GLES2 Android #37696
Comments
I think GLES2 uses a software path for skeletons (see define |
You should check if this is true or false on your hardware: godot/drivers/gles2/rasterizer_storage_gles2.cpp Lines 6022 to 6024 in 02ed72c
|
This issue is ironic for me. |
Well the thing is that if your device supports GLES3, it should also support the GPU skeleton path on GLES2, unless for some weird reason the driver vendors decided not to implement the float texture support and vertex textures on GLES2 even though their GLES3 (and thus hardware) supports them. If it's not taking the software path though, that would be a bug as the GLES2 GPU path shouldn't be drastically slower than the GLES3 (I don't have specific knowledge about this though, this is just an expectation). If you are using the software path, we could still look into possible ways to optimize it for speed. |
I guess this is the case. |
Well it's not the same code on GLES2 and GLES3, so that doesn't tell us which code path it uses on GLES2. Please check this as I mentioned: #37696 (comment) You can add a |
Here are my phone specs. I confirmed that it uses the software skeleton path:
config.use_skeleton_software = (config.float_texture_supported == false) || (config.max_vertex_texture_image_units == 0);
if (config.use_skeleton_software) {
	print_line("use_skeleton_software = true");
} else {
	print_line("use_skeleton_software = false");
}
|
The logic is correct, as without either float texture or vertex texture read the hardware path won't work. Looking at the phone specs, it looks like it doesn't support float textures in GLES2. If I remember correctly, the skeleton software path may be horribly inefficient. I'd not seen that approach before, and I suspect it was chosen because it was easier to retrofit to the existing pipeline, rather than for efficiency. More standard hardware skinning (passing the matrices in an array or uniform), or even software skinning, would probably be faster, but they might be a bit more involved to fit into the existing framework. It might be worth adding both, because some hardware has bugs about which hardware methods are supported, and software skinning will always be supported. |
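To make the condition discussed above easier to test in isolation, here is a minimal sketch of the same decision as a free function. `SkeletonCaps` and the function itself are hypothetical stand-ins, not Godot's actual types; only the two capability fields quoted in the thread are modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for the renderer config; only the two capabilities
// quoted in the thread are modeled here.
struct SkeletonCaps {
	bool float_texture_supported;
	int max_vertex_texture_image_units;
};

// Same condition as the quoted line: the GPU skeleton path needs BOTH float
// textures and vertex texture fetch. If either is missing, fall back to the
// software path.
bool use_skeleton_software(const SkeletonCaps &caps) {
	// A texture unit count can never be negative.
	assert(caps.max_vertex_texture_image_units >= 0);
	return !caps.float_texture_supported || caps.max_vertex_texture_image_units == 0;
}
```

A device like the one reported here (no float textures on GLES2) would take the software path regardless of how many vertex texture units it exposes.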
Same problem here. I'm working on a scene with 10 characters with skeletons, and the FPS stays at 2 or 3 in GLES2, but it runs normally on GLES3 at 60 FPS. This frame rate makes the game unplayable. I would really like this problem to be solved ASAP because I intended to publish this game soon, but this incompatibility is becoming a huge obstacle for us, and I don't want to publish it with GLES3 because of a lot of other incompatibilities. This problem occurs on this device (Samsung Galaxy A8+): The problem occurs on the Samsung Galaxy S8+ too, and did not occur on the Zenfone 5, Zenfone Selfie and Moto Z 2 Play that I tried. |
How many vertices per model, out of interest? You may need to drop your vertex count, as high vertex count skinned models are unlikely to work well with fallback methods. You could, for example, ship 2 variations of the skinned mesh, and switch at runtime depending on your frame rate. Or perhaps there is something else going on, depending on your models; 2-3 fps is quite low. @pouleyKetchoupp actually already wrote a software skinning implementation recently as part of: Which we suggested to reduz at the time might be of use as a software skinning fallback, but he was against it (I can't find the irc logs). He may have believed it wouldn't be faster than the existing fallback. Irrespective, I'm aiming to experimentally try this out for 4.x, as well as some alternate hardware skinning implementations. @endragor also did some earlier research in this area, I believe. |
The software skinning I've implemented is currently limited to GLES3. It crashes with GLES2 on exported games, because godot/drivers/gles2/rasterizer_storage_gles2.cpp Lines 2671 to 2673 in 7f67674
I haven't investigated this problem yet, so I'm not sure how it works in the editor and if it would be possible to either make this functionality available for non-tool, or change the skinning implementation to update the vertex buffer in a different way. |
Ah I was hoping you'd got around that. 😄 That was the bit I was going to copy lol. Yeah if there's no support for dynamic VBs in GLES2 we will have to write for 4.x. |
@lawnjelly In Blender: 2,155 vertices; in Godot: 14,850 on GLES2 and 2,358 on GLES3. I do not believe that is a high count. The project is low poly by nature and I have had a lot of difficulty reducing it even to that; I do not know if it is possible to reduce it further without losing a considerable amount of quality. If I run the project with only one character in the scene, the FPS is around 20, but with 2 characters it drops to 3. The drop occurs as soon as the scene begins, not after some minutes as in the original report by @volzhs, and if I put the object in the scene without a skeleton it runs at 60 FPS normally. After posting, I tried on a Samsung Galaxy S9 with a Snapdragon and the problem does not occur; apparently it only happens with Samsung's Exynos chipset. Screenshots of my animated model: |
@lawnjelly As an update, after checking again on 3.2 branch, dynamic VB is supported after #34794. I had made my original tests on a custom branch based on 3.1. So my code for software skinning can be used with GLES2. The error in non-tool builds is still there but it can be removed since retrieving mesh array data is actually supported. |
@Host32 Sorry I only just saw this. 14K skinned verts will indeed toast a lot of GLES2 devices, even best case. It is interesting the discrepancy between GLES2 and GLES3 (not sure how this is measured, debug monitor?). Are you using shadows? That could be causing problems too, each shadow might be causing another skinning pass (I haven't really examined this stuff yet, @clayjohn will know more). I would try turning shadows off to confirm this. One extra advantage of software skinning is that you can reuse the same skinned mesh for shadow passes. |
Retrieving mesh array data shouldn't be necessary for skinning. Maybe it is for historical reasons in the functions that are currently available for dynamic use. The relationship only needs to be one way. |
In my case with subdivision, I need to retrieve weights from the array data so I can apply skinning from the mesh directly without storing any extra information. But yeah, there are probably better ways to implement software skinning within the GLES2 code. |
I didn't see it in practice, all the devices i tested with 1GB of RAM and weak CPUs could run the project at 60FPS, only this specific chipset has problems with the animations.
I have no idea. I see that when enabling "View Information" in the editor.
No, I tested it in every possible way and got the same results. The only difference when enabling shadows is that performance also drops a little on GLES3, but nothing really significant. I did extensive work to optimize the shaders and use as few passes as possible, so the rendering cost is very low. The only factor reducing performance to the point of making the game unplayable is the use of skeletal animations on the characters. |
@pouleyKetchoupp What do I need to do to test your algorithm? Can I compile the project from your fork, or do I need to wait for these changes to be merged into the official branch? I'm in a bit of a hurry for this solution, as it could compromise our launch schedule. |
906b5e7 has already been merged since stable-3.2, so it's there if you're using 3.2.x. |
@Host32 @volzhs I've just made a quick implementation of the software skinning I'm using for subdivision in If you want to test it, you need to compile a custom version of the 3.2 branch including this commit. Make sure it's the latest 3.2 branch to get #40235 otherwise you'll be spammed with errors at runtime. Then you can just check "Software Skinning" property in a mesh instance to test it. |
I suspect that implementation could be sped up quite a bit too, but it would be interesting to see the comparison with the hardware method, I will try your version later. 👍 EDIT - Updated figures are in post below. Using OP's test project on desktop (Intel integrated GPU). Things get interesting once you start adding lights, presumably because the software skinning is a one off cost and can be reused for shadow passes. |
Ok, I've now forced the SKELETON_SOFTWARE path (which, as I said, is very inefficient). The results are very enlightening:
RELEASE BUILDS
4 directional lights: Software Skinning 477fps
1 directional light: Software Skinning 550fps
Lights off: Software Skinning 580fps
Software skinning is beating the current hardware fallback path by 2x, and by an increasing margin as lights are added. These lights are directional, so they may have splits, which increases the advantage of the one-off skinning. P.S. For anyone wanting to force USE_SKELETON_SOFTWARE to compare, add the test line here to rasterizer_storage_gles2.cpp, line 6061, in order to hard code it.
|
@pouleyKetchoupp wow. I just tested it with my project. |
@pouleyKetchoupp I found that all skeleton nodes play the same animation at the same time with duplicated instances when using software skinning. |
This is all adding up to a convincing case for having this available in addition to (and possibly, in the long run, replacing) the USE_SKELETON_SOFTWARE path. I spoke to @clayjohn yesterday and he agreed it seemed convincing. We can try to bring this to @reduz's attention on irc so we can all discuss it. |
I'm not sure about the history of USE_SKELETON_SOFTWARE, but indeed it seems somewhat pointless. A faster implementation (and the one we use) would be to store bone transforms in a uniform vector. The downside is that size of uniforms is limited, but even on older devices it allows about 75 bones per mesh, which is more than enough for most use cases on mobile. The uniform bone limit could be set as a project setting. |
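As a rough illustration of where a figure like "about 75 bones" could come from, here is a sketch of the budget arithmetic. It assumes each bone is stored as a 3x4 affine matrix (3 vec4 slots), and `reserved_vectors` is a hypothetical budget for all other vertex shader uniforms; neither detail is confirmed in this thread. `GL_MAX_VERTEX_UNIFORM_VECTORS` is the real GLES2 query for the limit, and the spec only guarantees 128.

```cpp
#include <cassert>

// Assumption: one bone = one 3x4 affine matrix = 3 vec4 uniform slots
// (the implicit bottom row 0,0,0,1 is not stored).
const int VEC4_PER_BONE = 3;

// max_vertex_uniform_vectors: value reported for GL_MAX_VERTEX_UNIFORM_VECTORS.
// reserved_vectors: hypothetical number of slots kept for everything else
// (camera, material, lighting uniforms...).
int max_uniform_bones(int max_vertex_uniform_vectors, int reserved_vectors) {
	assert(max_vertex_uniform_vectors >= 0 && reserved_vectors >= 0);
	const int available = max_vertex_uniform_vectors - reserved_vectors;
	return available > 0 ? available / VEC4_PER_BONE : 0;
}
```

With these assumptions, a device reporting 256 vectors with 31 reserved gives 75 bones, while a device at the guaranteed minimum of 128 with 32 reserved gives only 32, which is one reason a project setting for the limit seems sensible.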
Yup indeed I also understood this to be the most common method of skinning for GLES2 (and was thinking in terms of using this for the rewrite of GLES2 3d in 4.x, with software skinning fallback). |
Could you please share a minimal repro for this case? |
@pouleyKetchoupp - I've made a minor modification to the skinning code: No lights: Software skinning 199fps -> 580fps. So now software skinning is twice as fast as the old fallback path, even with no lights, and should be even faster with lights. The modification was to add this at the start:
and change the per vertex to this:
(The aliasing isn't necessary of course). There are probably still quite a few gains to be had. I've also worked out the reason for the cliff-like performance drop with lights: the shadow maps default to 4096 size. I'm going to rerun the tests with a 256 size shadow map and the improved skinning code, will update the earlier post - DONE. If anyone wants to test with these modifications, my branch is at: |
@pouleyKetchoupp Something I just noticed, we are not transforming normals I don't think? So it is not exactly like for like at the moment. For performance reasons in software skinning it can be nice to have the option to not transform normals, but it should be optional. This might be something that could benefit from a per mesh setting, as well as a global setting - you could e.g. not transform normals on enemies, but do transform on the main player. |
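For readers unfamiliar with the technique being discussed, here is a minimal sketch of CPU linear blend skinning with an optional normal transform. It is not the code from the PR: every type and name below is hypothetical, bone matrices are assumed to be already "prepared" (pose multiplied by inverse rest) before the per-vertex loop, and normals are transformed by the 3x3 part only, which is only correct for rotation and uniform scale and leaves them unnormalized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
	float x, y, z;
};

// Row-major 3x4 affine transform: 3x3 basis plus translation in column 3.
struct Mat3x4 {
	float m[3][4];

	// Applies the 3x3 basis only (no translation); used for normals.
	Vec3 basis_xform(const Vec3 &v) const {
		return Vec3{
			m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z,
			m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z,
			m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z};
	}

	// Applies basis and translation; used for positions.
	Vec3 xform(const Vec3 &v) const {
		const Vec3 r = basis_xform(v);
		return Vec3{r.x + m[0][3], r.y + m[1][3], r.z + m[2][3]};
	}
};

const int WEIGHTS_PER_VERTEX = 4;

struct SkinnedVertex {
	Vec3 position;
	Vec3 normal;
	int bones[WEIGHTS_PER_VERTEX];
	float weights[WEIGHTS_PER_VERTEX];
};

// bone_transforms are assumed to be prepared once per frame, outside this loop.
void skin_vertices(const std::vector<SkinnedVertex> &src,
		const std::vector<Mat3x4> &bone_transforms,
		bool transform_normals,
		std::vector<Vec3> &out_positions,
		std::vector<Vec3> &out_normals) {
	out_positions.resize(src.size());
	out_normals.resize(src.size());
	for (std::size_t i = 0; i < src.size(); i++) {
		const SkinnedVertex &v = src[i];
		Vec3 pos = {0.0f, 0.0f, 0.0f};
		Vec3 nrm = {0.0f, 0.0f, 0.0f};
		for (int w = 0; w < WEIGHTS_PER_VERTEX; w++) {
			const float weight = v.weights[w];
			if (weight == 0.0f) {
				continue; // Unused influence slot.
			}
			assert(v.bones[w] >= 0 &&
					static_cast<std::size_t>(v.bones[w]) < bone_transforms.size());
			const Mat3x4 &bone = bone_transforms[v.bones[w]];
			const Vec3 p = bone.xform(v.position);
			pos.x += p.x * weight;
			pos.y += p.y * weight;
			pos.z += p.z * weight;
			if (transform_normals) {
				const Vec3 n = bone.basis_xform(v.normal);
				nrm.x += n.x * weight;
				nrm.y += n.y * weight;
				nrm.z += n.z * weight;
			}
		}
		out_positions[i] = pos;
		// When the option is off, normals are passed through untouched.
		out_normals[i] = transform_normals ? nrm : v.normal;
	}
}
```

The `transform_normals` flag is the kind of switch that could be exposed per mesh or globally: skipping it saves roughly a third of the per-influence math, at the cost of lighting that no longer follows the animation.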
@lawnjelly I've just pushed a new version with prepared bone transforms and I've made it a draft PR to make it easier to test and make more changes if needed : #40313 Good point for the normals, and it sounds good to have it as an option. |
Fixed by #40313. |
Godot version:
3.2.2 02ed72c
OS/device including version:
Linux mint 19.3 / Galaxy S8+
Issue description:
FPS drops to around 40 when using a skeleton on GLES2.
It's a steady 60fps on GLES3 with the same scene.
I tested it on GLES2 first and then on GLES3, for the same amount of time.
GLES2
GLES3
(ignore the distorted mesh with bones)
Steps to reproduce:
Minimal reproduction project:
skeleton_performance_gles2.zip