-
-
Notifications
You must be signed in to change notification settings - Fork 21.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unified GLES3 / GLES2 Batching #42119
Conversation
c2fff1a
to
f628be9
Compare
67082d0
to
a8eae83
Compare
@clayjohn I've changed this from draft to ready for review. There's probably quite a few things I've missed as far as naming etc but I'm keen for us to get this in as soon as possible, even if more work is expected. This is partly so we can get testing asap (as Akien said he is building 3.2.4 betas soon) but also because this is such a major change that it involves substantial work to manually rebase every time any other PR is added to the rendering. Some things of note:
I am open to moving them all into the same header, or a single hander and single inc, and separating the implementations and definitions if that is preferable. Or alternatively we could do such changes later if we thought it would work better in the long run.
|
02cb80c
to
bdb4014
Compare
I've quickly reviewed the easy stuff, but Im gonna need a few more days to get through the difficult parts. So far it looks pretty good. Personally, I really like have things split across many files (even though some of those files are only 20 lines long). However, I know reduz really prefers long files that include all the needed functionality, so he will have to weigh in here. Either way it won't be too bad as most of those I'll let you know when I have a chance to do a more substantive review. The overall structure looks pretty similar to GLES2 batching, but I haven't really dug into how you are batching the other primitive types. So we'll see. |
For batching the other types, the main conceptual change is that for the buffers, the start vertex and number of vertices (or commands) is stored in terms of vertex rather than quad (group of 4 verts) in the old version. This is because lines use 2 verts each, and polys can use a variable number of verts per poly. What simplifies it considerably is that only similar primitives can be joined together at the moment. E.g. a group of rects are joined, and rendered as a group of rects. Poly group all polys etc. This is something I'm meaning to improve later (probably after 3.2.4) so multiple types can be included in the same vertex buffer before a flush, it requires an extra layer of logic on top, but should result in a lot less VB uploads and probably better performance. NinepatchesNinepatches are very simple, in that they are just translated into 9 rects and drawn as rects. In the version you have it only supports the GLES2 method, stretch mode, but I've been starting to look into whether it is possible to support the tiling modes as in GLES3. This seems to be a whole can o' worms, not in terms of technical difficulty, but in terms of agreement about how ninepatches should work. I've opened up a new proposal to discuss this, so this PR doesn't become too off topic with what is really a side issue: Edit : It turns out the tilemode of ninepatch is relying on manual tiling of the central patch in the fragment shader, so can't be supported easily with batching. So I'm now thinking in terms of reverting to the legacy method if it uses tiling, and only batching non tiled ninepatches. Tiling manually in the fragment shader is a really bad idea in GLES2 imo (same with non pot tiling), because on some hardware this ruins the texture prefetch and also drops the uv precision, leading to performance dropping off a cliff and visual anomalies. So I can see why this hasn't been done already. It could alternatively be done by creating a separate texture specifically for the central patch which could be tiled (and this would not require special shader and run better), but this would require some redesign in the visual server, and this is feature creep so I'm inclined to leave this. |
fbd105a
to
1699879
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone through he whole thing, given the length of the PR, I can't say I spent a significant amount of time on each line of code. So I will mostly make general comments here:
-
I like the overall structure of the implementation. It is a nice logical step up from the previous batching implementation.
-
The
PREAMBLE
macro was a great way to simplify the templates. -
There is a lot of commented out code that has been left in throughout, I stopped commenting on them part way through. Before merging we will have to do a full read-through and make sure that we get read of the dead code.
Overall, great work! I am amazed at the size and complexity of this PR.
With just a little bit of code cleanup I think this will be ready for a testing build, we should do something similar to when we first added batching. @akien-mga can we get you to help put together another test batching build so we can get this into the hands of as many users as possible?
I've cleaned up a number of the comments... 👍 Some I did miss but part of the reason for leaving others in is that we are expecting to have to tweak things as a result of the betas, and the commented code may help for changing these areas (and some are placeholders). Bear in mind we can also do a separate cleanup PR after the betas. Edit: I've just also added a couple more notes to the documentation. Project SettingsI've been a bit torn as to whether the 2d software skinning and ninepatch settings should be general settings (as they are, in 2d software skinning will likely only be supported in batching at least until 4.x, however, reduz expressed an interest in using 2d software skinning globally in 4.x (perhaps for simplicity as much as anything). Mind you in that case maybe we wouldn't need an option at all. And perhaps this is over future-proofing. Ninepatch behaviour operates in batching and also in GLES2 legacy, but not in GLES3 legacy because it uses the shader (it is actually quite complex the situations where it can be used, because even in GLES3 batching with tiling, it reverts to the shader!). In 4.x however, it is possible that the ninepatch behaviour may be supported in vulkan. But again, maybe this is over future-proofing. So I'm very happy to change their location, just let me know. I've also noticed the |
Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files. Batching is extended to cover more primitives.
Added a second commit, doing some experiments to try and lower the amount of glBufferSubData calls in the legacy renderer in the hope of helping performance on devices with poor drivers. Some more related info here, someone having similar problems: Areas in legacy renderer still expected to be problematic as far as orphaning (basically anywhere using glBufferSubData):
|
This is part of effort to make more efficient use of the API for devices with poor drivers. This eliminates multiple calls to glBufferSubData per update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work as usual!
Let's get this merged and tested in 3.2.4 beta 1 (hopefully sometime this week).
Thanks! @lawnjelly Do you want to prepare an issue with some testing instructions like we had for GLES2 batching, to collect user feedback? |
Yup sure will do. 👍 |
Unified GLES3 / GLES2 was merged in 3.2.4 beta (3.3 stable) godotengine/godot#42119
The problem is still relevant.ISSUE #35772 |
@lawnjelly Not sure where to write this quick note, so here :-) The |
Very much a WIP, the new unified batching both provides acceleration for GLES3 and will be adding new features which will also improve GLES2 batching.
There will probably be issues with custom shaders to sort out, but it seems to run the platformer demo fine and a few test games and the editor.
Uses the curiously recurring template pattern to share code between the 2 backends.
Fixes #19943.
Fixes #37077.
Fixes #33225.
Fixes #42576.
Fixes #35772.
Fixes #36749.
Fixes #41324.
Mitigates #40905.
Benchmarking with the text benchmark I used for GLES2, I get 45x increase in fps, versus ~78x for GLES2. That is to be expected as the bottlenecks are slightly different in GLES3. This benchmark is specifically designed to look at drawcalls, real life increases are likely to be far less dramatic.
Expectations
batching/light_scissoring
option.single_rect_fallback
to false. This is the old method of drawing quads which is about 2x as fast, but causes flicker on nvidia drivers. To directly compare the performance of both you would need to turn this back on.New features
This isn't just a straight port of the feature set of GLES2 batching, I am now adding new features, which will be available in GLES2 and GLES3.
(software skinning is switchable in projectsettings->rendering->quality->2d)
(switchable in projectsettings->rendering->quality->2d)
Please read for more info
https://docs.godotengine.org/en/3.2/tutorials/optimization/batching.html#doc-batching
https://docs.godotengine.org/en/3.2/tutorials/optimization/batching.html#frequently-asked-questions
Builds
In progress builds for linux / windows. I'll try and update these as I go.
Please post any feedback regarding regressions, performance gains (or not) etc, with the build version you tried.
To find regressions, run while turning on
flash_batching
in project settings. The batching settings are exactly the same as GLES2. To test performance, turn off vsync, and choose todebug/settings/stdout/print_fps
and run with batching switched off, then on.https://github.com/lawnjelly/Misc/releases