
Unified GLES3 / GLES2 Batching #42119

Merged
merged 2 commits into from
Oct 19, 2020

lawnjelly
Member

@lawnjelly lawnjelly commented Sep 16, 2020

Very much a WIP. The new unified batching both provides acceleration for GLES3 and adds new features that will also improve GLES2 batching.

There will probably be issues with custom shaders to sort out, but it seems to run the platformer demo, a few test games and the editor fine.

Uses the curiously recurring template pattern (CRTP) to share code between the 2 backends.
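As a rough illustration of how CRTP lets one template share batching logic between backends, here is a minimal sketch. The class and method names (BatcherCommonSketch, backend_draw_batch, GLES2Sketch, GLES3Sketch) are hypothetical and are not the actual classes in this PR. The point is that the shared code calls into the backend through static_cast<T *>(this), so dispatch is resolved at compile time with no virtual calls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical common batching template. Shared logic lives here; the
// backend-specific draw step is reached through static_cast<T *>(this),
// so the call is resolved at compile time (no vtable).
template <class T>
class BatcherCommonSketch {
public:
	// Shared front end: queue one rect, flushing first if the batch is full.
	void add_rect() {
		if (rects_in_batch == max_rects_per_batch) {
			flush();
		}
		rects_in_batch++;
	}

	// Shared flush logic; only the actual draw differs per backend.
	void flush() {
		if (rects_in_batch == 0) {
			return;
		}
		static_cast<T *>(this)->backend_draw_batch(rects_in_batch);
		rects_in_batch = 0;
	}

protected:
	int rects_in_batch = 0;
	int max_rects_per_batch = 4; // tiny limit so the behaviour is easy to follow
};

// GLES2 backend: would issue GLES2 calls; here it just records batch sizes.
class GLES2Sketch : public BatcherCommonSketch<GLES2Sketch> {
public:
	std::vector<int> draws;
	void backend_draw_batch(int p_num_rects) { draws.push_back(p_num_rects); }
};

// GLES3 backend: would issue GLES3 calls; here it just totals the rects drawn.
class GLES3Sketch : public BatcherCommonSketch<GLES3Sketch> {
public:
	int total_rects = 0;
	int num_draws = 0;
	void backend_draw_batch(int p_num_rects) {
		total_rects += p_num_rects;
		num_draws++;
	}
};
```

For example, calling add_rect() ten times on a GLES2Sketch and then flush() produces three draws of 4, 4 and 2 rects. Compared with a virtual base class, each backend instantiates its own copy of the template code, in exchange for calls that can be inlined.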

Fixes #19943.
Fixes #37077.
Fixes #33225.
Fixes #42576.
Fixes #35772.
Fixes #36749.
Fixes #41324.
Mitigates #40905.

Benchmarking with the text benchmark I used for GLES2, I get a 45x increase in fps with GLES3, versus ~78x for GLES2. That is to be expected, as the bottlenecks are slightly different in GLES3. This benchmark is specifically designed to look at drawcalls; real-life increases are likely to be far less dramatic.

Expectations

  • As with GLES2 batching, any performance increase will depend on how much your project is bottlenecked by drawcalls. Projects using tilemaps / lots of text will typically benefit the most. Custom shaders may or may not be batchable.
  • Normal mapping angles are now hopefully fixed, although regressions are expected. I have merged PR #41323 (GLES2 2D fix normal mapping - batching and nvidia workaround) into the batching.
  • In projects with lights, you may get a slowdown due to fill rate. Be sure to try the batching/light_scissoring option.
  • Batching settings are the same as for GLES2.
  • Speed may not be optimal yet; at first I will be concentrating on fixing visual regressions.
  • If you get decreased speed with batching, this is most likely because batching defaults single_rect_fallback to false. single_rect_fallback is the old method of drawing quads, which is about 2x as fast but causes flicker on nvidia drivers. To compare the performance of both directly, you would need to turn this back on.

New features

This isn't just a straight port of the GLES2 batching feature set; I am now adding new features, which will be available in both GLES2 and GLES3.

  • Light angles - for correct normal mapping angles and flips - DONE
  • Large vertex format - for accelerating more custom shaders - DONE - needs testing - build 0.13
  • More primitives - Ninepatch rects, lines, polys - DONE
  • Software skinning path for 2d polys - DONE
    (software skinning is switchable in projectsettings->rendering->quality->2d)
  • Switchable ninepatch margin scaling method - DONE (except GLES3 legacy)
    (switchable in projectsettings->rendering->quality->2d)
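For readers unfamiliar with software skinning: the idea is to deform a skinned 2D polygon's vertices on the CPU, so the result is ordinary pre-transformed geometry that can presumably go through the same batching path as other polys. Below is a minimal linear blend skinning sketch. It assumes up to 4 bone influences per vertex, valid bone indices, and bone transforms that already include the inverse rest pose. Xform2D, Vec2, SkinnedVertex and software_skin are illustrative names, not Godot's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 {
	float x;
	float y;
};

// Minimal 2D affine transform stored as x axis, y axis and origin.
// Assumption: bone transforms already include the inverse rest pose.
struct Xform2D {
	float xx = 1.0f, xy = 0.0f; // x axis
	float yx = 0.0f, yy = 1.0f; // y axis
	float ox = 0.0f, oy = 0.0f; // origin
};

Vec2 xform(const Xform2D &t, const Vec2 &v) {
	return { t.xx * v.x + t.yx * v.y + t.ox, t.xy * v.x + t.yy * v.y + t.oy };
}

// Up to 4 bone influences per vertex (indices assumed valid).
struct SkinnedVertex {
	Vec2 pos = { 0.0f, 0.0f };
	std::array<int, 4> bones{};
	std::array<float, 4> weights{};
};

// Linear blend skinning on the CPU: each output vertex is the weighted sum
// of the rest position transformed by each influencing bone.
std::vector<Vec2> software_skin(const std::vector<SkinnedVertex> &verts,
		const std::vector<Xform2D> &bone_xforms) {
	std::vector<Vec2> out;
	out.reserve(verts.size());
	for (const SkinnedVertex &v : verts) {
		Vec2 acc = { 0.0f, 0.0f };
		for (size_t i = 0; i < 4; i++) {
			const float w = v.weights[i];
			if (w == 0.0f) {
				continue; // unused influence slot
			}
			const Vec2 p = xform(bone_xforms[v.bones[i]], v.pos);
			acc.x += p.x * w;
			acc.y += p.y * w;
		}
		out.push_back(acc);
	}
	return out;
}
```

E.g. a vertex at (1, 0) weighted 0.5 to an identity bone and 0.5 to a bone translated by (2, 0) ends up at (2, 0).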

Please read the documentation for more info:

https://docs.godotengine.org/en/3.2/tutorials/optimization/batching.html#doc-batching
https://docs.godotengine.org/en/3.2/tutorials/optimization/batching.html#frequently-asked-questions

Builds

In-progress builds for Linux / Windows. I'll try to update these as I go.
Please post any feedback regarding regressions, performance gains (or not) etc, with the build version you tried.

To find regressions, run with flash_batching turned on in the project settings. The batching settings are exactly the same as for GLES2. To test performance, turn off vsync, enable debug/settings/stdout/print_fps, and run with batching switched off, then on.

  • 0.16 Fixed software skinning, ninepatch options
  • 0.15 Batching polys and software skinning path for polys
  • 0.14 Batching ninepatch rects and lines (single pixel width only)
  • 0.13 New vertex formats for accelerating custom shaders - highly likely to have bugs
  • 0.12 Light angles for normal mapping
  • 0.11 batching with items as well as commands - more likely to have regressions
  • 0.10 First version - batching with commands only

https://github.com/lawnjelly/Misc/releases

@lawnjelly
Member Author

lawnjelly commented Oct 4, 2020

@clayjohn I've changed this from draft to ready for review.

There are probably quite a few things I've missed as far as naming etc., but I'm keen for us to get this in as soon as possible, even if more work is expected. This is partly so we can start testing asap (as Akien said he is building 3.2.4 betas soon), but also because this is such a major change that manually rebasing it every time another rendering PR is merged involves substantial work.

Some things of note:

  • The common template to do the batching is quite large. For this reason, and to prevent it becoming unwieldy, I have split it into several .inc files which are included from the main header. This is a trade-off. It makes it super easy to navigate and find the areas of the template concerned with different topics. On the other hand, it breaks a lot of IntelliSense, which is a pain.

I am open to moving them all into the same header, or a single header and a single .inc, and separating the implementations and definitions if that is preferable. Alternatively, we could make such changes later if we think it would work better in the long run.

  • There's a really tricky bug in the joining I haven't figured out yet, which means the first in a series of items often isn't getting joined. This doesn't result in visual errors, but it would be nice to fix because it could lead to better performance.
  • I finally got around to solving the 'dummy' batches at the start of runs. This probably doesn't make that much difference but makes debugging and diagnostics easier.
  • Just give me a shout if you'd like a walkthrough, as it is 4000 new lines of code. The process of integrating it is described in this video:
    https://www.youtube.com/watch?v=q-JIfjNnnMA

@clayjohn
Member

clayjohn commented Oct 6, 2020

I've quickly reviewed the easy stuff, but I'm gonna need a few more days to get through the difficult parts. So far it looks pretty good. Personally, I really like having things split across many files (even though some of those files are only 20 lines long). However, I know reduz really prefers long files that include all the needed functionality, so he will have to weigh in here. Either way it won't be too bad, as most of those .inc files could be joined together without harming readability too much.

I'll let you know when I have a chance to do a more substantive review. The overall structure looks pretty similar to GLES2 batching, but I haven't really dug into how you are batching the other primitive types. So we'll see.

@lawnjelly
Member Author

lawnjelly commented Oct 6, 2020

For batching the other types, the main conceptual change is that in the buffers, the start vertex and number of vertices (or commands) are now stored in terms of vertices, rather than quads (groups of 4 verts) as in the old version. This is because lines use 2 verts each, and polys can use a variable number of verts per poly.

What simplifies it considerably is that only similar primitives can be joined together at the moment. E.g. a group of rects is joined and rendered as a group of rects, a group of polys as a group of polys, etc.

This is something I mean to improve later (probably after 3.2.4), so that multiple types can be included in the same vertex buffer before a flush. It requires an extra layer of logic on top, but should result in far fewer VB uploads and probably better performance.
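A minimal sketch of the vertex-based bookkeeping described above. It assumes a hypothetical Batch record that stores its start and length in vertices, and a join rule that only extends the previous batch when the primitive type (and, as an extra assumption here, the texture) matches. The real join criteria in the PR check much more state; this only shows why counting in vertices rather than quads lets rects (4 verts), single-pixel lines (2 verts) and polys (variable) share one scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class PrimType : uint8_t { RECT, LINE, POLY };

// Hypothetical batch record: start and length are in vertices, not quads.
struct Batch {
	PrimType type;
	uint32_t first_vert; // offset into the shared vertex buffer
	uint32_t num_verts;
	int texture_id; // assumption: texture is part of the join key
};

class BatchListSketch {
public:
	std::vector<Batch> batches;
	uint32_t total_verts = 0;

	// Appends a primitive using p_num_verts vertices: 4 for a rect,
	// 2 for a single-pixel line, a variable count for a poly.
	// Joins onto the previous batch only if type and texture match.
	void add(PrimType p_type, uint32_t p_num_verts, int p_texture_id) {
		if (!batches.empty()) {
			Batch &last = batches.back();
			if (last.type == p_type && last.texture_id == p_texture_id) {
				last.num_verts += p_num_verts;
				total_verts += p_num_verts;
				return;
			}
		}
		// Different type (or texture): start a new batch at the current end.
		batches.push_back({ p_type, total_verts, p_num_verts, p_texture_id });
		total_verts += p_num_verts;
	}
};
```

E.g. rect, rect, line, poly (6 verts), poly (3 verts) on the same texture gives three batches: the rects at vertices 0-7, the line at 8-9, and the polys at 10-18.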

Ninepatches

Ninepatches are very simple, in that they are just translated into 9 rects and drawn as rects. The version you have only supports the GLES2 method (stretch mode), but I've started looking into whether it is possible to support the tiling modes as in GLES3. This seems to be a whole can o' worms, not in terms of technical difficulty, but in terms of agreement about how ninepatches should work.
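To make the "translated into 9 rects" step concrete, here is a minimal sketch for stretch mode only. It assumes margins are given in pixels and used unscaled for both source and destination (the switchable margin scaling option is ignored), and it does not handle destinations smaller than the margins. Rect2f, NinePatchMargins, RectPair and ninepatch_to_rects are hypothetical names.

```cpp
#include <array>
#include <cassert>

struct Rect2f {
	float x, y, w, h;
};

// Margins in pixels (assumed identical in source and destination).
struct NinePatchMargins {
	float left, top, right, bottom;
};

// One output rect: where to draw it, and which part of the texture to sample.
struct RectPair {
	Rect2f dst;
	Rect2f src;
};

// Splits a stretch-mode ninepatch into 9 ordinary rects, row-major
// (index 4 is the centre). Corners keep their size, edges stretch along
// one axis, the centre stretches along both.
std::array<RectPair, 9> ninepatch_to_rects(const Rect2f &dst, const Rect2f &src, const NinePatchMargins &m) {
	// Column positions / widths and row positions / heights, dst then src.
	const float dst_xs[3] = { dst.x, dst.x + m.left, dst.x + dst.w - m.right };
	const float dst_ws[3] = { m.left, dst.w - m.left - m.right, m.right };
	const float dst_ys[3] = { dst.y, dst.y + m.top, dst.y + dst.h - m.bottom };
	const float dst_hs[3] = { m.top, dst.h - m.top - m.bottom, m.bottom };
	const float src_xs[3] = { src.x, src.x + m.left, src.x + src.w - m.right };
	const float src_ws[3] = { m.left, src.w - m.left - m.right, m.right };
	const float src_ys[3] = { src.y, src.y + m.top, src.y + src.h - m.bottom };
	const float src_hs[3] = { m.top, src.h - m.top - m.bottom, m.bottom };

	std::array<RectPair, 9> out{};
	for (int row = 0; row < 3; row++) {
		for (int col = 0; col < 3; col++) {
			out[row * 3 + col] = { { dst_xs[col], dst_ys[row], dst_ws[col], dst_hs[row] },
				{ src_xs[col], src_ys[row], src_ws[col], src_hs[row] } };
		}
	}
	return out;
}
```

Each of the 9 results can then be added to the current batch exactly like any other textured rect, which is why stretch mode fits batching so naturally.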

I've opened up a new proposal to discuss this, so this PR doesn't become too off topic with what is really a side issue:
godotengine/godot-proposals#1618

Edit: It turns out the tile mode of ninepatch relies on manual tiling of the central patch in the fragment shader, so it can't easily be supported with batching. So I'm now thinking of reverting to the legacy method if a ninepatch uses tiling, and only batching non-tiled ninepatches.

Tiling manually in the fragment shader is a really bad idea in GLES2 imo (same with non-POT tiling), because on some hardware this ruins texture prefetch and also drops the UV precision, leading to performance dropping off a cliff and to visual anomalies. So I can see why this hasn't been done already. It could alternatively be done by creating a separate texture specifically for the central patch, which could then be tiled (this would not require a special shader and would run better), but it would require some redesign in the visual server, and this is feature creep, so I'm inclined to leave it.
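The plan of only batching non-tiled ninepatches could look roughly like the following sketch. It assumes a hypothetical command list and an AxisStretch enum mirroring the stretch / tile / tile-fit axis modes; if either axis is tiled, the ninepatch goes down the legacy (shader) path, and any open batch is flushed first so draw order is preserved. The counters in DrawStats stand in for real draw calls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the per-axis ninepatch modes.
enum class AxisStretch { STRETCH, TILE, TILE_FIT };

struct NinePatchCmd {
	AxisStretch axis_x, axis_y;
};

// Counters standing in for real batching / draw calls.
struct DrawStats {
	int batched_ninepatches = 0;
	int flushes = 0;
	int legacy_draws = 0;
};

// Stretch-only ninepatches join the current batch (as 9 rects). A tiled
// axis needs the fragment shader, so the open batch is flushed first to
// keep draw order, then the legacy path draws the ninepatch on its own.
DrawStats process_ninepatches(const std::vector<NinePatchCmd> &cmds) {
	DrawStats s;
	bool batch_open = false;
	for (const NinePatchCmd &c : cmds) {
		const bool tiled = c.axis_x != AxisStretch::STRETCH || c.axis_y != AxisStretch::STRETCH;
		if (!tiled) {
			s.batched_ninepatches++;
			batch_open = true;
		} else {
			if (batch_open) {
				s.flushes++;
				batch_open = false;
			}
			s.legacy_draws++;
		}
	}
	if (batch_open) {
		s.flushes++; // final flush at the end of the run
	}
	return s;
}
```

For stretch, stretch, tiled, stretch this gives 3 batched ninepatches, 1 legacy draw and 2 flushes, which shows the hidden cost: a tiled ninepatch in the middle of a run also splits the surrounding batch.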

Member

@clayjohn clayjohn left a comment


I've gone through the whole thing, but given the length of the PR, I can't say I spent a significant amount of time on each line of code. So I will mostly make general comments here:

  1. I like the overall structure of the implementation. It is a nice logical step up from the previous batching implementation.

  2. The PREAMBLE macro was a great way to simplify the templates.

  3. There is a lot of commented-out code left in throughout; I stopped commenting on it part way through. Before merging, we will have to do a full read-through and make sure that we get rid of the dead code.

Overall, great work! I am amazed at the size and complexity of this PR.

With just a little bit of code cleanup I think this will be ready for a testing build. We should do something similar to when we first added batching. @akien-mga can we get you to help put together another test batching build so we can get this into the hands of as many users as possible?

Review comments (all resolved):
  • drivers/gles2/rasterizer_canvas_gles2.cpp (6 comments)
  • drivers/gles3/rasterizer_canvas_gles3.h (1 comment)
@lawnjelly
Member Author

lawnjelly commented Oct 15, 2020

I've cleaned up a number of the comments... 👍

Some I did miss, but part of the reason for leaving others in is that we are expecting to have to tweak things as a result of the betas, and the commented code may help when changing these areas (and some are placeholders). Bear in mind we can also do a separate cleanup PR after the betas.

Edit: I've just also added a couple more notes to the documentation.

Project Settings

I've been a bit torn as to whether the 2D software skinning and ninepatch settings should be general settings (as they are now, in rendering->quality) or go into the batching section. But maybe being torn between them means it doesn't really matter a great deal.

2D software skinning will likely only be supported in batching, at least until 4.x. However, reduz expressed an interest in using 2D software skinning globally in 4.x (perhaps for simplicity as much as anything). Mind you, in that case maybe we wouldn't need an option at all. And perhaps this is over-future-proofing.

Ninepatch behaviour operates in batching and also in GLES2 legacy, but not in GLES3 legacy, because that uses the shader. (The situations where it can be used are actually quite complex: even in GLES3 batching, a tiled ninepatch reverts to the shader!) In 4.x, however, it is possible that the ninepatch behaviour may be supported in Vulkan. But again, maybe this is over-future-proofing.

So I'm very happy to change their location, just let me know.

I've also noticed the rendering->quality section is growing, and really we could have done with an options section in addition. We could potentially put ninepatch and software skinning into an options section. There are a few other things, like pixel snap, which aren't really quality settings either (but moving them would break compatibility, which we don't want). Or we could just stick with it for now and consider changing these in 4.x.

Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files.

Batching is extended to cover more primitives.
@lawnjelly
Member Author

lawnjelly commented Oct 18, 2020

Added a second commit with some experiments to try to lower the number of glBufferSubData calls in the legacy renderer, in the hope of helping performance on devices with poor drivers.

Some more related info here, someone having similar problems:
https://twitter.com/FlohOfWoe/status/1317482268936540160

Areas in the legacy renderer still expected to be problematic as far as orphaning goes (basically anywhere using glBufferSubData):

  • mesh_surface_update_region - LIMITED BY THE GODOT API
  • _update_skeleton_transform_buffer - DONE
  • canvas_light_occluder_set_polylines
  • update_dirty_multimeshes (GLES3) [possibly the particle problem?] - DONE
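The general idea behind cutting down glBufferSubData calls can be sketched as follows: accumulate updates in a CPU-side staging copy and upload once per update, ideally after orphaning the buffer (calling glBufferData with a null pointer so the driver can hand back fresh storage instead of stalling on memory the GPU may still be reading). To keep the sketch runnable without a GL context, the upload is an injected callback. StagedBufferSketch and UploadFn are hypothetical names, and this is not the code from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Stand-in for the GL upload. In a real renderer this would be roughly an
// orphaning glBufferData(target, size, nullptr, usage) followed by a single
// glBufferSubData over the whole range (or one glBufferData with the data).
using UploadFn = std::function<void(const void *p_data, size_t p_size_bytes)>;

class StagedBufferSketch {
public:
	explicit StagedBufferSketch(size_t p_size_bytes) :
			staging(p_size_bytes) {}

	// Writes only touch CPU memory: no GL call per individual update.
	void write(size_t p_offset, const void *p_data, size_t p_size) {
		assert(p_offset + p_size <= staging.size());
		std::memcpy(staging.data() + p_offset, p_data, p_size);
		dirty = true;
	}

	// One upload per update instead of one per write; no-op if unchanged.
	void flush(const UploadFn &p_upload) {
		if (!dirty) {
			return;
		}
		p_upload(staging.data(), staging.size());
		dirty = false;
	}

private:
	std::vector<unsigned char> staging;
	bool dirty = false;
};
```

This trades some redundant bandwidth (the whole buffer is re-sent even if only part changed) for fewer API calls, which is the bet being made here for devices with poor drivers.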

This is part of an effort to make more efficient use of the API on devices with poor drivers. This eliminates multiple calls to glBufferSubData per update.
Member

@akien-mga akien-mga left a comment


Amazing work as usual!

Let's get this merged and tested in 3.2.4 beta 1 (hopefully sometime this week).

@akien-mga akien-mga merged commit 123942f into godotengine:3.2 Oct 19, 2020
@akien-mga
Member

Thanks!

@lawnjelly Do you want to prepare an issue with some testing instructions like we had for GLES2 batching, to collect user feedback?

@lawnjelly
Member Author

Yup sure will do. 👍

@Valeryn4
Contributor

Valeryn4 commented Sep 14, 2021

The problem is still relevant (issue #35772).
The error pops up on some devices.

Version 3.3

@lawnjelly lawnjelly deleted the ewok3 branch September 14, 2021 06:41
@Calinou
Member

Calinou commented Sep 14, 2021

@Valeryn4 Please continue the discussion about this in #35772.

@oeleo1

oeleo1 commented Jan 27, 2022

@lawnjelly Not sure where to write this quick note, so here :-) The RasterizerCanvasGLES2/3::try_join_item() function seems to have a redundant reclip variable and a dead-code test on it later on, which is probably a leftover from previous versions of the code that you may want to clean up.

6 participants