-
-
Notifications
You must be signed in to change notification settings - Fork 21.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rocket Bot Royale - Post-mortem discussion and engine diffs. #62291
Comments
I've put up our core engine diff (excluding some custom networking modules) here: winterpixelgames@55344ea The last upstream merge we've brought into our fork was in 3.4.1.rc1, which was 489b49c (for context). The format of this issue isn't for anything specific per se. It's intention is more to provide the community & godot project team a port-mortem analysis for this project. As well, hopefully to describe & discuss the engine limitations we've run into during development and how and where we've had to modify the engine to address any issues that we've run into. Perhaps someone as well may find any of the changes in the diffs below useful. General Impressions & Project Scope:In short, we've generally enjoyed using the engine for this project. I would consider the 'lean-ness' of the engine a competitive advantage in the market right now with alternative engines being quite bulky and bloaty. Even though godot is relatively lean, we didn't really find a core 'feature' that was missing from the engine that would have been considered a show-stopper for our project. Generally speaking, anything that was missing was just implemented or worked around. The larger items were probably the networking & physics systems. We found out relatively quickly that the architecture in godot regarding those two components would not work for the client-predicted/rollback network model we wished to achieve. We worked around these limitations by using a Box2D native module for physics, and building our own networked native module. But for the most part, hurdles working with the engine were mainly related to a few areas: Engine ergonomics & architecture, performance, robustness, and interoperability with existing tooling. Particularly wrt debugging, instrumentation, and profiling. Diff Notes:Collection comparator of ==operator by value Godot exposed (not sure if this is the case anymore) Memory allocation instrumentation & monitor We found memory operation overhead to be one of the main issues with the engine's performance. We added an allocation monitor to track number of allocations per frame, and time spent allocating and freeing memory per frame (time spent computation only compiled in debug builds as the clock sampling itself it is a performance hit). Added different keyboard types on iOS | Keyboard types Additional support for different keyboard types on iOS. Bitarray packing encoded types and compression levels These piggybacked on the work by @AndreaCatania's scene replicator PR, in which we efficiently encode different types with 5 different bit-level compression settings. From 1-bit encoded types up to 64-bit. We added these as gdscript/property annotations to easily express how many bits to use for value encoding or properties in our network module. DNS resolver runs out of queries There's a bug in the DNS resolver and it currently runs out of queries and can get hung. We're just working around this right now by bumping this number and punting the issue further down the session. The godot Socket API for multiple messages does superfluous memcpy's Both POSIX and win32 have an API for sending multiple buffers at one. Potential infinite loop/deadlock fix We experienced a deadlock here, this one was quite tough to find and debug. Avoid deadlocks in HTML5 threaded exports HTML5 threaded exports are susceptible to emscripten deadlocks when not using Thread changes to eliminate data race condition These changes in thread intialization eliminates a data race condition that was found using TSAN. Exposing thread current_id to workaround overused singletons in the codebase This can be explained later on with a dummy visual server implementation, and ability to bypass a singleton imple by thread id. Disables screen texture mipmap generation This was needed to shave 1.5ms off of rendering on iOS devices, when the effects were were using only needed to sample the screen texture, not the generate the whole screen mipmap chain. This disabled that generation and saved 10-15% render time on iOS mobile test devices. Thread safe script language server Made this thread safe. Stringname has a double-free vuln Unref check is not atomic, leads to a double free. Debugging gdscript errors in release is HARD, and there is not enough information We added a variety of checks in the gdscript runtime to print out more helpful runtime errors in gdscript in release builds. The macro Textures left bound break WebGL in weird ways We fixed a number of errors with resources still being bound and WebGL complaining about things and throwing errors. Particularly, there were issues with particles. This change unbinds these resources and eliminates some of those issues on that platform. Enet changes for proper unreliable transport This eventually was stripped out from another PR and merged in #53129 Modified gdscript parser to annotate network synced properties in gdscript More of a custom use for our particular use case of gdscript. Decode once audio streams This ended up being a REALLY important optimization. To keep binary size down, we're using ogg/vorbis compression on all sfx and music. During performance testing (especially on web), we saw a ton of time spent decoding audio files. Not only that, but scenes that were instantiated with audio resources, (and not ever used), paid a huge performance penalty because a bunch of memory buffer's are allocated on instantiation if an audio stream player is instantiated with any decoded audio resource. Say something like this: When profiling the instantiation of the scene above, almost all of the cpu instantiation time is spent in the ogg decoder of each of those streams, reading and setting up some initial decode buffers for those streams (even if they are never played). This ended up being a really big hiccup. Not only that, but we were paying the cpu decode cost of decoding these formats everytime they were played, which when we profiled was really significant. To address this, we implemented a 'decode once' audio resource import parameter, in which an encoded audio resource is only ever decoded once to PCM. Subsequent uses of the audio resource then stream the simple decoded PCM format, which uses much less cpu resource. The trade off here is runtime cpu for memory footprint: This ended up being an almost 3X saving in audio processing cpu time on web, and also eliminated many scene instantiation hiccups where the scene included audio stream nodes which were using encoded audio formats. Fixes blocking on the main thread when using websockets Websockets end up being more like httprequest (non blocking on main godot thread). Flags for debug symbols included in Android builds pre gradle strip Android/cmake/gradle does a strip in a cmake build artifact later on in the android build process. Compiling with these symbols included allows us to inspect the pre-strip artifact folder which can be uploaded to google play for symbolicated native stack traces. Likewise on iOS Various changes to emscripten settings Various emscripten tweaks here based off of performance testing in a variety of browsers. Actually, many javascript changes and improvements in the javascript changes there and subsequent files. Cross Codesign macOS Most of these changes were just pulling in @bruvzg 's improvements to code-signing macOS and iOS on different platforms. 'Relative' particles 2D mode, in addition to existing 'local' and 'world' This is a common particles pattern where the particle positions are relative to their parent. (Not just local or world options). ScrollContainer sensitivity and damping A lot of issues between mouse, scrollwheel, and trackpads here. Lastly, we added a dummy visual server. The reason for the above is the visual server is not thread safe, and we run our netcode rollback (reset to a past state, and replay the game in the background to catch up to realtime) in worker threads. That means resetting scene trees, state, properties, nodes, etc, and running all the logic that has run in the past in a seperate simulation which is then fast-forward simulated back to the current predicted state. For game logic to rerun, that means instantiating scenes, firing the same signals, literally running the same codepaths. For that we use a shadow |
Thanks hugely for the write up and taking the time and work of putting up the diff. I am personally happy that Godot was useful for developing and publishing Rocket Bot Royale and I'm sure that a lot of the changes you had to do will be invaluable in improving the code quality! I think some of these issues have been significantly improved in master branch, but I am super curious to see what your solutions (and problems found) were in order to keep improving the codebase. |
Thanks for the detailed writeup, it's very informative 🙂 I remember the memory instrumentation profiler is something @lawnjelly was interested in having. Maybe he could help get it merged?
For reference, similar functionality was requested in godotengine/godot-proposals#1144. That said, I wonder if you could sidestep the problem by instancing AudioStreamPlayer2D nodes only when needed, and free them once they're done playing. This can be done with a single node for multiple sounds in 4.0 thanks to polyphony support. |
I randomly stumbled upon this when searching for another issue, and I just wanted to say the decode once for ogg vorbis is a brilliant idea. Would love to see that in the core engine. I wondered if anybody created a proposal for that and then I see Calinou linked to the one I created, which I completely forgot about. Boy, my memory is terrible. 😅 |
Godot version
3.4
System information
HTML5, Windows, Linux, MacOS, iOS, Android, Linux Server
Issue description
Post mortem for Rocket Bot Royale. A cross-platform, multiplayer, server-authoritative, battle-royale game. https://rocketbotroyale.winterpixel.io
Steps to reproduce
N/A - see first comment.
Minimal reproduction project
N/A - see first comment.
The text was updated successfully, but these errors were encountered: