GDScript vs C# performance #36060
Comments
I am kind of curious why GDScript would be that much slower. Isn't it converted into bytecode? |
@nathanfranke I don't remember if bytecode compilation is still present. If it is, keep in mind it only speeds up how fast the script loads, not how fast it runs (just like in Python). |
I've been doing GDScript microbenchmarks for a while: https://github.com/Zylann/gdscript_performance @nathanfranke C# uses JIT to convert its bytecode into CPU instructions and is able to use |
Bad to hear. I really like GDScript. |
As I understand, GDScript is not intended for use in CPU-bound situations, it's just a convenient language for beginners that's tightly coupled with the engine specifics. Most games' CPU-bound use-cases are intended to be covered by specific nodes. For CPU-bound situations you can use C#, GDNative or write a module. |
Agreed, though it is an interesting topic. |
I've got a Godot module implementation of the algorithm available here: https://github.com/dragmz/gdlifemod (tested with MSVC 2019 only) |
C++ is theoretically as fast as it can possibly be. Unless you want to write assembly :D Seriously though, C# is pretty fast as long as you're not memory-bound. Otherwise you get the same problem as Java: lag spikes on garbage collection. |
A bit off topic, but... due to the sheer level of sophistication of compilers, it is extremely likely that C++ will outperform handwritten assembly. For trivial cases, the compiler has those cases optimized. For non-trivial cases, the problems are likely too complex for a human to hand-optimize the assembly. Back on topic, here are some more benchmarks: http://www.royaldonut.games/2019/03/29/cpu-voxel-benchmarks-of-most-popular-languages-in-godot/ Note that voxel generation is a use case which can be given many low-level optimizations, so C# and especially C++ benefit a lot here. |
Please don't let specific benchmarks like this change your decision on GDScript :) GDScript is not just for beginners, and it's certainly not slow (relative to a smooth gaming experience). There are trade-offs with any language. The sheer power and rapid prototyping GDScript offers, while giving the developer the chance to build their dream game, should not be overlooked. |
Yes, I will use it :) I think that if necessary, I will rewrite the bottlenecks in C++. |
Note that GDScript uses real Godot Objects and the ObjectDB for managing them when creating sub-classes in a script. Also it will create ScriptInstances for each of them, and make every variable access on such a script go through a few levels of indirection (thankfully, it will not do string comparisons, but hashmap and red-black tree lookups are enough). |
@bojidar-bg both implementations are intentionally naive to mimic what a beginner would do |
I accept that C# is more efficient than GDScript. But the game code is not necessarily what causes performance issues. A game's performance is more likely to be determined by the parts that the engine is already handling. The average use case does not include rendering 10k sprites or anything like that. |
There are a number of performance improvements that can be made for GDScript, first and foremost by implementing some basic back-end optimization to the byte code. As of version 3.2 there is no optimization of the byte code beyond some basic constant folding that happens in the compiler front end (that I have observed, at least). Because GDScript VM executes code in a loop containing a giant switch, reducing the number of byte code instructions generated will go a long way to improving the performance. I have a fork in the works that I've begun implementing the following optimizations of the GDScript byte code:
Potential:
Some of these optimizations would affect the semantics of the running application if GDScript is being run in multiple threads. Optimizations that store into typed temporaries in the virtual registers will not be visible externally until the values are stored, which may not be guaranteed until the exit of the function. C# has the volatile qualifier to tell the optimizer to always load the value before it's read and commit it to memory after it is modified. I don't think we'd want to add this additional level of complexity to GDScript, so the optimizations will need to be very conservative when they could introduce concurrency issues. Even with all of these optimizations in place, there is very little chance that GDScript will perform as well as C#. Sure, you could build a JIT for GDScript, but I think it would be easier just to implement GDScript as a .NET language and use the mono tooling that is already in place. |
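To make the "fewer instructions means fewer trips through the dispatch loop" point concrete, here is a minimal Python sketch of a toy stack VM with a constant-folding peephole pass. This is not GDScript's real bytecode or VM; the opcodes, `run`, and `fold_constants` are all hypothetical names made up for illustration, and the toy has no jumps, which is what makes the peephole safe.

```python
# Toy stack VM: every instruction costs one trip through the dispatch loop.
# Opcodes and layout are invented for this sketch, not taken from Godot.
PUSH, LOAD, ADD, MUL, RET = range(5)

def run(code, env):
    """Execute (opcode, operand) pairs; return (result, instructions dispatched)."""
    stack, dispatched = [], 0
    for op, arg in code:
        dispatched += 1
        if op == PUSH:
            stack.append(arg)
        elif op == LOAD:
            stack.append(env[arg])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == RET:
            return stack.pop(), dispatched
    raise RuntimeError("missing RET")

def fold_constants(code):
    """Peephole pass: replace PUSH a, PUSH b, ADD/MUL with a single PUSH."""
    out = []
    for op, arg in code:
        if (op in (ADD, MUL) and len(out) >= 2
                and out[-1][0] == PUSH and out[-2][0] == PUSH):
            b, a = out.pop()[1], out.pop()[1]
            out.append((PUSH, a + b if op == ADD else a * b))
        else:
            out.append((op, arg))
    return out

# Bytecode for: x * (60 * 60 + 24)
program = [(LOAD, "x"), (PUSH, 60), (PUSH, 60), (MUL, None),
           (PUSH, 24), (ADD, None), (MUL, None), (RET, None)]
optimized = fold_constants(program)

print(run(program, {"x": 2}))    # (7248, 8)
print(run(optimized, {"x": 2}))  # (7248, 4)
```

Both versions compute the same value, but the folded program dispatches half as many instructions. A real optimizer would also have to account for jump targets before merging instructions.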
Out of curiosity, do you know how such a thing would be done? It sounds like a fun experiment. |
It's not something I've ever done in .NET, but it has been done. You would need to write your own front end to parse the GDScript, then transform that code to CIL instructions to build an assembly. Theoretically, at this point the assembly would be loaded and run no differently than one built with C#. Obviously it is easier said than done, but I think this solution would be easier than rolling your own JIT. Another approach is to compile GDScript to LLVM IR. I'm not sure which approach would be easier, but I suspect building the GDScript language for .NET would be as the mono integration is already in place. A good place to start would be to have a look at the Boo source code. Boo is a Python-like .NET language. It used to be an option for scripting in Unity, but support was dropped long ago. Since then people seem to have mostly lost interest, but you may be able to use it as a jumping-off point. |
If we're going to use a target for GDScript, it should probably be LLVM.
Dotnet is a bit big to bundle when targeting mobile phones, I think.
|
Possibly... but the team has already decided that dotnet will be supported, and that support is implemented. |
@pchasco Mono support is optional and the engine is officially and primarily compiled without it, since it about doubles the size of the engine to have Mono support. |
That is true. If I am not mistaken LLVM has been discussed in the past and was decided against because it adds a rather sizable build dependency. Personally, I think the best approach is to make GDScript as fast as it can be without going to JIT. I have done a significant amount of work on a GDScript to C compiler, but I have largely abandoned that effort in favor of improvements to the VM. After some initially encouraging results, real-world performance wasn't that much improved over GDScript to make it worth my while. The compiler targeted GDNative, which was the primary performance bottleneck. I could have it generate an engine module to avoid GDNative, but then it would be much less convenient to build. |
Note that GDScript performance improvements are planned, as discussed in a google doc. Just a few cents on ideas listed previously:
We were thinking of using SSA (static single assignment) in the compiler, which might alleviate this issue.
Definitely sounds useful, and I can see how the compiler might sometimes generate bytecode which chains jumps needlessly.
We were thinking of going for typed instructions instead. They will still use Variant, but they will not check its type, or unbox it, but will just take the bytes out.
Not sure about this one either. Maybe there could be some instructions which do a ptrcall using a pointer stored within the bytecode directly?
As already mentioned, non-mono builds of Godot should have working GDScript.
I agree. Going to JIT can only eliminate the time spent dispatching instructions. As currently implemented by #11518, the dispatching code is pretty fast, and can even be branch predicted by the CPU. (It is not a giant switch in a loop, except on compilers which do not support jumptables.) |
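The typed-instructions idea mentioned above can be sketched roughly as follows. This is only a Python illustration under stated assumptions: the `Variant` class here is a hypothetical tagged box, and `op_add_generic`, `op_add_int`, and `select_add` are invented names, not anything from the Godot source. The point is that the type checks move from every execution of the instruction to a single decision at compile time.

```python
INT, FLOAT, STRING = "int", "float", "string"

class Variant:
    """Hypothetical tagged box, loosely inspired by the idea of a Variant."""
    __slots__ = ("tag", "data")

    def __init__(self, tag, data):
        self.tag, self.data = tag, data

def op_add_generic(a, b):
    """Untyped instruction: inspects both tags on every execution."""
    if a.tag == INT and b.tag == INT:
        return Variant(INT, a.data + b.data)
    if a.tag in (INT, FLOAT) and b.tag in (INT, FLOAT):
        return Variant(FLOAT, float(a.data) + float(b.data))
    if a.tag == STRING and b.tag == STRING:
        return Variant(STRING, a.data + b.data)
    raise TypeError(f"cannot add {a.tag} and {b.tag}")

def op_add_int(a, b):
    """Typed instruction: operands are known to be ints, so no tag checks."""
    return Variant(INT, a.data + b.data)

def select_add(static_a, static_b):
    """Compile-time choice of instruction from static type hints (None = unknown)."""
    if static_a == INT and static_b == INT:
        return op_add_int
    return op_add_generic

# With type hints on both operands, the compiler can emit the typed variant once.
add = select_add(INT, INT)
print(add(Variant(INT, 2), Variant(INT, 3)).data)  # 5
```

The result is still stored in a `Variant`, matching the comment above: the box stays, only the per-call type inspection goes away.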
@bojidar-bg Thanks for the share! I've been trying to find a summary of the planned improvements on github but I haven't had much luck to this point.
You could go that way... But I think it's overkill. Deconstructing SSA form correctly is not trivial as it can introduce hard-to-find bugs in the code when not done correctly. You can still implement dead code elimination, constant folding, common subexpression elimination, and most other beneficial optimizations without it.
I suppose this could work... In my fork, where I have been trying some of these things out, I went in the direction of registers, mostly because I did not want to suggest breaking encapsulation of the variant structure to work on its data directly. Getting directly into the variant data might be a bit faster because it would avoid the load/store operations for the registers.
Technically true but I wonder if this actually happens in practice. Edit: Edit 2: |
I have a partial implementation of some of these optimizations in my fork: https://github.com/pchasco/godot Notably missing is support for the iterate and yield instructions, and functions with default arguments. Also incomplete is typed instructions. This is published now as only a PoC. In the next week or so I will implement the missing instructions. There is no challenge there beyond finding the time; there is no significant difference to them versus the standard jump instructions. Typed instructions will take somewhat longer. I also plan to implement optimization for built-ins that are pure functions. Calls to pure functions are candidates for elimination via common sub expression elimination optimization. |
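For readers unfamiliar with common subexpression elimination over pure function calls, here is a small Python sketch of the idea on a toy three-address block. It is not the code from the fork linked above; the instruction format, the `PURE_OPS` set, and the function name are assumptions made for this example. It also assumes every destination is assigned once in the block and that inputs such as `angle` are not reassigned.

```python
PURE_OPS = {"add", "mul", "sin"}   # assumed side-effect free; "rand" is not

def eliminate_common_subexpressions(block):
    """Local CSE over (dest, op, args) tuples.

    Assumes every dest is assigned once in the block, so a previously
    computed value can never be overwritten before it is reused.
    """
    available = {}   # (op, args) -> name already holding that value
    alias = {}       # eliminated name -> surviving name
    out = []
    for dest, op, args in block:
        args = tuple(alias.get(a, a) for a in args)
        key = (op, args)
        if op in PURE_OPS and key in available:
            alias[dest] = available[key]   # reuse earlier result, emit nothing
            continue
        if op in PURE_OPS:
            available[key] = dest
        out.append((dest, op, args))
    return out

block = [
    ("t1", "sin", ("angle",)),
    ("t2", "mul", ("t1", "radius")),
    ("t3", "sin", ("angle",)),         # same value as t1
    ("t4", "mul", ("t3", "radius")),   # same value as t2 once t3 aliases t1
    ("r1", "rand", ()),
    ("r2", "rand", ()),                # not pure: must be kept
    ("t5", "add", ("t2", "t4")),
]
optimized = eliminate_common_subexpressions(block)
for instruction in optimized:
    print(instruction)
```

The second `sin` call and the second multiply disappear, while both `rand` calls survive because the optimizer cannot assume they return the same value.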
On the other hand, I know Python and I know C# but I still need to learn GDScript almost from scratch. While it's not that hard, it messes with my brain when I come back to regular Python development at work. It's similar to Python in some cases, but it's not Python. |
How small is LLVM? As far as I can tell it'd still more than double the current download size. But honestly, going from 20MB to 50MB isn't exactly much. Sure, a 150% increase in download size (2.5x the original) sounds huge, but in the context of modern-day phones, computers, and available bandwidth it is absurdly tiny. |
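As a quick check of the numbers in the comment above: growing from 20 MB to 50 MB makes the download 2.5 times (250% of) its original size, which is a 150% increase. A minimal sketch of that arithmetic, with `size_change` being a name invented here:

```python
def size_change(old_mb, new_mb):
    """Return (ratio, percent_increase) for a download growing from old_mb to new_mb."""
    ratio = new_mb / old_mb
    return ratio, (ratio - 1) * 100

ratio, increase = size_change(20, 50)
print(f"{ratio:.1f}x the original size, a {increase:.0f}% increase")
# prints "2.5x the original size, a 150% increase"
```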
I was talking about the export's size, not the editor's. Games that use C# need to ship the virtual machine too on targets that don't have dotnet. LLVM outputs native binaries, so it becomes a non-issue. For what it's worth, I'd also make it optional, like the Mono version is. |
Most users don't compile Godot from source, though. If somebody figures out the whole LLVM compilation issue and adds the source code to the repo (like we do with basically every external lib), all the end user will have to do is download the proper version from the website. It would also be possible to implement LLVM support through an addon that outputs GDNative binaries, so this wouldn't necessarily need to be baked into the engine. |
@jabcross The LLVM source code repository weighs dozens of gigabytes. It'll never be added to the Godot source repository at this point 😛
This is arguably a better course of action. I think we should focus on improving GDNative integration instead (and possibly add WebAssembly support as a replacement in the long-term, but that's another discussion). |
It would be awesome. Currently GDNative is very inconvenient. |
I'm not 100% updated on this, but the problem with GDNative currently is the interface bottleneck, right? Function calls still need string comparison and whatnot?
Yeah, but a considerable portion is non-essential, like tests and benchmarks. It could be trimmed down. MLIR is also in the LLVM repository, after all. We should also consider MLIR in the future, it's a really cool project. |
That's not entirely correct. You can basically acquire a handle to a script
method and use ptrcall to invoke it, eliding any lookup.
|
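The difference between looking a method up on every call and acquiring a handle once can be illustrated with a rough Python analogy. This is not the GDNative API; `ScriptInstance`, `call`, and `get_method_handle` are hypothetical names for this sketch, and the lookup counter exists only to make the cost visible.

```python
class ScriptInstance:
    """Hypothetical stand-in for a scripted object; not Godot's real API."""

    def __init__(self):
        self.lookups = 0
        self.health = 100
        self._methods = {"take_damage": self._take_damage}

    def _take_damage(self, amount):
        self.health -= amount
        return self.health

    def call(self, name, *args):
        """Slow path: resolve the method by name on every call."""
        self.lookups += 1
        return self._methods[name](*args)

    def get_method_handle(self, name):
        """Resolve once; the returned handle needs no further lookup."""
        self.lookups += 1
        return self._methods[name]

enemy = ScriptInstance()
for _ in range(3):
    enemy.call("take_damage", 10)              # one lookup per call

hit = enemy.get_method_handle("take_damage")  # one lookup in total
for _ in range(3):
    hit(10)                                    # no lookups

print(enemy.health, enemy.lookups)  # 40 4
```

Six calls cost four lookups here instead of six, and the gap widens with every further call through the handle.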
Still, lots of hoop jumping for something arguably simple. It happens with variable assignment too, right? Are there any known established games that used GDNative so far? |
I don't think it's a lot of hoops. And I wouldn't recommend using ptrcall for every method; premature optimization adages still apply.
Karroffel mentioned one on my blog but he didn't give the name of it... |
What is the current state of GDScript right now, and could you guys improve performance, be it for GDNative or GDScript? |
GDScript is being remade by vnen which will hopefully improve performance.
GDNative is a completely different language so I don't understand the second part |
You can likely achieve performance increase by porting your GDScript to compiled GDNative scripts. I am not confident that the currently proposed changes being made to GDScript will significantly improve performance, but time will tell. |
Out of curiosity, according to the Godot documentation:
It has an excellent tradeoff between performance and ease of use, although one must be aware of its garbage collector. I'm curious what manual care I'd have to have when coding with C#. Is there any recommendation to where and when I should manually call garbage collection operations in C# scripts? |
I don’t know why you would need to manually force a collection. I definitely would not do it during gameplay. The only time when it may be beneficial to do a manual collection would be after loading or saving a game, or after dismissing a menu when you can get away with some stutter. But you can generally just ignore the GC altogether if your game is performing well. |
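The "collect only when a stutter is acceptable" advice can be sketched as below. Note the assumption: this uses Python's `gc` module purely as an analogy, since Python's collector (reference counting plus a cycle collector) works differently from the .NET one. In a C# script the equivalent explicit call would be `GC.Collect()`. The `load_level` and `change_level` functions are hypothetical.

```python
import gc

def load_level(name):
    """Hypothetical loading step that creates short-lived cyclic garbage."""
    for _ in range(1000):
        node = {"name": name}
        node["self"] = node   # reference cycle: only the cycle collector frees it
    return {"level": name}

def change_level(name):
    """Load a level, then collect while a loading screen hides any stutter."""
    level = load_level(name)
    freed = gc.collect()      # returns the number of unreachable objects found
    return level, freed

level, freed = change_level("forest")
print(level, "collected", freed, "objects")
```

The number reported varies between runs and interpreter versions, because automatic collections may already have run during loading. That is also the reason the advice above is to leave the collector alone unless profiling shows a problem.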
Thanks! Well, truth be told: the way it was put in the documentation made it seem that there should be some sort of handling of, or expectation of, memory leaks if things are left 'automatic'. All in all, I was just curious whether instantiations and real-time calculations could leave garbage behind without proper manual handling. I'll ask about those on Reddit. Thanks! |
There is a new kid on the block: MIR https://github.com/vnmakarov/mir which aims to be a lightweight JIT for multiple intermediate representations https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/ and is initially being implemented as a JIT for Ruby 3.0. |
@milkowski so would u then use GDScript or GDNative code in combination with this new M-JIT? |
Sorry to barge in like this but I'd like to ask what performance difference is there between gdscript and lua? |
Sharing my findings here: Godot version: Godot Mono v3.2.3 and Godot v3.2.3 OS/device including version: Windows 10, GLES3, Geforce 2060 rtx I have this procedural hair generated using ImmediateGeometry: Since math is done in c++ I figured the issue was probably related to how gdscript handles loops or arrays and might be worth reporting. Minimal reproduction project: contains a C# scene and GdScript scene |
It's difficult to speculate exactly why any specific C# program is significantly faster than the GDScript version. First, mono jit is likely applying several optimizations to the structure of the compiled bytecode, such as dead code elimination, common subexpression elimination, loop unrolling, etc. I also expect a significant difference comes from the core types being implemented in mono for the mono module. Having mono implementations of those core types enables visibility into the methods by the optimizer, and, as they are not virtual, the optimizer may be able to inline some of the calls. The GDScript runtime has to dynamically dispatch those calls on core types, which is significantly slower. Inlining calls also allows the optimizer to include the instructions of the inlined method in its optimization of the caller, providing more opportunities to eliminate common subexpressions or reorganize the code for better instruction scheduling. |
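Of the optimizations listed above, dead code elimination is the easiest to show in a few lines. The following Python sketch works on a toy three-address block. It says nothing about how the Mono JIT actually implements it; the instruction format, the `SIDE_EFFECT_OPS` set, and the function name are assumptions for illustration.

```python
SIDE_EFFECT_OPS = {"call", "store"}   # assumed: every other op is pure

def eliminate_dead_code(block, live_out):
    """Drop instructions whose result is never read.

    block: list of (dest, op, args); live_out: names needed after the block.
    Walking backwards lets one removal expose further dead instructions.
    """
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(block):
        if dest in live or op in SIDE_EFFECT_OPS:
            kept.append((dest, op, args))
            live.discard(dest)
            live.update(a for a in args if isinstance(a, str))
        # otherwise the result is unused and the op is pure, so it is dropped
    kept.reverse()
    return kept

block = [
    ("a", "mul", ("x", "x")),
    ("b", "add", ("a", 1)),     # only feeds c
    ("c", "mul", ("b", 2)),     # never read: dead, which makes b dead too
    ("d", "add", ("a", "y")),
    (None, "store", ("d",)),    # side effect: always kept
]
optimized = eliminate_dead_code(block, live_out=set())
for instruction in optimized:
    print(instruction)
```

Inlining matters here because it turns an opaque call, which has to be treated as a possible side effect, into plain instructions that passes like this one can see through.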
That is actually a common misconception held by C++ programmers who have never seen Assembly code. Assembly code can be as well optimized as the coder's knowledge and experience allow. So a good Assembly programmer can write much better code than C++ any time. A beginner, however, will not be able to optimize their code much, unlike a beginner C++ programmer, who just passes a command-line argument to get fair output. As for the experienced Assembly programmer, it still takes time to manually optimize code, and sometimes that makes the code quite unmaintainable in the end. I once wrote a DSP filter for a PIC controller, and it took me a good 3 months just to optimize it and fit it into the chip. As a result I was able to use a much smaller chip than the competition while filtering 2 channels at the same time instead of just 1, simply because I was using Assembly and not C. The moral is that writing something in Assembly can give you the ultimate performance and the smallest code, but it might not be worth the effort. It also depends on the project: if you can save just $1 per circuit by using a smaller chip, you save a million dollars over a million units. But in game development it might not justify the amount of work you put into it. |
Certainly a developer with a deep knowledge of the problem, the specific
hardware, and the specific instruction set can write as good or better code
than any optimizing compiler. The problem comes in when writing code that
should run on other hardware. Even if the program is compatible with a
different CPU, there are few guarantees that it will be optimal.
|
@vnen I don't think this issue is relevant after the GDScript rewrite, the benchmarks need to be redone |
The information is interesting to have for history's sake and it does need to be re-run from time to time, but I don't think an open issue is the best place to keep this information (given there's nothing actionable to one day close the issue). I don't know what would be the best place though. |
How does performance compare after the rewrite? |
I gave some general numbers about improvement in GDScript itself a while ago: https://godotengine.org/article/gdscript-progress-report-typed-instructions I haven't compared it with other languages and don't plan to for now. Focus on performance will happen after Godot 4.0 is released. I'm closing the issue now as I don't think there's anything that could be done here. Sure, GDScript could have better performance, but it doesn't really need to be on par with C#. |
Godot export templates:
https://downloads.tuxfamily.org/godotengine/3.2/mono/Godot_v3.2-stable_mono_export_templates.tpz
Projects:
GDScript takes 10x longer than C# to execute. Results for rendering the first 1000 frames are (in milliseconds):
Intel Core i5 650 @ 3210 MHz, GTX 950
How to benchmark:
--
The purpose of this issue is to keep track of performance in case any optimizations are implemented in the future.