GPU-based Particles running slow on macOS due to transform feedback being implemented on the CPU #54052
Comments
I wonder if M1 Macs implement transform feedback on the CPU. That would explain the CPU and GPU times spiking so high together. After a quick Google search, it looks like that may indeed be the case: I have found a few posts claiming that transform feedback and geometry shaders are implemented on the CPU. Unfortunately, this may mean that CPUParticles are the only viable option for particles on M1 Macs (in 3.x, that is; in 4.0 the Vulkan/Metal renderer may work much better).
Would that suggest I should see similar performance with either particle type? Because that's not the case: a single Particles2D can handle around 10x as many particles as the equivalent CPUParticles2D.
If I'm right about the above, CPUParticles may even be faster.
Ah, so my guess is probably wrong. Are CPUParticles also way slower on your M1 than on your MacBook Pro?
Can you reproduce this after disabling batching in the Project Settings? Also try playing with the buffer orphaning project setting.
Yep, just tried (batching off, orphan buffers off, both off). Still around 10 FPS in every case.
No, CPUParticles seem capable enough on either machine. Here are all 4 numbers, for clarity:
MacBook Pro 2015 (integrated graphics)
M1 Mac Mini
So it looks like Particles2D is actually subpar on both Macs, but especially bad on M1. I also noticed that the Particles2D version takes noticeably longer to even start up: I get the spinning beach ball of death on the splash screen for a few seconds on the MacBook and for up to 20 seconds on the M1, whereas the CPUParticles2D scene starts running pretty much instantly.
Having the same issue here on M1 with Godot 3.4 stable. I also tried exporting a release build, but it's still laggy.
I am fairly certain this is caused by Apple's poor support of the OpenGL standard. Specifically, I don't think Apple devices support transform feedback on the GPU, so the drivers emulate it by passing data back to the CPU. If that is the case, the solution is to run OpenGL over Metal on Apple devices. There is a draft PR already: #50253
Apple recently joined the Blender dev team. If they really want to promote Apple silicon, helping game engines like Godot is crucial too. Anyway, I'll use CPUParticles2D instead on macOS for now. Thanks :)
I wonder if we should expose the "Convert to CPUParticles" option for use at run time, and run it automatically on all GPU-based particle nodes by default on macOS (unless they use a custom shader). This option works fairly well for simple particle setups, and it still allows you to use GPU-based particles on other platforms. PS: If any of you have an iOS device, does this slowdown also apply on iOS?
Closed by #55268.
Maybe I'm just missing some information, but I'm not quite convinced that we've found the real issue here. Even on my M1 Mac, as I mentioned previously, a single Particles2D is still far more capable than a single CPUParticles2D: it can do 1 million particles at almost 60 FPS. It's only when I instance numerous Particles2D nodes that the performance degrades, and the number of nodes has much more of an impact on the frame rate than the total number of particles. The fact that only 100 Particles2D nodes, each emitting one particle, take the frame rate down to 39 FPS just seems very odd. Is there an explanation?
Yes, same explanation as above. Transform feedback is implemented on the CPU in Apple's OpenGL driver, so to update each Particles2D node, the entire GPU process stalls while the particle data is passed from the GPU to the CPU and back to the GPU. It is more efficient to transfer a million particles once than it is to send a single particle a thousand times. That is why things like batching can be effective: it is more efficient to send data to and from the GPU in big batches than it is to send a thousand tiny commands.
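To make the argument above concrete, here is a toy cost model in Python. It is a sketch under stated assumptions, not a description of how Apple's driver or Godot actually spends time: it assumes each Particles2D node pays a fixed stall for its GPU -> CPU -> GPU round trip plus a small per-particle cost. `RoundTripModel`, `frame_cost_ms`, `per_node_upper_bound_ms` and the parameter values are hypothetical. The only real inputs are the frame rates reported in this thread (500 single-particle nodes at about 10 FPS, 100 at 39 FPS), which give a rough upper bound on per-node cost if the whole frame is attributed to particle updates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundTripModel:
    """Hypothetical linear cost model for CPU-emulated transform feedback.

    Each node update is assumed to pay a fixed stall (GPU -> CPU -> GPU
    round trip) plus a small cost per particle moved. Both values are
    illustrative assumptions, not measurements.
    """

    per_node_stall_ms: float
    per_particle_ms: float

    def frame_cost_ms(self, nodes: int, particles_per_node: int) -> float:
        """Update cost for `nodes` nodes that each hold `particles_per_node`."""
        per_node = self.per_node_stall_ms + particles_per_node * self.per_particle_ms
        return nodes * per_node


def per_node_upper_bound_ms(fps: float, nodes: int) -> float:
    """Upper bound on per-node cost if the whole frame went to particle updates."""
    frame_ms = 1000.0 / fps
    return frame_ms / nodes


if __name__ == "__main__":
    # Frame rates reported in this thread on an M1 Mac.
    print(f"500 nodes @ 10 FPS: <= {per_node_upper_bound_ms(10, 500):.3f} ms/node")
    print(f"100 nodes @ 39 FPS: <= {per_node_upper_bound_ms(39, 100):.3f} ms/node")

    # Hypothetical parameters, roughly in line with the bounds above.
    model = RoundTripModel(per_node_stall_ms=0.2, per_particle_ms=1e-5)
    one_big = model.frame_cost_ms(nodes=1, particles_per_node=1_000_000)
    many_small = model.frame_cost_ms(nodes=1000, particles_per_node=1)
    print(f"1 node x 1,000,000 particles: {one_big:.2f} ms")
    print(f"1000 nodes x 1 particle:      {many_small:.2f} ms")
```

Both reports imply at most roughly 0.2 to 0.26 ms per node. With the made-up parameters, a thousand single-particle nodes cost about 20 times as much as one node holding a million particles, which matches the pattern described in the thread: node count, not particle count, dominates.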
I see, so sending from GPU to CPU and back is just that slow... and CPUParticles is much faster because information is only being sent one way (to the GPU)? If batching is a potential solution, how can I make sure that happens? All the instances use the same ParticlesMaterial, with no texture or material and a tiny visibility rect (is this used for an overlap test?), but they aren't batching.
That is my understanding, yes. I don't know much about how transform feedback is implemented on Apple devices (other than that it is done in software), so there could be other things the driver does that explain the poor performance.
Particles are not batched. If you only have a single particle in each Particles2D, then it will be much more efficient to use a Sprite instead. For most hardware, Particles only become worthwhile once you have hundreds to thousands of instances within a Particles2D (hundreds on lower-end hardware, thousands on higher-end hardware). My guess is that 1000 Sprites will outperform 1000 Particles2D. You have three alternatives to try out:
I can't say for sure what will perform best for you, but it is worth trying a few different approaches to see what works best for your workflow and performance goals.
@clayjohn Thanks, good to know! As of now I'm using VisualServer for particles as well as several thousand stars, and it's running like a dream :)
@akien-mga I'm not sure this is a good solution to this issue. Please correct me if I'm wrong, but you chose to solve "particles are slow on M1" with "if the developer happens to work on M1, warn them about it". That doesn't really fix anything for multi-platform games, which are typically not developed on the target platform. Adding the notice at export time would at least notify the developer of the problem, but you cannot reasonably expect a developer to remake the entire particle system in their game, which might be hundreds of nodes across hundreds of scenes, just for the sake of targeting an additional platform, especially since that would also decrease performance on the unaffected platforms. A proper solution would be to offer conversion from Particles2D to CPUParticles2D either at runtime or at export time.
Well, there doesn't seem to be a good solution to that issue indeed, at least until a macOS-focused rendering contributor decides to look into what workaround could be implemented for that platform. A conversion to CPUParticles2D sounds interesting, but AFAIK the features don't map 1:1, so I'm not sure how well this would work. Anyway, reopening for further discussion.
I think at the very least it should present a warning when you target an OSX export, regardless of the platform you are developing on. This would make the developer aware of the problem so they can work around it. I intend to solve that by making a container to sibling
I think we should add a project setting that automatically converts GPUParticles3D to CPUParticles3D at run time on macOS by default. Most built-in particles should be able to convert with similar visuals, but for custom particle shaders, it's better to have broken particles than unplayable performance when the project runs on macOS. The editor should also warn you before assigning custom particle shaders on macOS, as they can't be converted to CPUParticles.
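The conversion proposals in this thread (convert GPU particles to CPUParticles at run time on macOS by default, and either convert custom-shader particles anyway or keep them on the GPU, with a warning in both cases) boil down to a small per-node decision. The Python sketch below models only that decision logic. `ParticleNode`, `Decision`, `plan_particles`, the `auto_convert_on_macos` and `convert_custom_shaders` flags, and the `"macOS"` platform label are all hypothetical, not Godot API; an actual implementation would presumably reuse Godot's existing "Convert to CPUParticles" functionality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParticleNode:
    path: str                 # hypothetical scene path, e.g. "Level/Sparks"
    uses_custom_shader: bool  # True if the process material is a custom shader


@dataclass(frozen=True)
class Decision:
    convert_to_cpu: bool
    warning: Optional[str] = None


def plan_particles(
    nodes: list[ParticleNode],
    platform: str,
    auto_convert_on_macos: bool = True,
    convert_custom_shaders: bool = True,
) -> dict[str, Decision]:
    """Per-node outcome of a hypothetical macOS conversion setting.

    convert_custom_shaders=True: convert anyway and warn that the visuals
    will likely break (broken particles beat unplayable performance).
    convert_custom_shaders=False: keep such nodes on the GPU and warn about
    the expected slowdown instead.
    """
    plan: dict[str, Decision] = {}
    for node in nodes:
        if platform != "macOS" or not auto_convert_on_macos:
            # Other platforms, or the setting is disabled: leave GPU particles alone.
            plan[node.path] = Decision(convert_to_cpu=False)
        elif not node.uses_custom_shader:
            # Built-in particle materials should convert with similar visuals.
            plan[node.path] = Decision(convert_to_cpu=True)
        elif convert_custom_shaders:
            plan[node.path] = Decision(
                convert_to_cpu=True,
                warning="custom particle shader cannot be converted; visuals will differ",
            )
        else:
            plan[node.path] = Decision(
                convert_to_cpu=False,
                warning="custom particle shader kept on GPU; expect slow transform feedback",
            )
    return plan


if __name__ == "__main__":
    scene = [
        ParticleNode("Level/Sparks", uses_custom_shader=False),
        ParticleNode("Level/Magic", uses_custom_shader=True),
    ]
    for target in ("Windows", "macOS"):
        for path, decision in plan_particles(scene, target).items():
            print(target, path, decision)
```

Keeping the decision separate from the conversion itself means the same function could drive both a run-time pass and an export-time warning, which addresses the concern that developers rarely work on the target platform.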
This is happening to me today on an M2 MacBook Pro running Sonoma.
Which Godot version and rendering method are you using?
Godot version
3.3.2 stable, 3.4 RC1
System information
macOS Big Sur
GLES3
Issue description
I was working in my main project when I noticed some frame drops when I added more than about 20 Particles2D nodes at the same time. This seemed unreasonably slow, so I made a test scene in a new project that instances 500 basic particle systems with only 1 particle each. On my 2020 M1 Mac Mini this scene runs consistently at a ridiculous 10 FPS. I tested this in 3.3.2 stable as well as 3.4 RC1, same results on both.
For comparison, running this same scene in 3.3.2 on my 2015 MacBook Pro gets a solid 30 FPS, which seems reasonable for an older laptop with integrated graphics. Also, some fellow devs testing the same code on PC and Linux had no issues. So it seems this may be an issue specific to M1 machines.
Steps to reproduce
Run the test project, presumably on an M1 Mac.
Minimal reproduction project
particle performance test.zip