GPU-based Particles running slow on macOS due to transform feedback being implemented on the CPU #54052

djrain · 2021-10-21T00:20:50Z

Godot version

3.3.2 stable, 3.4 RC1

System information

macOS Big Sur
GLES3

Issue description

I was working in my main project when I noticed some frame drops when I added more than about 20 Particles2D nodes at the same time. This seemed unreasonably slow, so I made a test scene in a new project that instances 500 basic particle systems with only 1 particle each. On my 2020 M1 Mac Mini this scene runs consistently at a ridiculous 10 FPS. I tested this in 3.3.2 stable as well as 3.4 RC1, same results on both.

For comparison, running this same scene in 3.3.2 on my 2015 MacBook Pro gets a solid 30 FPS, which seems reasonable for an older laptop with integrated graphics. Also, some fellow devs testing the same code on PC and Linux had no issues. So it seems this may be an issue specific to M1 machines.

Steps to reproduce

Run the test project, presumably on an M1 Mac.

Minimal reproduction project

particle performance test.zip

floppyhammer · 2021-10-21T02:03:01Z

Got a similar result on the same machine.

clayjohn · 2021-10-21T02:16:41Z

I wonder if M1 Macs implement transform feedback on the CPU. That would explain the CPU and GPU times spiking so high together.

After a quick Google search, it looks like that may indeed be the case. I have found a few posts claiming that transform feedback and geometry shaders are implemented on the CPU.

Unfortunately, this may mean that CPUParticles are the only viable option for particles on M1 Macs (in 3.x, that is; in 4.0 the Vulkan/Metal renderer may work much better).

djrain · 2021-10-21T03:15:32Z

I wonder if M1 macs implement transform feedback on the CPU. That would explain the CPU and GPU times spiking so high together.

Would that suggest I should see similar performance with either particle type? Because that's not the case. I'm seeing that a single Particles2D can handle around 10X as many particles compared to the equivalent CPUParticles2D.

clayjohn · 2021-10-21T06:37:11Z

Would that suggest I should see similar performance with either particle type

If my guess above is right, it would mean that CPUParticles may even be faster.

Because that's not the case. I'm seeing that a single Particles2D can handle around 10X as many particles compared to the equivalent CPUParticles2D.

Ah, so my guess is probably wrong. Are CPUParticles also way slower on your M1 than on your MacBook Pro?

Calinou · 2021-10-21T14:42:39Z

Can you reproduce this after disabling batching in the Project Settings? Also try playing with the buffer orphaning project setting.

djrain · 2021-10-21T15:02:13Z

Can you reproduce this after disabling batching in the Project Settings? Also try playing with the buffer orphaning project setting.

Yep, just tried (batching off, orphan buffers off, both off). Still around 10 fps in any case.

djrain · 2021-10-21T15:32:17Z

Are CPUParticles also way slower on your M1 than on your MacBook pro?

No, CPUParticles seem capable enough on either machine.

Here are all 4 numbers, for clarity:

MacBook Pro 2015 (integrated graphics)
1000 CPUParticles2D: 60 fps
1000 Particles2D: 13 fps

M1 Mac Mini
1000 CPUParticles2D: 60 fps
1000 Particles2D: 2 fps

So, it looks like Particles2D is actually subpar on both Macs, but especially bad on M1. I also noticed that the version using Particles2D takes noticeably longer to even start up - I get the spinning beach ball of death on the splash screen for a few seconds on the MacBook and up to 20 seconds on M1. Whereas the CPUParticles2D scene starts running pretty much instantly.
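As a rough sanity check, the frame rates above can be converted into frame times and a naive per-node cost. This is a minimal Python sketch that assumes the whole frame is spent on the particle nodes (an overestimate) and that 60 fps is likely the V-Sync cap, so the CPUParticles2D rows are only upper bounds on cost:

```python
# Reported frame rates from the comment above (1000 nodes per scene).
REPORTED_FPS = {
    ("MacBook Pro 2015", "CPUParticles2D"): 60,
    ("MacBook Pro 2015", "Particles2D"): 13,
    ("M1 Mac Mini", "CPUParticles2D"): 60,
    ("M1 Mac Mini", "Particles2D"): 2,
}
NODE_COUNT = 1000


def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def per_node_ms(fps: float, nodes: int = NODE_COUNT) -> float:
    """Naive average cost per node, assuming the whole frame is spent on the nodes."""
    return frame_time_ms(fps) / nodes


for (machine, node_type), fps in REPORTED_FPS.items():
    print(f"{machine:<17} {node_type:<15} "
          f"{frame_time_ms(fps):6.1f} ms/frame  {per_node_ms(fps) * 1000:6.1f} us/node")
```

For the M1 Particles2D case this works out to 500 ms per frame, or roughly 0.5 ms per node, versus about 77 ms per frame (roughly 0.08 ms per node) on the 2015 MacBook Pro.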

Richard74Huang · 2021-11-23T02:49:27Z

Having the same issue here on M1 with Godot 3.4 stable. I also tried exporting a release build, but it's still laggy.

clayjohn · 2021-11-23T03:49:43Z

I am fairly certain this is caused by Apple's poor support for the OpenGL standard. Specifically, I don't think Apple devices support transform feedback on the GPU, so the drivers emulate it by passing data back to the CPU. If that is the case, then the solution is to run OpenGL over Metal on Apple devices. There is already a draft PR: #50253

Richard74Huang · 2021-11-23T06:46:47Z

Apple recently joined the Blender dev team. If they really want to promote Apple Silicon, helping game engines like Godot is crucial too.

Anyway, I'll use CPUParticles2D instead on macOS for now. Thanks :)

Calinou · 2021-11-23T17:05:12Z

I wonder if we should expose the "Convert to CPUParticles" option to be used at run-time, and run it automatically on all GPU-based particle nodes by default on macOS (unless they use a custom shader). This option works fairly well for simple particle setups, and it still allows you to use GPU-based particles on other platforms.

PS: If any of you have an iOS device, does this slowdown also apply on iOS?

akien-mga · 2022-01-12T15:52:05Z

Closed by #55268.

djrain · 2022-01-13T07:39:01Z

Maybe I'm just missing some information, but I'm not quite convinced that we've found the real issue here.

Even on my M1 Mac, as I mentioned previously, a single Particles2D is still far more capable than a single CPUParticles2D: it can do 1 million particles at almost 60 fps. Performance only degrades when I instance numerous Particles2D nodes, and the number of nodes has much more of an impact on the frame rate than the total number of particles. The fact that just 100 Particles2D nodes, emitting one particle each, take the frame rate down to 39 fps seems very odd. Is there an explanation?

clayjohn · 2022-01-13T08:13:10Z

Yes, same explanation as above. Transform feedback is implemented on the CPU in Apple's OpenGL driver, so to update each Particles2D node, the entire GPU process stalls while the particle data is passed from the GPU to the CPU and back to the GPU.

It is more efficient to transfer a million particles once than it is to send a single particle a thousand times. That is why things like batching can be effective: it is more efficient to send data to and from the GPU in big batches than to send a thousand tiny commands.
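The "one big transfer beats many tiny ones" argument can be illustrated with a toy cost model in which every transfer pays a fixed overhead plus a per-item cost. This is a hypothetical Python sketch: the function name and both constants are invented for illustration and are not measurements of Apple's driver or of Godot.

```python
def transfer_cost_us(num_transfers: int, items_per_transfer: int,
                     overhead_us: float, per_item_us: float) -> float:
    """Toy cost model: every transfer pays a fixed overhead (driver call,
    GPU->CPU->GPU round trip, pipeline stall) plus a cost per item moved."""
    return num_transfers * (overhead_us + items_per_transfer * per_item_us)


# Made-up constants chosen only to illustrate the shape of the curve;
# they are NOT measurements of any real driver.
OVERHEAD_US = 100.0
PER_ITEM_US = 0.001

one_big = transfer_cost_us(1, 1_000_000, OVERHEAD_US, PER_ITEM_US)  # one node, 1M particles
many_small = transfer_cost_us(1000, 1, OVERHEAD_US, PER_ITEM_US)    # 1000 nodes, 1 particle each

print(f"1 x 1,000,000 particles: {one_big / 1000:.1f} ms")
print(f"1000 x 1 particle:       {many_small / 1000:.1f} ms")
```

With these invented constants the single large transfer costs about 1.1 ms while the thousand tiny ones cost about 100 ms. Only the shape matters: once the fixed per-transfer overhead dominates, cost scales with the number of nodes rather than the number of particles, which is consistent with the observation above that node count matters far more than particle count.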

djrain · 2022-01-13T16:36:27Z

I see, so sending from GPU to CPU and back is just that slow... and CPUParticles is much faster because information is only being sent one way (to GPU)?

If batching is a potential solution, how can I make sure it happens? I'm giving all the instances the same ParticlesMaterial, no texture or material, and a tiny visibility rect (is this used for the overlap test?), but they aren't being batched:

items
	joined_item 1 refs, 
			batch D 0-1 PA 
	joined_item 1 refs, 
			batch D 0-1 PA 
	joined_item 1 refs, 
			batch D 0-1 PA

clayjohn · 2022-01-13T16:57:48Z

I see, so sending from GPU to CPU and back is just that slow... and CPUParticles is much faster because information is only being sent one way (to GPU)?

That is my understanding, yes. I don't know much about how Transform Feedback is implemented on Apple devices (other than that it is in software) so there could be some other things that the driver does that explain the poor performance.

If batching is a potential solution, how can I make sure that happens? I'm setting all the instances to the same ParticlesMaterial, no texture/material and tiny visibility rect (is this used for overlap test?), but it isn't batching:

Particles are not batched. If you only have a single particle in each Particles2D then it will be much more efficient to use a Sprite instead. For most hardware, Particles only become worthwhile once you start having hundreds to thousands of instances within a Particles2D (hundreds on lower end hardware, thousands on higher end hardware).

My guess is that 1000 Sprites will outperform 1000 Particles2D.

You have three alternatives to try out:

Use fewer Particles2D nodes and increase the number of particles in each,
Use CPUParticles where possible (also with a high number of particles in each),
Use regular Sprites (if sharing a material and texture these should batch automatically).

I can't say for sure what will perform the best for you. But it is worth trying a few different approaches to see what works best for your workflow and performance goals.

djrain · 2022-01-13T17:17:14Z

@clayjohn Thanks, good to know! As of now I'm using VisualServer for particles as well as several thousand stars, and it's running like a dream :)

lekoder · 2022-08-08T17:47:25Z

@akien-mga I'm not sure this is a good solution to this issue. Please correct me if I'm wrong, but you chose to solve "particles are slow on M1" with "if the developer happens to work on M1, warn them about it".

It doesn't really fix anything for multi-platform games, which are typically not developed on the target platform. Adding the notice at export time would at least make the developer aware of the problem, but you cannot reasonably expect developers to remake the entire particle system in their game (which might be hundreds of nodes across hundreds of scenes) just to target an additional platform, especially since doing so would decrease performance on the unaffected platforms.

A proper solution would be to offer conversion from Particles2D to CPUParticles2D either at runtime or at export time.

akien-mga · 2022-08-08T18:56:51Z

Well, there doesn't seem to be a good solution to that issue indeed, at least until a macOS-focused rendering contributor decides to look into what workaround could be implemented for that platform.

A conversion to CPUParticles2D sounds interesting, but AFAIK the features don't map 1:1, so I'm not sure how well it would work. Anyway, reopening for further discussion.

lekoder · 2022-08-09T08:05:43Z

I think at the very least it should present a warning when you target an OSX export, regardless of the platform you are developing on. This would make the developer aware of the problem so they can work around it.

I intend to solve that by making a container that holds sibling Particles2D and CPUParticles2D placeholders, with a unified API exposed by the container. The ideal solution, though, would probably be full feature parity between CPUParticles2D and Particles2D; that would allow converting them on export, or perhaps a run-time option to switch between them.

Calinou · 2022-08-09T09:20:23Z

I think we should add a project setting that automatically converts GPUParticles3D to CPUParticles3D at run-time on macOS by default. Most built-in particles should be able to convert with similar visuals, but for custom particle shaders, it's better to have broken particles than unplayable performance when the project runs on macOS.

The editor should also warn you before assigning custom particle shaders on macOS, as they can't be converted to CPUParticles.
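The policy proposed above (convert GPU particles automatically on macOS by default, and warn about custom particle shaders because they cannot be carried over) can be sketched as plain decision logic. This is a hypothetical Python sketch: `ParticleNodeInfo`, `plan_macos_conversion` and the `convert_on_macos` flag are illustrative names, not Godot APIs or project settings, and reading the proposal as "convert custom-shader nodes too, but warn" is an assumption based on the comment preferring broken particles over unplayable performance.

```python
from dataclasses import dataclass


@dataclass
class ParticleNodeInfo:
    """Hypothetical summary of a GPU particle node; not a Godot class."""
    path: str
    has_custom_shader: bool


def plan_macos_conversion(platform: str, nodes: list[ParticleNodeInfo],
                          convert_on_macos: bool = True) -> tuple[list[str], list[str]]:
    """Return (paths to convert, warnings) under the proposed policy.

    - Off macOS, or with the hypothetical setting disabled, nothing is converted.
    - On macOS, every GPU particle node is converted to its CPU counterpart.
    - Nodes with a custom particle shader are still converted, but a warning is
      emitted because the custom shader cannot be carried over.
    """
    if platform != "macOS" or not convert_on_macos:
        return [], []
    to_convert = [n.path for n in nodes]
    warnings = [f"{n.path}: custom particle shader cannot be converted to CPU particles"
                for n in nodes if n.has_custom_shader]
    return to_convert, warnings


nodes = [ParticleNodeInfo("Smoke", False), ParticleNodeInfo("Portal", True)]
converted, warnings = plan_macos_conversion("macOS", nodes)
print(converted)
print(warnings)
```

Keeping the decision separate from the conversion itself would let the same logic drive either a run-time switch or an export-time pass, the two options raised earlier in the thread.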

GeraldineSullivan · 2023-12-19T22:17:00Z

This is happening to me today on an M2 MacBook Pro running Sonoma.

Calinou · 2023-12-20T14:10:31Z

This is happening to me today on M2 macbook pro running Sonoma

Which Godot version and rendering method are you using?

GeraldineSullivan · 2023-12-20T15:25:52Z

This is happening to me today on M2 macbook pro running Sonoma

Which Godot version and rendering method are you using?

I am using 4.1.2

I fixed it by using the renderer settings from a file posted in the comments. They were all set to gl_compatibility by default.

djrain changed the title ~~Particles2D running incredibly slow on macOS~~ Particles2D running incredibly slow on M1 Mac Oct 21, 2021

clayjohn added discussion performance platform:macos labels Oct 21, 2021

Chaosus added bug topic:rendering discussion platform:macos and removed discussion platform:macos bug topic:rendering labels Oct 21, 2021

Calinou added topic:2d topic:rendering labels Oct 21, 2021

Calinou added the confirmed label Nov 23, 2021

Calinou changed the title ~~Particles2D running incredibly slow on M1 Mac~~ GPU-based Particles running slow on macOS due to transform feedback being implemented on the CPU Nov 23, 2021

Calinou added documentation and removed discussion labels Nov 23, 2021

Calinou mentioned this issue Nov 23, 2021

Warn when using GPU-based particles on macOS due to low performance #55268

Merged

clayjohn mentioned this issue Dec 18, 2021

Particles2D jitter / stutter on M1 mac #56060

Closed

akien-mga added this to the 3.5 milestone Jan 5, 2022

akien-mga closed this as completed Jan 12, 2022

akien-mga reopened this Aug 8, 2022

akien-mga modified the milestones: 3.5, 3.x Aug 8, 2022

mhilbrunner added the topic:particles label Aug 19, 2022

Calinou mentioned this issue Oct 23, 2022

Add a menu option to convert CPUParticles back to GPUParticles godotengine/godot-proposals#2997

Closed

Calinou mentioned this issue Jun 1, 2023

Creating or loading GPUParticles nodes crashes engine on macOS with Compatibility GLES3 renderer (Apple Silicon) #72469

Closed

Calinou mentioned this issue Jun 14, 2023

Add a Trail3D node to draw smooth lines for VFX godotengine/godot-proposals#7082

Open

Calinou mentioned this issue Aug 17, 2023

Deprecate CPUParticles godotengine/godot-proposals#7517

Open

bruvzg mentioned this issue Oct 3, 2023

OSX/M1 window resizing crashes the engine when rendering thread model is set to multi-threaded #81402

Open
