Performance with Intel surfaces rendered with Intel GPU to Nvidia output #211
Comments
Also seeing bad performance with Nvidia surfaces on an Nvidia output, strangely, even when modifying the compositor to only use the Nvidia GPU. It blocks on Performance seems to vary in OpenMW depending on what the camera is pointed at, but not just based on how much of the screen is changing. So maybe render calls are slow on the client for some reason? (So the impact depends on what it's rendering.) I don't think another GPU or the CPU should be accessing the buffer in this case. Using 8 bit color instead of 10 bit doesn't help. It is using an Nvidia modifier, though I don't know if it's the most optimal Nvidia-specific modifier. But presumably Nvidia's egl-wayland would choose a reasonable one out of those offered... |
Per my comment in #281 above, this seems related to P states. The P state on both 535 and 545 is all over the show. When the P state is forced to P0 everything becomes smooth. A test I have done is running vkcube, then dragging it by the titlebar (whatever distance) and not letting go after movement stops: the animation becomes super smooth with both nvidia-open and nvidia-proprietary, on both 535 and 545. For me it seems to flutter between P4 and P5 unless forced. And apparently some games are impacted also. This affects only the Nvidia outputs; I guess the buffer slinging needs much more speed compared to when the internal display is used. A cursory sidenote: I think the internal screen is affected by the same issue if the external output is connected. |
Hmm, kernel 6.7.1 seems to fix the issues I was having with nvidia p-states. However the click/drag issue still applies. Edit: false alarm. P-state flutters between P0, P4, P5. Causes stuttering. |
6.7.1 on all 3 of my ASUS ROG machines: one seems to have no p-state issue, while the others do. Same install etc., not sure what's going on. I'm about to test the 550 nvidia driver and will report back |
I'm bewildered. So I might be confusing the issue a bit. As stated above, the click-drag-hold issue with glxgears/vkcube still exists with the 550 nvidia driver. Almost all games run very, very well for me, with the exception of the Quake re-release, which runs like a bucket of dried poo at higher res. The Quake II re-release appears to run super well. Cyberpunk also seems much improved but requires v-sync for smoothness. Hmm.. cosmic-comp is excellent. The improvement over gnome is honestly a bit absurd. The difference between the 545 and 550 drivers... I can't really tell, cosmic does feel much better but that could well be a placebo effect for me. I tested with the ROG Strix dGPU outputs, and the ROG X16 plus eGPU (XG Mobile) outputs. |
Ah. So that would be related to the heuristic cosmic-comp tries to use to decide which GPU to composite on. (Which could probably be improved.) The Wayland protocol for GPU buffers also probably needs some improvement to better handle multiple GPUs. (Like https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/268) Anyway, you should see it work better if the client is started with |
The linked protocol extension looks pretty important for hybrid. But my understanding of how nvidia (open and proprietary) works: doesn't it work slightly differently from the usual, or did that all change once nvidia finally supported gbm? I'm a bit behind the times on this. The drag thing seems to be no longer relevant. As for games.. Cyberpunk is stuttery without v-sync, not sure where to go with that. Since it's in Xwayland it's a bit different. Running Edit: additional info:
Quake II RTX with switching renderers between RTX and openGL is a good test case. |
Yep, that's changed. Nvidia eventually decided to stop pursuing EGLStreams for Wayland, and now supports GBM (unless your card is too old to be supported by the current drivers, like GT 700 cards). They are now using the same dmabuf protocol that Mesa uses. That part of the driver is in https://github.com/NVIDIA/egl-wayland. So if the dmabuf protocol needs improvement, hopefully Mesa and Nvidia drivers will both make use of it. |
For some things this does indeed have a very heavy impact. But it also seems that things running in xwayland are not so impacted by it. A quick test for me was using wgpu examples - For glxgears and vkcube they run at full refresh but are still stuttering occasionally. I didn't try vkcube-wayland as that locked up cosmic last time I tried. I just installed KDE6 also for testing and this is quite a stark difference. KDE-5 was terrible, but KDE-6 is like putting on a clean pair of silk underpants, nvidia outputs are very smooth, and so are games that run under the refresh rate. I'm unsure if kde-6 is forcing a vsync however. But this smoothness is for all the examples above. Edit: the key difference so far is that under kde-6 the p-state for nvidia is always P0. On other desktops (cosmic, kde5, gnome) it fluctuates P0/P4/P5. And every time the P state changes it gets a jank/stutter. |
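For anyone wanting to put numbers on the P-state flutter described above, here is a minimal sketch that samples the P-state and counts how often it changes. It assumes `nvidia-smi` is on the PATH and that the driver exposes the `pstate` query field; the function names (`count_transitions`, `sample_pstate`, `watch`) are made up for illustration and are not part of any tool mentioned in this thread.

```python
import subprocess
import time


def count_transitions(samples):
    """Count how many times consecutive P-state samples differ."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def sample_pstate(gpu_index=0):
    """Read the current P-state (e.g. "P0") of one GPU via nvidia-smi.

    Assumption: nvidia-smi is installed and supports the `pstate` field.
    """
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=pstate", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def watch(duration_s=10.0, interval_s=0.25, gpu_index=0):
    """Poll the P-state for a while; return the samples and the change count."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(sample_pstate(gpu_index))
        time.sleep(interval_s)
    return samples, count_transitions(samples)
```

Calling `watch()` while dragging a vkcube window, once per desktop, would give a rough count to compare. Polling at this rate can miss short-lived changes, so treat the count as a lower bound.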
This can't have an effect on Xwayland apps, as it affects the wayland connection, which X clients know nothing about. So until we run one Xwayland instance per GPU, using the PRIME env variables is the best you can do for Xwayland apps. (Or running them through another rootful Xwayland instance or gamescope launched with our custom socket, which also circumvents the issue.) The goal is ultimately to do just that and set the appropriate environment variables accordingly through our launcher and other means of starting applications on cosmic. |
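As a rough illustration of the PRIME variable approach for Xwayland apps, the sketch below builds an environment with the render-offload variables from Nvidia's PRIME render offload documentation and launches a command with it. The comment above does not say which variables it means, so this particular set is an assumption, and the helper names (`offload_env`, `launch_on_nvidia`) are hypothetical.

```python
import os
import subprocess

# Assumption: these are the commonly documented Nvidia render-offload
# variables; verify them against the README of the driver you run.
NVIDIA_OFFLOAD_ENV = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
    "__VK_LAYER_NV_optimus": "NVIDIA_only",
}


def offload_env(base=None):
    """Return a copy of `base` (default: os.environ) with the offload vars set."""
    env = dict(os.environ if base is None else base)
    env.update(NVIDIA_OFFLOAD_ENV)
    return env


def launch_on_nvidia(argv):
    """Start a client (e.g. ["glxgears"]) with the offload environment."""
    return subprocess.Popen(argv, env=offload_env())
```

For Mesa drivers the analogous knob is `DRI_PRIME`; which value selects which GPU depends on the system, so it is left out of the sketch.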
@Drakulix I apologize for not posting in quite the right issue, but in the interests of keeping data points in at least a reasonable place with other points.. something I did notice:
this runs very well if there are no xwayland windows on the screen. And a side effect is that all cosmic windows disappear while the example is running - other wayland windows do not. Frame timings go from (no env vars):
to
If the window thing is of importance I can create an issue for it. Hmm.. running Steam increases frame times, and Steam is also hidden when running the env var command example. |
It seems good to document what I've seen with performance somewhere.
I've been seeing about 20fps with `es2gears` on a 1440p monitor, on a 1650 mobile with the 545.23.06 driver.

Looking at Tracy profiling, using Smithay/smithay#1134 with some changes:
Context 0 in this case is the Nvidia GPU. The portion of `render` before `clear` seems to be the wait at https://github.com/Smithay/smithay/blob/4f9480e5e02d05379e14ec49e1135cc8a57275d1/src/backend/renderer/multigpu/mod.rs#L1144. Without that, performance is improved but still less than 60fps (and there's artifacting, of course).

Both GPUs spend about 10ms here drawing in `render_texture_from_to`.

With Intel rendering to an Intel target (but just at 1080p), it only takes about 967μs. Forcing a linear modifier raises that to around 2-5ms, seemingly with more variation frame to frame. I guess that's just (multi-level) caching? So rendering to a linear modifier is part of the performance difference here, though presumably that should be fine, and normal for reverse PRIME like this...
It is strange though to see `draw_solid` in `GlesFrame::clear` taking 3.85ms. That should be fast and just involve the Nvidia framebuffer, which should be efficient to render to...
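To put the timings above in context, here is a small arithmetic sketch that expresses each measured cost as a share of the per-frame budget. It assumes a 60 Hz output (the refresh rate is not stated above, only that the result stays under 60fps); the function names are made up for illustration.

```python
def frame_budget_ms(refresh_hz):
    """Time available per frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / refresh_hz


def budget_share(cost_ms, refresh_hz=60.0):
    """Fraction of one frame's budget consumed by a step costing `cost_ms`."""
    return cost_ms / frame_budget_ms(refresh_hz)


# Timings quoted in the issue text above (milliseconds).
MEASURED_MS = {
    "render_texture_from_to (per GPU)": 10.0,
    "draw_solid in GlesFrame::clear": 3.85,
    "Intel to Intel target at 1080p": 0.967,
}


def report(refresh_hz=60.0):
    """Format one line per measurement with its share of the frame budget."""
    return [
        f"{name}: {cost:.3f} ms, {budget_share(cost, refresh_hz):.0%} "
        f"of a {refresh_hz:g} Hz frame"
        for name, cost in MEASURED_MS.items()
    ]
```

Under that assumption the budget is about 16.7ms per frame, so a 10ms draw alone uses roughly 60% of it, while the observed 20fps corresponds to about 50ms per frame.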