Massive performance drop in DX9 under very specific circumstances with an Nvidia GPU and certain driver settings #16332

Closed
5 tasks done
OCRBonk opened this issue Nov 5, 2022 · 22 comments
Labels
D3D9 Direct3D 9
Milestone

Comments

@OCRBonk

OCRBonk commented Nov 5, 2022

Game or games this happens in

Metal Gear Solid Portable Ops, Outrun2, Gundam Battle Universe, Final Fantasy Tactics

What area of the game

Hi, this is a bit of a niche issue. When using compatibility flags to force SGSSAA through the Nvidia driver in DX9, performance regresses after build [v1.13.2-989-g7c2b4b60a], seemingly starting at [v1.13.2-997-g143be816c] (the next available build). There is a general performance loss in most games, but in several games the loss is catastrophic, down to unplayable levels, where previously they ran at full speed at less than 20% GPU usage on an RTX 3080Ti.

Looking at these commits, they seem to be accuracy improvements. Unfortunately, they appear to cause a problem for this specific DX9 use case. I don't know whether it can be worked around; if not, that's OK.

Here is an archive with several screenshots showing a performance overlay in each build for the mentioned games, debug logs, and frame dumps, plus a PPSSPP.NIP file that can be imported with Nvidia Profile Inspector to apply the exact driver overrides being used:
https://drive.google.com/file/d/1ZUXuI4pCZCiUM65s8c7oj26bhtSgkrFU/view?usp=sharing
Here's a quick comparison from one of them for ease of viewing:
https://www.screenshotcomparison.com/comparison/29276
I've tried just about every setting to see if it improves the situation. Turning off hardware skinning in some titles improves performance a little, but nowhere near what it was before.

Speed seen in PPSSPP

No response

GE frame capture and debug statistics

No response

Platform

Windows

Mobile phone model or graphics card

RTX 3080Ti

PPSSPP version affected

v1.13.2-1201-ga76dcf0e7

Last working version

v1.13.2-989-g7c2b4b60a

Graphics backend (3D API)

Direct3D 9

Any other notes or things you've tried

No response

Checklist

  • Test in the latest git build in case it's already fixed.
  • Search for other reports of the same issue.
  • Try resetting settings or older versions and include if the issue is related.
  • Try changing graphics settings to determine if one causes the slowdown.
  • Include logs or screenshots of issue.
@hrydgard
Owner

hrydgard commented Nov 5, 2022

I'm guessing the reason you're on DX9 is the ability to force antialiasing?

I intend to add proper MSAA support to the Vulkan backend in the future, should reduce the need. Still, it would of course be nice to fix this regression.

You're saying it's broken at 1201 ( #16098 ). Regarding 936, I'm not sure I understand, is that just some earlier build where it worked? Any chance you could narrow it down a bit further?

Regarding skinning, it would be very unlikely to affect performance like this.

@unknownbrackets
Collaborator

It would help if you can try multiple git builds and make sure to pinpoint the very LATEST build it was still working in, and the very EARLIEST build it broke in. The current range is a bit wide.

Slightly worried maybe my INTZ change caused this? But the range would be different.

-[Unknown]
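
The narrowing-down requested above amounts to a binary search over the downloadable builds. Here is a minimal sketch of that idea, assuming the regression is monotonic (once a build is broken, every later build is too). `BisectBuilds`, `BisectResult`, and the `isBroken` callback are hypothetical names for illustration; in practice `isBroken` is a person running the game and watching the frame rate.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: the two builds that bracket the regression.
struct BisectResult {
    int lastGood;  // latest build that still works
    int firstBad;  // earliest build that shows the slowdown
};

// `builds` is the ordered list of build numbers that can actually be
// downloaded. Assumes it has at least two entries, that builds.front() is
// known good, and that builds.back() is known bad.
BisectResult BisectBuilds(const std::vector<int> &builds,
                          const std::function<bool(int)> &isBroken) {
    std::size_t good = 0;                 // index of a build known to work
    std::size_t bad = builds.size() - 1;  // index of a build known to be broken
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        if (isBroken(builds[mid]))
            bad = mid;   // regression is at or before mid
        else
            good = mid;  // regression is after mid
    }
    return {builds[good], builds[bad]};
}
```

The search can only resolve down to adjacent entries in `builds`, so any commits between two consecutive downloadable builds still have to be inspected by hand.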

@hrydgard hrydgard added the D3D9 Direct3D 9 label Nov 5, 2022
@hrydgard hrydgard added this to the v1.14.0 milestone Nov 5, 2022
@hrydgard
Owner

hrydgard commented Nov 5, 2022

By the way, for ease of testing, can you explain exactly how you are forcing on this AA mode, so I can try exactly the same? My NV GPU is different, but the problem might be shared.

@OCRBonk
Author

OCRBonk commented Nov 8, 2022

Sorry for the delayed response. Been busy. I'll update my post here in a bit to explain.

> I'm guessing the reason you're on DX9 is the ability to force antialiasing?
>
> I intend to add proper MSAA support to the Vulkan backend in the future, should reduce the need. Still, it would of course be nice to fix this regression.
>
> You're saying it's broken at 1201 ( #16098 ). Regarding 936, I'm not sure I understand, is that just some earlier build where it worked? Any chance you could narrow it down a bit further?
>
> Regarding skinning, it would be very unlikely to affect performance like this.

Yes, the reason I use DX9 when possible is Nvidia's Sparse Grid Supersampling (SGSSAA), which has been part of their DX9 drivers for about a decade now. If you are curious about what exactly it is, you can refer to this sample from AMD:
https://github.com/GPUOpen-LibrariesAndSDKs/SSAA11
It's basically the same as that. It uses the MSAA subsamples to replay pixel shading N times (N being the number of subsamples), on top of the normal per-triangle shading of MSAA. It ends up antialiasing everything, and far more effectively than typical supersampling, because the MSAA sample pattern is more dispersed rather than an ordered grid. (The weakness really ends up being the standard MSAA resolve.)
If you download the archive https://drive.google.com/file/d/1ZUXuI4pCZCiUM65s8c7oj26bhtSgkrFU/view?usp=sharing
you'll find the .NIP file I mentioned inside. If you import it with Nvidia Profile Inspector,
it will automatically create a driver profile for PPSSPP, set the appropriate compatibility flag to allow SGSSAA to work in DX9, and set it to 8xSGSSAA.
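
To make the "more dispersed than an ordered grid" point concrete, here is a small sketch comparing two 4-sample layouts. It assumes the Direct3D standard 4x MSAA pattern as the sparse example; the pattern the Nvidia driver actually uses for SGSSAA is not stated in this thread. `Offset` and `DistinctColumns` are hypothetical names for illustration.

```cpp
#include <array>
#include <cstddef>
#include <set>

// Sample position in 1/16ths of a pixel, relative to the pixel center.
struct Offset { int x, y; };

// Ordered 2x2 grid, as produced by plain 2x2 supersampling.
constexpr std::array<Offset, 4> kOrderedGrid = {{{-4, -4}, {4, -4}, {-4, 4}, {4, 4}}};

// Direct3D standard 4x MSAA pattern (a rotated grid). Assumed here purely as
// an example of a sparse layout.
constexpr std::array<Offset, 4> kSparseGrid = {{{-2, -6}, {6, -2}, {-6, 2}, {2, 6}}};

// Counts the distinct horizontal sample positions. A near-vertical edge
// sweeping across the pixel can produce at most this many coverage steps.
template <std::size_t N>
int DistinctColumns(const std::array<Offset, N> &pattern) {
    std::set<int> xs;
    for (const Offset &o : pattern)
        xs.insert(o.x);
    return static_cast<int>(xs.size());
}
```

With the same four samples, the ordered grid has two distinct columns while the sparse pattern has four, so the sparse pattern gives twice as many gradient steps on near-vertical (and, by symmetry, near-horizontal) edges.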

Adding MSAA support would be great, as that could probably fix a lot of the aliasing present in the majority of games, except for things like texture aliasing and alpha-texture aliasing that MSAA can't handle. SGSSAA works great for those.
https://www.screenshotcomparison.com/comparison/29372
Here's some videos:
8xMSAA https://drive.google.com/file/d/1IMO2HzDQenIVg5k1SvKRf4Jv7y2fKU7i/view?usp=sharing
8xSGSSAA https://drive.google.com/file/d/14tiffFSXxBaMNwSgDlFVDMLQXcK6MdbI/view?usp=sharing
I haven't looked at the code, but from what I can tell or recall, the "SuperSampling" options in Dolphin seem to do the same thing, having compared them to using MSAA enhanced with SGSSAA through the drivers in OpenGL.

Just having MSAA would be huge, however, because quite a few games just don't render properly in DX9, and I don't expect you guys to fix that if it's not easy, given how old and outdated DX9 is. Being able to get MSAA in more accurate and up-to-date backends would be great.
(Like this humorous example from Hammerin Hero:
[screenshot: PPSSPPWindows64_2022_11_08_01_44_52_327]
Builds 1216 and after actually fix this issue, thanks @unknownbrackets! 👍
[screenshot: PPSSPPWindows64_2022_11_08_01_49_38_178])

As for the builds, sorry for such a large disparity. I singled these out by looking at commits with changes to DX9. I did try several builds in between, and 1201 seemed to be the build where things broke. I realize now that this doesn't make much sense, so I went back, downloaded all available builds from the bot between the two, and found that the last working build was actually v1.13.2-989-g7c2b4b60a.
Everything after that is broken. I'm not sure which commit between 989 and 997 breaks things.

For the other comment, I'm sorry, I got Software Skinning mixed up with Hardware Transform. It had a negligible performance impact before the issue, but afterwards, as you can see, it's pretty bad: https://www.screenshotcomparison.com/comparison/29373

> It would help if you can try multiple git builds and make sure to pinpoint the very LATEST build it was still working in, and the very EARLIEST build it broke in. The current range is a bit wide.
>
> Slightly worried maybe my INTZ change caused this? But the range would be different.
>
> -[Unknown]

No, I don't think your INTZ change caused this, but as mentioned above, it does fix some rendering issues in DX9.

@hrydgard hrydgard modified the milestones: v1.14.0, v1.15.0 Nov 23, 2022
@hrydgard
Owner

hrydgard commented Nov 23, 2022

Thanks for your detailed report, and sorry I didn't reply sooner.

That would mean that the culprit is #15944 . There's just very little in there that seems DX9 related..

I guess I'm just gonna have to try and reproduce myself, thanks for the instructions!

@hrydgard hrydgard modified the milestones: v1.15.0, v1.14.0 Nov 23, 2022
@hrydgard
Owner

I also looked into this a bit more, and it seems we can get the exact same effect as SGSSAA in Vulkan, by setting sampleShadingEnable = true on pipelines when we enable MSAA:

https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkPipelineMultisampleStateCreateInfo.html
https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#primsrast-sampleshading

We could theoretically even make the fraction of samples shaded configurable by setting minSampleShading.

Will definitely implement that when I add MSAA support. This is going to be better than traditional supersampling due to the better sampling patterns that can be used with MSAA.
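
As a rough illustration of what `minSampleShading` controls: when sample shading is enabled, the Vulkan specification requires at least max(ceil(minSampleShading × sampleCount), 1) fragment shader invocations per fragment. The helper below is a sketch of that formula only, not PPSSPP code; `MinShadedSamples` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>

// Minimum number of per-fragment shader invocations when sample shading is
// enabled, following the rule in the Vulkan specification:
//   max(ceil(minSampleShading * rasterizationSamples), 1)
// `rasterizationSamples` is the MSAA sample count (2, 4, 8, ...),
// `minSampleShading` is the fraction in [0, 1].
int MinShadedSamples(int rasterizationSamples, float minSampleShading) {
    int n = static_cast<int>(std::ceil(minSampleShading * rasterizationSamples));
    return std::max(n, 1);
}
```

So at 8x MSAA, a fraction of 1.0 shades all eight samples (the SGSSAA-like case), while 0.5 guarantees at least four. The value is a lower bound; an implementation may shade more.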

@hrydgard
Owner

hrydgard commented Nov 28, 2022

Okay, I decided to just give it a go, and managed to get it mostly working in Vulkan in a day:

#16458

Like I said, in Vulkan we have very detailed control over sample shading, so in addition to just regular MSAA and SGSSAA, we can do a middle way where we only shade all samples if alpha testing is enabled (ugly fences and stuff), which looks fantastic. That's what the change does right now, but I'll add some more control over quality later.
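
The "middle way" described above can be pictured as a per-pipeline decision. The following is only a sketch of that policy under assumed names (`SampleShadingMode`, `SampleShadingState`, `ChooseSampleShading` are all hypothetical); the actual logic lives in #16458 and may differ.

```cpp
// Hypothetical quality modes: plain MSAA, per-sample shading only for
// alpha-tested draws, or per-sample shading for everything (SGSSAA-like).
enum class SampleShadingMode { Off, AlphaTestedOnly, All };

// The two multisample-state values this decision feeds into.
struct SampleShadingState {
    bool sampleShadingEnable;
    float minSampleShading;
};

// Decides the sample shading state for one pipeline. `sampleCount` is the
// MSAA sample count; `alphaTestEnabled` says whether the draw discards
// fragments based on alpha (fences, foliage and similar cutout textures).
SampleShadingState ChooseSampleShading(SampleShadingMode mode, int sampleCount,
                                       bool alphaTestEnabled) {
    const bool multisampled = sampleCount > 1;
    const bool enable =
        multisampled &&
        (mode == SampleShadingMode::All ||
         (mode == SampleShadingMode::AlphaTestedOnly && alphaTestEnabled));
    // Shade every sample when enabled; the fraction is ignored otherwise.
    return {enable, enable ? 1.0f : 0.0f};
}
```

The appeal of the middle mode is that opaque geometry keeps the cost of ordinary MSAA, and only the draws whose edges come from the shader rather than from triangle coverage pay for per-sample shading.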

@hrydgard
Owner

hrydgard commented Nov 29, 2022

I've tried using the NPI and I was able to successfully apply the antialiasing, but even at 10x, I'm not seeing any kind of catastrophic performance loss. So I'm at a loss here...

Best thing is probably to wait for my MSAA implementation for Vulkan to be finished up, it will be able to do full on SGSSAA.

@hrydgard hrydgard modified the milestones: v1.14.0, v1.15.0 Nov 29, 2022
@OCRBonk
Author

OCRBonk commented Nov 30, 2022

I thought perhaps it was because I mainly use my W7 boot drive for most gaming-related tasks, so I tried my W10 drive. Still the same issue. I also use RTSS Scanline Sync and thought that might be the cause, so I tried disabling it on both W7 and W10, and performance actually dropped a bit more (which is slightly unusual).
I thought maybe RTSS just detecting the EXE could be the issue, or my settings. So I deleted my settings, let PPSSPP regenerate them, and then disabled RTSS. Still the same issue. The game runs fine, using 10-20% GPU, and then as certain things appear on screen it jumps to near 100% and slows down.
Are you able to confirm with A/B screenshots that SGSSAA was applying correctly? (I'm sorry if that is a redundant or stupid question. Over years of troubleshooting this with people on forums, it's sometimes not obvious to people whether it's working as intended.)

Either way, I'm super grateful you are taking the time to look into this at all, especially after finding out you can implement this in Vulkan, which leaves me beyond thrilled. I think I had seen sample shading referenced somewhere when I was researching something a while back but forgot about it. (Probably because my hopes of ever seeing it used were pretty much zero, since everyone is more interested in things like TAA/DLSS these days, which I am not the biggest fan of.)

And if simply implementing it in Vulkan is the easiest and best option, then I have no problem with that. :)

@hrydgard
Owner

It's very obvious that it was working since edges were smooth and it looked exactly the same as my Vulkan solution - nicely antialiased everything, including alpha tested stuff. I know what to look for at this point :)

Still baffled by the slowdown, but it's probably indeed best to leave DX9 behind now, once I'm done with the Vulkan solution..

@OCRBonk
Author

OCRBonk commented Nov 30, 2022

Yeah, I know it usually is obvious... sorry about that. I'm just a bit baffled as to what could have changed between 989 and 997 to throw my setup out of whack. There are six changes in between, 990-996, but I can't test those.

But either way, I'll be extremely happy once the Vulkan solution is ready. It would definitely remove the only reason I use DX9 (when it works with the game in question). Nvidia uses some kind of shim plus those compatibility bits to force the AA in, but this was only functionally built out in DX9. Almost a decade ago a lot of us petitioned them on their forums, and even had a stickied thread asking for the same support in DX11, but they declined to add it.
Since I'm not a programmer, forgive my ignorance in asking: since Vulkan does have sample shading, would it theoretically be possible for a program like DXVK to implement similar logic, using flags to detect compatibility and shim MSAA (with sample shading on top) into a wrapped game?
I know this is not related to your program; I am mostly just curious whether it's even possible.
I remember when deferred rendering and lighting started to take off mid-360 generation, and many folks said using MSAA in those setups was impossible. But in almost every DX9 game from that era, I've been able to find a combination of bits to make MSAA work. (Granted, in those games the shimmed MSAA often costs almost as much performance as SGSSAA, at which point it usually makes more sense to just use SGSSAA.)

Anyway, sorry for the off-topic question. I'm very glad you have found a better overall solution that removes the need for DX9 altogether; that's a great outcome. Thank you!

@hrydgard
Owner

hrydgard commented Nov 30, 2022

Yes, as long as the game already supports basic MSAA, it's not so hard to force on SGSSAA. Here's the patching that DXVK or another Vulkan wrapper would do:

When they create a "graphics pipeline" with vkCreateGraphicsPipelines, in the pMultisampleState member of VkGraphicsPipelineCreateInfo, if rasterizationSamples > VK_SAMPLE_COUNT_1_BIT, override sampleShadingEnable to VK_TRUE, and minSampleShading to 1.0.

Additionally, when creating the Vulkan device using vkCreateDevice, in the features struct, "sampleRateShading" must be enabled.

That should be it.

If the game doesn't already support multisampling, though, it is much harder. The more detailed control Vulkan gives you, makes it necessary to take multisampling into account manually in many more places, and resolves need to be inserted, and so on.
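
The two overrides described above could be sketched as follows. To keep the example compilable without the Vulkan SDK, it uses small stand-in structs in a `sketch` namespace rather than the real `VkPipelineMultisampleStateCreateInfo` and `VkPhysicalDeviceFeatures`; only the field names `rasterizationSamples`, `sampleShadingEnable`, `minSampleShading`, and `sampleRateShading` come from Vulkan, and both function names are hypothetical. Checking that the device supports the feature before requesting it is an added assumption, not something stated in the comment.

```cpp
namespace sketch {

// Stand-in for the relevant fields of VkPipelineMultisampleStateCreateInfo.
struct MultisampleState {
    unsigned rasterizationSamples = 1;  // VK_SAMPLE_COUNT_n_BIT has the value n
    bool sampleShadingEnable = false;
    float minSampleShading = 0.0f;
};

// Stand-in for the relevant field of VkPhysicalDeviceFeatures.
struct DeviceFeatures {
    bool sampleRateShading = false;
};

// Would run where the wrapper intercepts vkCreateDevice. Requests the
// feature only if the physical device reports it; returns false otherwise.
bool EnableSampleRateShading(const DeviceFeatures &supported,
                             DeviceFeatures &requested) {
    if (!supported.sampleRateShading)
        return false;
    requested.sampleRateShading = true;
    return true;
}

// Would run where the wrapper intercepts vkCreateGraphicsPipelines. Leaves
// single-sampled pipelines untouched, and forces per-sample shading of
// every sample on multisampled ones.
void ForceFullSampleShading(MultisampleState &ms) {
    if (ms.rasterizationSamples > 1) {
        ms.sampleShadingEnable = true;
        ms.minSampleShading = 1.0f;
    }
}

}  // namespace sketch
```

A real layer would apply the same two changes to copies of the application's create-info structures before forwarding the calls to the driver.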

@hrydgard
Owner

hrydgard commented Dec 3, 2022

This is very likely what slowed it down:

https://github.com/hrydgard/ppsspp/pull/15944/files#diff-e0d04f421935c7b00616ded74e888a6c75cfce4cd14fa04da8ff9c51d0cd2de5R102

Found by mjunix in #16489 .

This is now fixed in the latest master (where Vulkan MSAA is now available), so you'll have the DX9 option again too :)

Can you confirm, so we can close this?

@ghost

ghost commented Dec 7, 2022

Checked GTA CTW on a Quadro FX 3500, it's fixed.

@hrydgard hrydgard modified the milestones: v1.15.0, v1.14.0 Dec 7, 2022
@hrydgard hrydgard closed this as completed Dec 7, 2022
@hrydgard
Owner

hrydgard commented Dec 7, 2022

Thanks for testing!

@OCRBonk
Author

OCRBonk commented Dec 11, 2022

Sorry for the late reply! Life is busy. But yes, that merge did fix it!
Well damn, this couldn't have ended better. Improved AA in Vulkan, and a fix to the performance drop. Just wonderful!

@hrydgard
Owner

Good to hear :)

@OCRBonk
Author

OCRBonk commented Dec 11, 2022

Any way through the ini to enable SampleShading in Vulkan yet in master or is that still a WIP? Just curious.

@hrydgard
Owner

hrydgard commented Dec 11, 2022

It is enabled for transparent textures which are the ones that benefit the most, and we thus save some performance. This will be a "Medium" mode in the future. Set texture filtering to "Auto max quality" to get rid of most of the rest of the aliasing through other means (mip, aniso).

I'll add a quality setting soon to allow enabling it on everything, just haven't decided on which levels to include.

@OCRBonk
Author

OCRBonk commented Dec 11, 2022

Ah, I see. It does work, and it looks nearly identical to when forced in DX9, which is great.
(The ability to set it separately from texture filtering would be preferable, as sometimes I prefer the look of NN texture scaling.)

Although I have noticed some differences on things like alpha-tested textures:
https://www.screenshotcomparison.com/comparison/30176
Not sure if this is due to the mip levels being used (it does seem to vary with resolution scale).

In Star Ocean First Departure, enabling MSAA causes the character sprites to either flicker (with 2x and 4x) or disappear (at 8x) in scenes with pre-rendered backgrounds.

As for what levels to include: I'm assuming that by default it matches the sample count of the MSAA used, and that setting it independently wouldn't be an option? Typically, SGSSAA only looks good when matched to the same sample count, at least when used from the Nvidia drivers.
If it's already used by default for just transparent textures when MSAA is enabled, perhaps the simplest option would be a checkbox under MSAA to enable Sample Shading, which would automatically match the level to the number of samples used. (This should be independent of the Auto max quality setting. Star Ocean is an example of a game that, once upscaled, I think looks more well rounded with NN texture scaling, because the backgrounds show visible seams with upscaling and texture filtering. It still benefits from AA with NN texture filtering.)

@hrydgard
Owner

Gonna have to look into the Star Ocean problem. I suspect it's similar to whatever is happening in Jeanne d'Arc, which has a similar issue on newer NVIDIA cards.

We can set the shading rate to a percentage, to shade 50% of samples for example. Don't know why that wouldn't look pretty good, so it could make sense as an option maybe. Though, I don't want to complicate things, so maybe I'll indeed just go with a checkbox. It'll need a name though, maybe "Ultra quality MSAA" or "Shade all samples" or something...

@unknownbrackets
Collaborator

The screenshot comparison link is a known issue with auto max quality - see #14986.

Star Ocean - I wonder if the stencil upload is not working out properly?

-[Unknown]
