Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NV low latency support #1739

Conversation

esullivan-nvidia
Copy link
Contributor

This pull request implements a new device interface called ID3DLowLatencyDevice, and ID3D12CommandQueueExt using the VK_NV_low_latency2 extension. The purpose of these interfaces, is to give dxvk-nvapi a way to implement the nvapi Reflex interface.

@oscarbg
Copy link

oscarbg commented Oct 14, 2023

Nice..
hope it initiates the way for DLSS3 FG VKD3D support..

Quick question: seems no Nv driver for Linux currently supporting that low latency 2 extension.. even missing on Nvidia Vulkan dev drivers..
Only one supporting it is 550.09 for Windows Insiders..
seen:
https://vulkan.gpuinfo.org/listdevicescoverage.php?extension=VK_NV_low_latency2&platform=all

so hoping NV can expose before 550.xx branch, ideally both on NV VK dev drivers and also upcoming(?) 545.xx driver series..

@adamnv
Copy link
Contributor

adamnv commented Oct 17, 2023

Support for the low-latency extensions just got released in the 545.23.06 NVIDIA driver so this is usable/testable now. (Propagating this note to all three project PRs, apologies for any duplication! 😸 )

Copy link
Owner

@HansKristian-Work HansKristian-Work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits aside, the synchronization strategy here seems pretty dodgy and I think that needs to be explained in more detail.

@HansKristian-Work
Copy link
Owner

From spec:

If two commands operate on the same object and at least one of the commands declares the object to be externally synchronized, then the caller must guarantee not only that the commands do not execute simultaneously, but also that the two commands are separated by an appropriate memory barrier (if needed).

I think LL2 needs to specify that setting latency markers is explicitly atomic within the implementation to bypass this issue:

From vkWaitForPresentKHR:

As an exception to the normal rules for objects which are externally synchronized, the swapchain passed to vkWaitForPresentKHR may be simultaneously used by other threads in calls to functions other than vkDestroySwapchainKHR. Access to the swapchain data associated with this extension must be atomic within the implementation.

@esullivan-nvidia
Copy link
Contributor Author

Thanks for the review, I will get the feedback addressed later today. I really appreciate it. Let me know if you have any additional thoughts, questions, or concerns for this PR.

@HansKristian-Work
Copy link
Owner

HansKristian-Work commented Oct 26, 2023

In parallel, it would be good if you could update the Vulkan spec to explicitly state the externsync exception. That guarantee will remove a lot of my concerns with the feature.

@esullivan-nvidia
Copy link
Contributor Author

Will do, thanks for pointing that out. I will get that kicked off today as well.

@HansKristian-Work
Copy link
Owner

HansKristian-Work commented Oct 31, 2023

Found another issue, VK_STRUCTURE_TYPE_LATENCY_SURFACE_CAPABILITIES_NV is not queried, which I think is required to use the feature? I tried using this struct in Granite, and the driver seems to completely ignore this struct when I pass it to GetSurfaceCapabilities2KHR.

If it's not required, will we have the same problem as VK_KHR_present_wait all over again w.r.t. wayland vs x11 feature query?

loathingKernel added a commit to loathingKernel/Proton that referenced this pull request Nov 22, 2023
Requires at least Nvidia 545.xx drivers.

The patches are from the following pull requests:
[wine](ValveSoftware/wine#200)
[dxvk-nvapi](jp7677/dxvk-nvapi#147)
[vkd3d-proton](HansKristian-Work/vkd3d-proton#1739)
[dxvk](doitsujin/dxvk#3690)

Thanks also goes to @ptr1337 for initially building and testing the patchset.
loathingKernel added a commit to loathingKernel/Proton that referenced this pull request Nov 22, 2023
Requires at least Nvidia 545.xx drivers.

The patches are from the following pull requests:
[wine](ValveSoftware/wine#200)
[dxvk-nvapi](jp7677/dxvk-nvapi#147)
[vkd3d-proton](HansKristian-Work/vkd3d-proton#1739)
[dxvk](doitsujin/dxvk#3690)

Thanks also goes to @ptr1337 for initially building and testing the patchset.
@ptr1337
Copy link

ptr1337 commented Nov 25, 2023

We are testing the Nvidia Reflex Implementation since a while. It is running quite well.
Benchmarks needs to be done, now issues found yet.

@HansKristian-Work
Copy link
Owner

Is there any progress planned on this PR? There are outstanding concerns from the review that have not been addressed yet, so I assume we're waiting for NV to update the PR.

@esullivan-nvidia
Copy link
Contributor Author

Is there any progress planned on this PR? There are outstanding concerns from the review that have not been addressed yet, so I assume we're waiting for NV to update the PR.

Yes, I am still planning on updating this PR. Sorry for the delay. I had some other tasks that unfortunately took precedent over this work. Those are done now and I should be able to send an update soon.

Jan200101 pushed a commit to Jan200101/NorthstarProton that referenced this pull request Jan 28, 2024
Requires at least Nvidia 545.xx drivers.

The patches are from the following pull requests:
[wine](ValveSoftware/wine#200)
[dxvk-nvapi](jp7677/dxvk-nvapi#147)
[vkd3d-proton](HansKristian-Work/vkd3d-proton#1739)
[dxvk](doitsujin/dxvk#3690)

Thanks also goes to @ptr1337 for initially building and testing the patchset.
Copy link
Owner

@HansKristian-Work HansKristian-Work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments.

libs/vkd3d/swapchain.c Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Show resolved Hide resolved
libs/vkd3d/swapchain.c Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/vkd3d_private.h Outdated Show resolved Hide resolved
libs/vkd3d/vkd3d_private.h Show resolved Hide resolved
@HansKristian-Work
Copy link
Owner

Getting there now ...

@esullivan-nvidia esullivan-nvidia force-pushed the nv_low_latency2 branch 3 times, most recently from 45ceb29 to 2231497 Compare February 23, 2024 00:44
This commit add support for the VK_NV_low_latency2 extension, and
implements the ID3DLowLatencyDevice, and ID3D12CommandQueueExt
interfaces.
Copy link
Owner

@HansKristian-Work HansKristian-Work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bunch of smol nits, but I think this is good now. I'll do some targeting testing to make sure nothing explodes.

libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/device_vkd3d_ext.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Show resolved Hide resolved
libs/vkd3d/swapchain.c Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
libs/vkd3d/swapchain.c Outdated Show resolved Hide resolved
@HansKristian-Work
Copy link
Owner

Just committed the changed I wanted. I'll squash before merge.

@HansKristian-Work
Copy link
Owner

HansKristian-Work commented Feb 23, 2024

@esullivan-nvidia I tested this PR today and it doesn't seem like the NV impl actually sleeps based on GPU load.

https://github.com/HansKristian-Work/vkd3d-proton/tree/ll2-merge-testing has an ad-hoc test for demos/gears (-Denable_extras=true to build).

Using VKD3D_QUEUE_PROFILE (load with Chromium chrome://tracing) I get this result on NV 550.40.07 on a 4070. This is on GNOME Wayland with PRIME, but that shouldn't matter since the NV impl only considers CPU <-> GPU overlap atm I think, not presentation latency.

deadperiod

There is a 67ms latency between submit and GPU starting process.

I tested @ishitatsuyuki 's LFX2 implementation on RADV and it does sleep, so that probably rules out a test bug.

lfx2

The frame limiter aspect works though. If FPS goes above 60 then it locks to 60 in the test.

@esullivan-nvidia
Copy link
Contributor Author

I pushed another commit that should fix what you were seeing. It was a mistake on my part to preemptively notify the driver that the additional queue was out of band.

@HansKristian-Work
Copy link
Owner

HansKristian-Work commented Feb 27, 2024

I pushed another commit that should fix what you were seeing. It was a mistake on my part to preemptively notify the driver that the additional queue was out of band.

It is not fixed with that commit for me. Pushed update to ll2-merge-testing branch.

@esullivan-nvidia
Copy link
Contributor Author

The issue you are seeing with the sample gears application is a driver bug, and we have a fix. I will let you know when it releases in a public driver. Hopefully it can go out in the developer beta branch soon.

Would you be willing to use an application that supports Reflex like Cyberpunk 2077 (or any title you can make GPU bound) for testing on your end? No need to merge anything until the low latency gears test passes, I just want to confirm you can see that is working correctly.

@HansKristian-Work
Copy link
Owner

HansKristian-Work commented Mar 1, 2024

Should Reflex apps like CP77 just work right now with existing dxvk-nvapi? I can try that, sure. EDIT: @jp7677 pointed out jp7677/dxvk-nvapi#147 is needed as well.

Copy link
Owner

@HansKristian-Work HansKristian-Work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. I tested some Reflex games and it seems to work as expected with actual games. I'll merge in a separate PR since I'll be doing some squashing of my nit commits, etc.

@HansKristian-Work
Copy link
Owner

LL2 is finally merged 🥳

@esullivan-nvidia
Copy link
Contributor Author

It is great to see this merged, sorry for all the delays on my part. Thanks again to everyone for all of the feedback on this PR, I really appreciate it.

MrDuartePT added a commit to MrDuartePT/mrduarte-ebuilds that referenced this pull request Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants