Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement shader resource tables #1165

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

baggins183
Copy link
Contributor

Leaving this in draft for regression testing

Still needs a lot of refactoring. Also handling arbitrary readconst usage (not used in sharps).

@StevenMiller123
Copy link
Contributor

Leaving this in draft for regression testing

Seems like a lot of normal games are running the SRT code and regressing as a result. Clicker Heroes, Hatsune Miku Project Diva Future Tone, Persona 5 Royal, and Skullgirls 2nd Encore all seem to do this.

Uncharted: The Nathan Drake Collection crashes in the same way as the prior titles, though it's a title that I believe is actually using SRTs, as it uses the S_LOAD_DWORD and S_LOAD_DWORDX2 instructions.

Every one of these titles crash with the same exception, pictured below:
image

@baggins183
Copy link
Contributor Author

baggins183 commented Oct 2, 2024

Seems like a lot of normal games are running the SRT code

this is intentional. Any shader that uses indirection of a pointer stored in user data sgprs will take the same approach as SRT (generate a cpu program to flatten the data read by the shader via s_load_dword*)
Doing it 1 way simplifies the rest of the code.

But idk why the exceptions. Ill update this hopefully tomorrow to dump the generated cpu program so u can post that, unless u want to disas from the debugger.

Also if you turn on shader dumping it should dump some *.ir.txt files. Posting <hash>._pre_hoist.ir.txt and <hash>.pre_flatten.ir.txt would help.
Also the gcn

@baggins183
Copy link
Contributor Author

Thanks for testing

@StevenMiller123
Copy link
Contributor

@baggins183 I'm getting a slightly different exception on the newer builds. This is from running Uncharted: The Nathan Drake Collection.
image
image

Here's all the dumped files for the shader it's crashing on, compressed into a zip because I cannot directly send bin files.
fs_0x000000007c219ffb_0.zip

@baggins183
Copy link
Contributor Author

the mov edi, dword ptr [rdi+30h] looks wrong (dst should be rdi).
also the rest of the movs should be dword, but these are probably harmess

Xbyak is doing the opposite of what i want when i say dword or qword explicitly lol

I added dumping for the srt program with zydis in the last commit. Doesnt seem to show width though (qword, dword, etc) in the disassembly. Gonna try to fix that

@baggins183
Copy link
Contributor Author

last commit might have fixed the edi thing

@StevenMiller123
Copy link
Contributor

Seems like Uncharted is still crashing on the same shader.
image

fs_0x000000007c219ffb_0.zip

@StevenMiller123
Copy link
Contributor

Looks like we're back to that edi issue thing. I also see that the code shown in the debugger doesn't seem to align with the logged srtprogram, though I may just be missing something, as I'm not particularly familiar with this type of code.
image
fs_0x000000007c219ffb_0.zip

Let me know if you'd like me to hold off on testing future commits. I don't want to spam this PR too much if you're already aware of the current issues.

@baggins183
Copy link
Contributor Author

nah i dont mind. Its definitely appreciated

@baggins183
Copy link
Contributor Author

baggins183 commented Oct 3, 2024

there might be a system dependent issue with xbyak for moving a 64 bit immediate to a register
because the initial PushPtr works on my os/cpu and should match the old path (readUD<>)
might need to do the mov(r10, 0xff...) in 2 separate instructions for the lo and hi bits

nvm i misread it. no idea why the edi is still there

@baggins183
Copy link
Contributor Author

baggins183 commented Oct 3, 2024

The dumped srtprogram looks right. Im guessing the VS window looks wrong because it starts decoding from above the start of the function, which is garbage bytes. So a red herring.

Can u try dereferencing rdi when the crash happens?

So something like this (gdb)
p *(AmdGpu::Image *) $rdi

also p/x $rdi to show what the pointer in s[12:13] actually is

@baggins183
Copy link
Contributor Author

baggins183 commented Oct 3, 2024

and then from the frame below (Info::RunSrtWalker)
can u post

*(uint64_t*) (user_data.data() + 12)
that will let me know if the hi bits are set. Might be useful.

c.mov(r10, 0xFFFFFFFFFFFFULL);
c.and_(rdi, r10);

is supposed to mask these off like Info::ReadUD

@StevenMiller123
Copy link
Contributor

Hmm, I can't get gdb working on Windows, and the crash doesn't seem to be occuring on Linux. If there's some other Windows debugger I could replicate this testing in, please let me know. For now, here are the CPU registers shown by the VSCode debugger when testing Uncharted.
image

As for the other request, I was able to log that to the shad log. The actual data in there seems to vary by game.
Code for reference:
image

In Uncharted:
image
Uncharted Log.txt

In all the non-SRT games that crash, the value appears to always be 0. Here's a log from Clicker Heroes to validate that.
Clicker Heroes Log.txt

@baggins183
Copy link
Contributor Author

found it, the windows calling conventions are different
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170

"Integer arguments are passed in registers RCX, RDX, R8, and R9"

@baggins183
Copy link
Contributor Author

the problem was i assumed arg0 and arg1 go to rdi and rsi

@baggins183
Copy link
Contributor Author

think that will fix it

@StevenMiller123
Copy link
Contributor

Uncharted is now back to where it was on main, crashing on [Debug] <Critical> data_share.cpp:operator():164: Assertion Failed! after logging Unknown opcode S_LOAD_DWORDX2

Most of my non-SRT games are now crashing on [Debug] <Critical> resource_tracking_pass.cpp:operator():349: Assertion Failed! (so still regressed compared to main).

Let me know if you need any logs.

@baggins183
Copy link
Contributor Author

Weird, and theyre still regressed on linux? sorry to make u dual boot lol

@StevenMiller123
Copy link
Contributor

I believe the resource_tracking_pass assert was also getting hit on Linux, though I haven't retested on there with the latest pushes. Give me a moment to reboot, and I can confirm.

@baggins183
Copy link
Contributor Author

if u could zip all the shader dumps for a shader hash that happens on thatd help out
the .bin, .pre_hoist, .pre_flatten, and .srtprogram are the most important

@baggins183
Copy link
Contributor Author

baggins183 commented Oct 4, 2024

Unknown opcode S_LOAD_DWORDX2

This part is expected
Ill try it this weekend

@StevenMiller123
Copy link
Contributor

if u could zip all the shader dumps for a shader hash that happens on thatd help out the .bin, .pre_hoist, .pre_flatten, and .srtprogram are the most important

Here's a shader from Clicker Heroes that's running into the assert.
fs_0x00000000d66216a7_0.zip

@baggins183
Copy link
Contributor Author

This one looks like a phi problem with the ssa_rewrite pass

.L164_0:
/*0000000000a4*/ v_mov_b32       v2, s19
/*0000000000a8*/ s_load_dwordx4  s[12:15], s[12:13], 0x8
/*0000000000ac*/ v_mul_f32       v7, s18, v2
/*0000000000b0*/ s_waitcnt       lgkmcnt(0)
/*0000000000b4*/ s_buffer_load_dwordx8 s[20:27], s[12:15], 0x8

the s_load_dwordx4 writes to s[12:13] which must be confusing the ssa pass.
Probably works on main because one of the BreadFirstSearch's must be brute forcing to find the original GetUserData that writes to SGPR 12

the walker program is wrong because of this

definitely solvable though because theres nothing dynamic going on

@baggins183
Copy link
Contributor Author

baggins183 commented Oct 4, 2024

[0000562cee223808] %100 = Phi [ %27, {Block $0} ], [ %101, {Block $9} ] (uses: 2)

[0000562cee216918] %207   = CompositeConstructU32x2 %205, %206 (uses: 8)
[0000562cee216988] %208   = ReadConst %207, #0 (uses: 2)

heres the bad phi
%205 is identity to %100
I should turn on extra id removal passes to make this less confusing

@baggins183
Copy link
Contributor Author

I think with this it should be at parity with main

@baggins183
Copy link
Contributor Author

actually there could be some problems, but might work

@StevenMiller123
Copy link
Contributor

I'll run my library through the latest version of this then, and I'll let you know if I find any remaining regressions.

@baggins183
Copy link
Contributor Author

u can hold off if u want, i think it will be broken. Will need to refactor to fix it

@StevenMiller123
Copy link
Contributor

Looks like the issue is still present in the new build, so I'll hold off on further tests.

@baggins183
Copy link
Contributor Author

thanks, will deal with this on the weekend

@StevenMiller123
Copy link
Contributor

StevenMiller123 commented Oct 6, 2024

Seems like I'm still hitting the same resource tracking pass error. Here's an updated shader dump in case anything looks different.
fs_0x00000000d66216a7_0.zip

@squidbus
Copy link
Contributor

squidbus commented Oct 6, 2024

Two issues I'm encountering testing this:

  • Something about it is interfering with CPU instruction patches, I'm getting access violations when it tries to generate the trampoline for illegal instructions in guest code.
  • Working around that by re-enabling some ahead-of-time patching code, a number of fragment shaders are hitting the unreachable in ConvertImageType with an invalid type 2.

@ElBread3
Copy link
Contributor

ElBread3 commented Oct 6, 2024

a number of fragment shaders are hitting the unreachable in ConvertImageType with an invalid type 2.

Oh yeah I saw this before, in Lego City Undercover, I hope the fix for that could also fix Lego City Undercover as that is the only shader crash away from vanilla boot

@squidbus
Copy link
Contributor

squidbus commented Oct 6, 2024

Oh yeah I saw this before, in Lego City Undercover, I hope the fix for that could also fix Lego City Undercover as that is the only shader crash away from vanilla boot

I'm not sure about that game, but just to be clear in my case the shaders with the issue are ones that worked before this.

@baggins183
Copy link
Contributor Author

The invalid sharp problems are related to phi instructions. I can probably hack around it but going to try to fix the underlying problem. That means fixing probably the control flow generation and the ssa_rewrite pass, might take a bit.

Main works because it uses breadth first search which brute forces through phi nodes to find the original readconst/getuserdata.

In the example I saw from Stephen, phis are being created even when theres 1 possible value for a given SGPR. The one i looked at showed a backedge being created from the end of the program (ENDPGM) to some label which shouldn't be possible.

Id guess the lego city ones are failing due to phi problems. But if BFS isn't working, there might be actual dynamic selection of sharps which would be hard to deal with.

@baggins183
Copy link
Contributor Author

Something about it is interfering with CPU instruction patches, I'm getting access violations when it tries to generate the trampoline for illegal instructions in guest code.

sceGeoshadersfan posted about some other problem on windows, might be related. Right now i use new and delete wrong in the SmallCodeArray class. I need to alloc page-aligned memory to protect/unprotect it correctly and not interfere with other allocations. Turns out this is broken on windows, and also windows prevents an easy solution because they dont support std::aligned_allloc
So im currently using boost::aligned_alloc on my computer, but i think that needs to go in the boost submodule

@raphaelthegreat
Copy link
Collaborator

raphaelthegreat commented Oct 6, 2024

Searching phi instructions when tracking sharps is fine, especially if the texture fetch is inside a loop. There are problems currently though due to control flow restructuring where multiple sources could be possible to find from phi nodes, even when in original assembly only 1 path is possible. Main picks the first possible it finds from BFS which in these cases at best results in a ReadConst crash or at worst picking a completely wrong sharp.

A relatively easy way to solve without messing with control flow is to perform a simple reachability test in the original CFG that mirrors the assembly control flow. I.e collect all source candidates from a complete search and test if the current block is reachable from the source block (the guest address of a ReadConst and be stored in its flags and used to lookup the block in the original CFG). In most cases this should resolve down to 1 possible access, but conditional sharp loads are possible to emit by the compiler and those need special handling for later

The long term solution to this would be to rework control flow to allow else clauses in the structured control flow. This limits the amounts of paths the ssa path deems as reachable and will prevent some of these cases. However I'm not very aware of many reliable algorithms with this feature. Red Prig has mentioned "Goto Elimination Strategies in the Migration of Legacy Code to Java" which might be interesting to investigate.

So I think for now to avoid doing too many invasive changes messing with control flow can be left for later probably

src/shader_recompiler/ir/passes/srt.h Outdated Show resolved Hide resolved
src/video_core/buffer_cache/buffer_cache.cpp Outdated Show resolved Hide resolved
src/shader_recompiler/ir/passes/srt.h Outdated Show resolved Hide resolved
}

namespace {
class SrtCodegen : public Xbyak::CodeGenerator {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is superfluous abstraction I think, can use Xbyak::CodeGenerator directly. Also it can be constructed each time its used, doesn't need to be a global singleton if memory is managed externally. Or you could use a static Xbyak::Codegen in this file if the memory can be managed by xbyak during the duration of the program (I think easier overall)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make the call on this. Personally id either use SmallCodeArray or a single Xbyak::CodeGenerator per Shader::Info.
Although these are inneficient since they need a page at minimum per program, and programs need separate allocations for garbage collection purposes.
my last choice would be using a single static Codegenerator that never gets reset cuz of the permutation problem


// Utility for copying a simple relocatable function from a Xbyak code generator to manage memory
// separately
class SmallCodeArray {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class also raises some questions if its useful or not. If we manage memory externally we can emit to it directly or just let xbyak manage it with a static instance, either way copying can probably be avoided

Copy link
Contributor Author

@baggins183 baggins183 Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let xbyak manage it with a static instance

I dont think theres a way to garbage collect old programs if we do this
I thought if u have permutation blow up, then this might become a problem
thats the main reason i did this

But SmallCodeArray is probably superfluous vs having 1 Xbyak::CodeGenerator : 1 Shader::Info.
The only pro to SmallCodeArray is being smaller (forgot exactly why i did it), so probably not worth

src/shader_recompiler/ir/passes/srt.h Outdated Show resolved Hide resolved
src/shader_recompiler/ir/passes/srt.h Outdated Show resolved Hide resolved
// [0000555558a4e038] %305 = CompositeConstructU32x2 %297, %296 (uses: 4)
// [0000555558a4e0a8] %306 = ReadConst %305, #0 (uses: 2)
// Should probably be fixed in ssa_rewrite
std::optional<IR::Value> TryRemoveTrivialPhi(IR::Inst* phi) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better workaround might be to add a post-pass to ssa rewrite that removes those dead phis after the pass has run. At least then this file doesn't need to deal with it and we can fix it in faster way when IR gains ability to track users

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants