-
-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement shader resource tables #1165
base: main
Are you sure you want to change the base?
Conversation
this is intentional. Any shader that uses indirection of a pointer stored in user data sgprs will take the same approach as SRT (generate a cpu program to flatten the data read by the shader via s_load_dword*) But idk why the exceptions. Ill update this hopefully tomorrow to dump the generated cpu program so u can post that, unless u want to disas from the debugger. Also if you turn on shader dumping it should dump some *.ir.txt files. Posting <hash>._pre_hoist.ir.txt and <hash>.pre_flatten.ir.txt would help. |
Thanks for testing |
@baggins183 I'm getting a slightly different exception on the newer builds. This is from running Uncharted: The Nathan Drake Collection. Here's all the dumped files for the shader it's crashing on, compressed into a zip because I cannot directly send bin files. |
the mov edi, dword ptr [rdi+30h] looks wrong (dst should be rdi). Xbyak is doing the opposite of what i want when i say dword or qword explicitly lol I added dumping for the srt program with zydis in the last commit. Doesnt seem to show width though (qword, dword, etc) in the disassembly. Gonna try to fix that |
last commit might have fixed the edi thing |
Looks like we're back to that edi issue thing. I also see that the code shown in the debugger doesn't seem to align with the logged srtprogram, though I may just be missing something, as I'm not particularly familiar with this type of code. Let me know if you'd like me to hold off on testing future commits. I don't want to spam this PR too much if you're already aware of the current issues. |
nah i dont mind. Its definitely appreciated |
nvm i misread it. no idea why the edi is still there |
The dumped srtprogram looks right. Im guessing the VS window looks wrong because it starts decoding from above the start of the function, which is garbage bytes. So a red herring. Can u try dereferencing rdi when the crash happens? So something like this (gdb) also p/x $rdi to show what the pointer in s[12:13] actually is |
and then from the frame below (Info::RunSrtWalker) *(uint64_t*) (user_data.data() + 12)
is supposed to mask these off like Info::ReadUD |
Hmm, I can't get gdb working on Windows, and the crash doesn't seem to be occuring on Linux. If there's some other Windows debugger I could replicate this testing in, please let me know. For now, here are the CPU registers shown by the VSCode debugger when testing Uncharted. As for the other request, I was able to log that to the shad log. The actual data in there seems to vary by game. In Uncharted: In all the non-SRT games that crash, the value appears to always be 0. Here's a log from Clicker Heroes to validate that. |
found it, the windows calling conventions are different "Integer arguments are passed in registers RCX, RDX, R8, and R9" |
the problem was i assumed arg0 and arg1 go to rdi and rsi |
think that will fix it |
Uncharted is now back to where it was on main, crashing on Most of my non-SRT games are now crashing on Let me know if you need any logs. |
Weird, and theyre still regressed on linux? sorry to make u dual boot lol |
I believe the resource_tracking_pass assert was also getting hit on Linux, though I haven't retested on there with the latest pushes. Give me a moment to reboot, and I can confirm. |
if u could zip all the shader dumps for a shader hash that happens on thatd help out |
This part is expected |
Here's a shader from Clicker Heroes that's running into the assert. |
This one looks like a phi problem with the ssa_rewrite pass
the s_load_dwordx4 writes to s[12:13] which must be confusing the ssa pass. the walker program is wrong because of this definitely solvable though because theres nothing dynamic going on |
heres the bad phi |
I think with this it should be at parity with main |
actually there could be some problems, but might work |
I'll run my library through the latest version of this then, and I'll let you know if I find any remaining regressions. |
u can hold off if u want, i think it will be broken. Will need to refactor to fix it |
Looks like the issue is still present in the new build, so I'll hold off on further tests. |
thanks, will deal with this on the weekend |
Seems like I'm still hitting the same resource tracking pass error. Here's an updated shader dump in case anything looks different. |
Two issues I'm encountering testing this:
|
Oh yeah I saw this before, in Lego City Undercover, I hope the fix for that could also fix Lego City Undercover as that is the only shader crash away from vanilla boot |
I'm not sure about that game, but just to be clear in my case the shaders with the issue are ones that worked before this. |
The invalid sharp problems are related to phi instructions. I can probably hack around it but going to try to fix the underlying problem. That means fixing probably the control flow generation and the ssa_rewrite pass, might take a bit. Main works because it uses breadth first search which brute forces through phi nodes to find the original readconst/getuserdata. In the example I saw from Stephen, phis are being created even when theres 1 possible value for a given SGPR. The one i looked at showed a backedge being created from the end of the program (ENDPGM) to some label which shouldn't be possible. Id guess the lego city ones are failing due to phi problems. But if BFS isn't working, there might be actual dynamic selection of sharps which would be hard to deal with. |
sceGeoshadersfan posted about some other problem on windows, might be related. Right now i use new and delete wrong in the SmallCodeArray class. I need to alloc page-aligned memory to protect/unprotect it correctly and not interfere with other allocations. Turns out this is broken on windows, and also windows prevents an easy solution because they dont support std::aligned_allloc |
Searching phi instructions when tracking sharps is fine, especially if the texture fetch is inside a loop. There are problems currently though due to control flow restructuring where multiple sources could be possible to find from phi nodes, even when in original assembly only 1 path is possible. Main picks the first possible it finds from BFS which in these cases at best results in a ReadConst crash or at worst picking a completely wrong sharp. A relatively easy way to solve without messing with control flow is to perform a simple reachability test in the original CFG that mirrors the assembly control flow. I.e collect all source candidates from a complete search and test if the current block is reachable from the source block (the guest address of a ReadConst and be stored in its flags and used to lookup the block in the original CFG). In most cases this should resolve down to 1 possible access, but conditional sharp loads are possible to emit by the compiler and those need special handling for later The long term solution to this would be to rework control flow to allow else clauses in the structured control flow. This limits the amounts of paths the ssa path deems as reachable and will prevent some of these cases. However I'm not very aware of many reliable algorithms with this feature. Red Prig has mentioned "Goto Elimination Strategies in the Migration of Legacy Code to Java" which might be interesting to investigate. So I think for now to avoid doing too many invasive changes messing with control flow can be left for later probably |
src/shader_recompiler/ir/passes/flatten_extended_userdata_pass.cpp
Outdated
Show resolved
Hide resolved
} | ||
|
||
namespace { | ||
class SrtCodegen : public Xbyak::CodeGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is superfluous abstraction I think, can use Xbyak::CodeGenerator directly. Also it can be constructed each time its used, doesn't need to be a global singleton if memory is managed externally. Or you could use a static Xbyak::Codegen in this file if the memory can be managed by xbyak during the duration of the program (I think easier overall)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make the call on this. Personally id either use SmallCodeArray or a single Xbyak::CodeGenerator per Shader::Info.
Although these are inneficient since they need a page at minimum per program, and programs need separate allocations for garbage collection purposes.
my last choice would be using a single static Codegenerator that never gets reset cuz of the permutation problem
src/shader_recompiler/ir/passes/flatten_extended_userdata_pass.cpp
Outdated
Show resolved
Hide resolved
|
||
// Utility for copying a simple relocatable function from a Xbyak code generator to manage memory | ||
// separately | ||
class SmallCodeArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class also raises some questions if its useful or not. If we manage memory externally we can emit to it directly or just let xbyak manage it with a static instance, either way copying can probably be avoided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let xbyak manage it with a static instance
I dont think theres a way to garbage collect old programs if we do this
I thought if u have permutation blow up, then this might become a problem
thats the main reason i did this
But SmallCodeArray is probably superfluous vs having 1 Xbyak::CodeGenerator : 1 Shader::Info.
The only pro to SmallCodeArray is being smaller (forgot exactly why i did it), so probably not worth
// [0000555558a4e038] %305 = CompositeConstructU32x2 %297, %296 (uses: 4) | ||
// [0000555558a4e0a8] %306 = ReadConst %305, #0 (uses: 2) | ||
// Should probably be fixed in ssa_rewrite | ||
std::optional<IR::Value> TryRemoveTrivialPhi(IR::Inst* phi) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better workaround might be to add a post-pass to ssa rewrite that removes those dead phis after the pass has run. At least then this file doesn't need to deal with it and we can fix it in faster way when IR gains ability to track users
Leaving this in draft for regression testing
Still needs a lot of refactoring. Also handling arbitrary readconst usage (not used in sharps).