Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mono]: Add support for callee saved XMM registers on Windows x64. #97326

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lateralusX
Copy link
Member

Mono's unwinder doesn't support unwind ops for XMM registers. On Win x64 XMM6 - XMM15 are callee saved and needs to be restored by unwinder when changed. Mono's JIT codegen doesn't touch these registers, but LLVM codegen might use XMM6 - XMM15 in generated code.

Another scenario hitting this issues crashing or corrupting the process is mixed mode interpreter and Mono Full AOT. That scenario was uncovered though a GC crash where a reference was dereferenced by GC pointing to garbage memory. After investigating the crash it indicated that LLVM generated code used XMM6 to store a null reference that was used to reset memory after a call to a method (call to thread pool work item), this sequence was done in a loop, but XMM6 could be set to some garbage value on return from work item, and thread pool would write a garbage reference instead of null into memory and that would crash process on next GC. After digging some more the root cause turned out to be that our unwinder didn't restore XMM register when unwinding pass native interpreter frames, since our last manage frame (lmf) didn't store/restore callee saved XMM registers. So in case we had a sequence like this:

LLVM frame (using callee saved XMM register) -> interpreter (interpreter native code changing callee saved XMM register) -> LLVM frame -> throw managed exception

Then our unwinder will find the lmf state before entering interpreter and restore that state before continue unwinding the managed callstack, but since native interpreter code changed callee saved XMM registers, we failed to restore callee saved XMM registers as expected by callers leading to a corrupt process state.

The PR adds support to our code saving lmf to stack making sure it saves all callee saved XMM registers according to platforms ABI. PR also adds support to restore those register when unwinding through a lmf. The PR also extends the Windows LLVM support making sure our unwinder handles platform specific unwinder XMM opcodes so we correctly can restore XMM registers for LLVM frames. Since LLVM won't include XMM registers in generated dwarf unwind ops handled by Mono, this PR adds support to our unwinder to use the Mono specific dwarf unwind ops for general registers and then look through the platform specific unwind codes for any XMM related unwind op codes and apply them to the unwind context. Going forward we might extend the Mono unwinder to always use the platform specific unwind ops for LLVM generated frames, and drop a LLVM specific patch we have to generate both platform specific unwind ops (default) as well as dwarf based op codes for general registers, but since we already support dwarf unwind ops for general registers in Mono, it made sense to continue using the portable unwinder and only extend with support for XMM register opcodes on platforms using callee saved XMM registers, like Windows x64.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 22, 2024
@ghost ghost assigned lateralusX Jan 22, 2024
@lambdageek lambdageek added area-VM-meta-mono and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 22, 2024
host_mgreg_t **save_locations,
int save_locations_len)
{
gboolean result = FALSE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to either fix llvm to emit dwarf unwind info for these, or add this unwind info to the generic info when thats loaded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move back to just use windows unwind ops for llvm frames, right now we enable dwarf + native unwind codes in llvm generated code, and then dwarf is not following the windows ABI (since it misses the XMM registers) since the dwarf implementation was not meant to be used under Windows. It is probably better to add handling of additional native unwind codes for llvm frames and remove the patch that emits dwarf for llvm frames. That will also remove the duplication that we currently carry in the Windows x64 images. That change could be done as a follow up to this PR, if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vargaz @lateralusX could you create a GH issue for the follow up work

@lambdageek
Copy link
Member

@mangod9 @JulieLeeMSFT FYI. It might be useful for someone from the JIT team or the VM team who is familiar with the Windows ABI to review this, too, for knowledge sharing

@mangod9
Copy link
Member

mangod9 commented Jan 24, 2024

@VSadov @janvorli who could review.

mini_gc_set_slot_type_from_cfa (cfg, - (cfa_offset - save_area_offset), SLOT_NOREF);
} else {
mono_emit_unwind_op_offset (cfg, code, AMD64_NREG + i, - (-save_area_offset + (2 * 8)));
// FIXME: GC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: FIXME - is there some follow up needed for this?

Copy link
Member Author

@lateralusX lateralusX Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code is mainly a copy/paste, so kept the original comments as well, I guess that fixme have been in there for many many years, so no follow up have been planned, not even sure its relevant anymore, but if we drop it here, we should also drop it from the source where it got copied,

.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it is some artifact from the times we had precise stack scanning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If stack scanning is always conservative (is that the case?) then there is nothing needed here for GC.
Perhaps remove the FIXME from both places?

@VSadov
Copy link
Member

VSadov commented Jan 26, 2024

I've looked through the change. I think I understand what the change tries to do. XMM6-XMM15 are indeed nonvolatile on Win x64 and would need to be restored when unwinding.
It looks like the code does it correctly, but I am too unfamiliar with the codebase to tell if this is the "right way to do it".

@lateralusX
Copy link
Member Author

Realized that this PR got hanging, @vargaz ok to merge as is?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants