Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable for PAC while compiling coreclr (not the jitted code) #108561

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

SwapnilGaikwad
Copy link
Contributor

Arm64 added a Pointer Authentication Code (PAC) extension in ARMv8.3. One of Its aims is to mitigate certain attacks that rely on corrupting the return address on the stack, such as return-oriented programming (ROP). Clang and GCC added support for PAC-RET that can be enabled using the compilation flag -mbranch-protection=pac-ret.
This patch enables PAC-RET for the libraries and binaries in CoreCLR compiled using clang on UNIX-based platforms.

It is important to note that the patch would enable PAC-RET for the JIT but not the code emitted from the JIT. Thus, the emitted code should remain unchanged by this change. However, adding PAC support leads to additional instructions in the compiled CoreCLR code for signing and authenticating return addresses and function pointers. It increases the size of compiled files slightly. For the libclrjit.so we noted an increase in size by 1.4%. Execution of additional instructions may also increase the compilation time. We saw compilation time increased by 1.6% (+/- 0.37%) while compiling all the methods in benchmarks.run_pgo.linux.arm64.checked.mch using superpmi on a Graviton 3 system.

Please find the details below. They include values when the Branch Target Identification (BTI) is enabled. As there are not BTI enabled machines available to test, not enabling it.

Execution Time:

benchmarks.run.linux.arm64.checked.mch
                                         Slowdown
Base         :  72023.25ms (+/- 0.37%)   (0.00%)
PAC          :  72782.88ms (+/- 0.24%)   (1.05%)
PAC+BTI      :  72781.52ms (+/- 0.34%)   (1.05%)

benchmarks.run_pgo.linux.arm64.checked.mch
                                         Slowdown
Base        :  277104.76ms (+/-0.35%)    (0.00%)
PAC         :  281714.87ms (+/-0.37%)    (1.66%)
PAC+BTI     :  282553.39ms (+/-0.38%)    (1.97%)

realworld.run.linux.arm64.checked.mch
                                        Slowdown
Base        :  56409.39ms (+/-0.30%)    (0.00%)
PAC         :  57162.73ms (+/-0.35%)    (1.34%)
PAC+BTI     :  57334.52ms (+/-0.41%)    (1.64%)

Setup

  • Used Release Build (bdcfb10eec930):
  • For PAC and PAC+BTI mode, compiled the jit and superpmi with the -mbranch-protection=pac-ret and -mbranch-protection=standard flags respectively. Then Dropped libclrjit.so* and superpmi (binary) in $CORE_ROOT
  • Used Graviton 3 that has following architecture features
CPU Flags:
fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs paca pacg dcpodp svei8mm svebf16 i8mm bf16 dgh rng
  • Disable parallel execution of superpmi by explicitly calling superpmi binary without -p flag.

File sizes:

-rwxr-xr-x 1 user user 2942776 Oct 1 13:58 base/libclrjit.so
-rwxr-xr-x 1 user user 2985064 Oct 1 10:07 pac/libclrjit.so
-rwxr-xr-x 1 user user 3008024 Oct 1 10:07 pac+bti/libclrjit.so

Execution methodology

  • Executed benchmarks.run.linux.arm64.checked.mch, benchmarks.run_pgo.linux.arm64.checked.mch and realworld.run.linux.arm64.checked.mch using superpmi.
  • Each function was compiled 5 times and executed (with repeatCount=5).
  • All 3 modes (baseline, pac, pac+bti), executed 10 times and average time was used as a representative value for that mode.

@kunalspathak @a74nh @dotnet/arm64-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 4, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 4, 2024
@SwapnilGaikwad
Copy link
Contributor Author

Following numbers are from an altra (N1) machine that doesn't support PAC or BTI. On such machines, PAC and BTI instructions are executed as nops but it leads to slightly increased file sizes.
We can note from the execution times that the impact on JITs throughput is difficult to distinguish from the noise.

Execution Time:

benchmarks.run.linux.arm64.checked.mch
                                  Slowdown
Base    : 75300.75ms (+/- 0.48%)   (0.00%)
PAC     : 75802.75ms (+/- 0.30%)   (0.67%)
PAC+BTI : 76038.32ms (+/- 0.21%)   (0.98%)

benchmarks.run_pgo.linux.arm64.checked.mch
                                  Slowdown
Base    : 314841.64ms (+/-0.37%)   (0.00%)
PAC     : 317086.04ms (+/-0.61%)   (0.71%)
PAC+BTI : 318262.13ms (+/-0.78%)   (1.09%)

realworld.run.linux.arm64.checked.mch
                                 Slowdown
Base    : 65200.13ms (+/-0.88%)   (0.00%)
PAC     : 65382.44ms (+/-1.04%)   (0.28%)
PAC+BTI : 65680.45ms (+/-0.87%)   (0.74%)

File sizes:

-rwxr-xr-x 1 user user  2863400 Oct  4 12:56 base/libclrjit.so
-rwxr-xr-x 1 user user  2873592 Oct  4 12:56 pac/libclrjit.so
-rwxr-xr-x 1 user user  2908808 Oct  4 12:56 pac+bti/libclrjit.so

@jkotas
Copy link
Member

jkotas commented Oct 5, 2024

Thank you for pushing on adoption of security mitigations in this repo!

.NET runtime has a lot of level code that can be affected by changes like this. For example, we use libunwind to unwind through native code that is going to be compiled with pointer authentication enabled after this change. The debugging support is typically where we find the long-tail of issues from changes like this one. cc @tommcdon

@jkotas jkotas added arch-arm64 os-linux Linux OS (any supported distro) labels Oct 5, 2024
@jkotas jkotas requested a review from a team October 5, 2024 01:18
@jkotas jkotas added area-Infrastructure-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 5, 2024
Copy link
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Oct 5, 2024

the patch would enable PAC-RET for the JIT but not the code emitted from the JIT

The code produced by the JIT is more likely to be attacked since it is processing the input controlled by the attacker. It is rare for code in native runtime libraries to process input controller by the attacker. This change is general goodness, but it is not closing the most likely path of the ROP attacks in .NET.

@kunalspathak
Copy link
Member

the patch would enable PAC-RET for the JIT but not the code emitted from the JIT

The code produced by the JIT is more likely to be attacked since it is processing the input controlled by the attacker.
It is rare for code in native runtime libraries to process input controller by the attacker. This change is general goodness, but it is not closing the most likely path of the ROP attacks in .NET.

Yes, the original plan is to support it in JIT, but we started this work to get a sense of TP/size impact of PAC.

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

The code produced by the JIT is more likely to be attacked since it is processing the input controlled by the attacker. It is rare for code in native runtime libraries to process input controller by the attacker. This change is general goodness, but it is not closing the most likely path of the ROP attacks in .NET.

The native code is static so is known to the attacker ahead of time, so is a good place for the attacker to use to build a library of attack gadgets ahead of time. Whereas anything produced by the jit is harder to know in advance. But, as Kunal says, the plan would be to support in the Jit too.

Jit support should be fairly straightforward assuming the following:

  • CoreCLR does not move the location of program stacks. (as the return addresses on the stack will be encrypted using their location on the stack as the salt)
  • CoreCLR does not parse through stacks to rewrite the return addresses.

I'd recommend a rethink if either of those are the case as both of these are done by OpenJDK which made PAC-RET a lot more complicated and less secure.

For even more security, another feature that could be added is to encrypt the store of addresses that point to the location of jitted code. This would ensure that an attacker could not change them. That will be a self contained piece of work. I'm not aware of any other languaes which are doing that today.

The debugging support is typically where we find the long-tail of issues from changes like this one

Getting this PR in early will help find that long tail, rather than after the full PAC implementation.

.NET runtime has a lot of level code that can be affected by changes like this. For example, we use libunwind to unwind through native code that is going to be compiled with pointer authentication enabled after this change.

I see .NET has a copy of libunwind and llvm-libunwind. I'm not sure when either is used.

Looks like llvm-libunwind has support for PAC, eg:
src/native/external/llvm-libunwind/src/Registers.hpp line 2337:
uint32_t __pac; // Return Authentication Code (PAC)

But libunwind doesn't yet. A PR is here:
libunwind/libunwind#793
Which at worse case we could test and copy locally.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2024

CoreCLR does not parse through stacks to rewrite the return addresses.

CoreCLR does this.

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

CoreCLR does not parse through stacks to rewrite the return addresses.

CoreCLR does this.

Is that because of tiered compilation?

That will require the function that rewrites the stack to first confirm the old address on the stack is still valid, then encrypt the new return address before storing it on the stack. If the new return address is ever in memory before being stored on the stack then that's potentially adding an exploit a hacker could exploit to write their own values to the stack.
Hence you then may want to consider encrypting the new return address whenever it gets stored to any memory location.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2024

Is that because of tiered compilation?

It is because return address hijacking done as part thread suspension for GC: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/threading.md#hijacking

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

Is that because of tiered compilation?

It is because return address hijacking done as part thread suspension for GC: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/threading.md#hijacking

Ok, this is returning to a stub. So should be fine to have an encrypted pointer to that stub which itself is kept in a fixed location.

@janvorli
Copy link
Member

janvorli commented Oct 7, 2024

I see .NET has a copy of libunwind and llvm-libunwind. I'm not sure when either is used.

The llvm-libunwind is used by NativeAOT, the libunwind by coreclr except for macOS where llvm-libunwind is used too (since it is the default unwind library on macOS).

@janvorli
Copy link
Member

janvorli commented Oct 7, 2024

Ok, this is returning to a stub. So should be fine to have an encrypted pointer to that stub which itself is kept in a fixed location.

Just to make sure I understand how this would work. When we hijack a return address, we would need to modify both the LR and the register that stores the PAC, right?

@a74nh
Copy link
Contributor

a74nh commented Oct 7, 2024

Ok, this is returning to a stub. So should be fine to have an encrypted pointer to that stub which itself is kept in a fixed location.

Just to make sure I understand how this would work. When we hijack a return address, we would need to modify both the LR and the register that stores the PAC, right?

The address of the GC stub we would keep in a global variable, meaning all access to it would hardcoded with fixed addresses in the assembly. This prevents the hacker from making us look at different variable.

To rewrite the stack, first load the existing return address from the stack and un-encrypt it to make sure it's a valid value (faulting if not). Then load the stub value into a register, unencrypt it (faulting if invalid), encrypt it for the stack, then store to the stack.

The encrypt and unecrypt instruction just requires the value to encypt, name of the key to use (A or B), and the salt (address on the stack). There is no pac register as such (except for the secret keys)

A little more background:

The assumption here is that the attacker has gotten the ability the change writable memory in the process (possibly only the stack) and read executable memory. They do not have the ability to change readonly memory, change the control flow or change/access register contents. The goal of the attacker is to make the program execute arbitrary code. This can be done by editing the return addresses on the stack. When the program returns, it now jumps back to code the attacker wants to run. This in itself is not that useful as the attacker is limited to functions available and register contents. By looking through all executable memory they look for small groups of instructions directly proceeding a return instruction. These are "gadgets" which simply change a register or write a bit of memory. By chaining gadgets together using return addresses the attacker now can execute whatever they want. Tools exist to look at the executable code of a known program (or library) and build a library of gadgets (which is why I'm concerned about protection of CoreCLR code over jitted code).

PAC-RET works because the return address stays in a register LR (which the attacker cannot access) and only goes to memory when saved to the stack, which is encrypted before the store. When loading from the stack we unencrypt, and fault on an error. To modify the address, the hacker would need to know the secret per-process key. The hacker can't simply replace it with a different encrypted value as the location on the stack is used as a salt in the encryption, meaning every encrypted value is pinned to that location.

@janvorli
Copy link
Member

janvorli commented Oct 7, 2024

@a74nh thank you for the details. Based on reading PAC doc here, their example was using PAC to encrypt the LR using SP and store the result in another register and then pushed both the LR and that result register. That made me think that the LR is pushed unencrypted and then the computed key is used to validate LR before return.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2024

Tools exist to look at the executable code of a known program (or library) and build a library of gadgets (which is why I'm concerned about protection of CoreCLR code over jitted code).

The typical .NET app has a lot of AOT (R2R) compiled code. It is as predictable as unmanaged runtime code and its size is about an order magnitude bigger compared to size of the unmanaged runtime code.

@hoyosjs
Copy link
Member

hoyosjs commented Oct 7, 2024

This might also break Out of Process stack walking in the DAC, although I'm not sure since the only part of coreclr we'd care about is FCall

@a74nh
Copy link
Contributor

a74nh commented Oct 10, 2024

More things to think about:


It may be possible to remove the use of the GS cookie when PAC is enabled.


The jit code will need to be triggered via a boolean. (if (config.PACenabled) { emit some pac instructions... } etc). Given that this boolean is itself vulnerable to attack then eventually we should either:

  1. put the bool into readonly memory asap after startup
  2. or, the jit should always emit PAC instructions on ARM64. On non-PAC hardware these instructions will be NOPs. (On Linux distros that support PAC, all tools and libraries are all built with PAC enabled. On non-PAC hardware these run fine).

Before enabling this by default on PAC enabled boxes, we should ensure the CI has PAC hardware. And consider what needs to be CI tested with and without PAC.


For now enable on Linux only. Consider Windows, MacOS etc later.


BTI is a separate technology that is starting to appear in the latest ARM hardware. When enabled, at a program branch, if the instruction at the target is not one in the small BTI subset (including some older instructions) then the program segfaults. Once PAC is fully enabled then we should consider BTI in a following .NET release.

@a74nh
Copy link
Contributor

a74nh commented Oct 14, 2024

https://llsoftsec.github.io/llsoftsecbook/ - This has a good description of PAC-RET and other security issues.

Arm also has GCS, but I suspect it'll be a few years before it appears in any real hardware. This is similar to CET in X64. Both provide a shadow for storing return addresses. Advantage here is that the same code should be reusable for both technologies. But I suspect it'll be tricky to implement.

@a74nh
Copy link
Contributor

a74nh commented Oct 15, 2024

#47309 - Support for Intel CET

@a74nh
Copy link
Contributor

a74nh commented Oct 23, 2024

@janvorli - is there anything blocking this PR from going in?

From my side I'd like to create a story containing all the work items that need doing for PAC-RET including extracting all the relevant comments from this PR.

@jkotas
Copy link
Member

jkotas commented Oct 23, 2024

is there anything blocking this PR from going in?

Figuring out the testing story. Do our existing Arm64 CI machines have PAC support?

We want to avoid situation where .NET is broken in various ways on machines with PAC support.

@a74nh
Copy link
Contributor

a74nh commented Oct 23, 2024

is there anything blocking this PR from going in?

Figuring out the testing story. Do our existing Arm64 CI machines have PAC support?

We want to avoid situation where .NET is broken in various ways on machines with PAC support.

With the current CI, it's testing that building with pac-ret doesn't break on non-PAC hardware.

The CI is missing PAC hardware, so we're missing testing on that.

Cobalt 100 supports PAC, so sounds like we need to wait until that is in the CI.

@kunalspathak
Copy link
Member

@SwapnilGaikwad - can you please send a dummy change in JIT folder to make sure we run superpmi-diff pipeline to make sure there is no TP impact of running this on non-PAC machines?

@kunalspathak
Copy link
Member

@SwapnilGaikwad - can you please send a dummy change in JIT folder to make sure we run superpmi-diff pipeline to make sure there is no TP impact of running this on non-PAC machines?

as expected, no tp diffs. I do see test failures on osx/arm64.

  • !"SoftwareExceptionFrame::Init failed"
  • ((end & ~PAGE_MASK) == 0), function MemoryRegion, file memoryregion.h, line 48.

can be related, since you merged the main ?

@janvorli
Copy link
Member

!"SoftwareExceptionFrame::Init failed"

This means that virtual unwind from IL_Throw or IL_Rethrow to their managed caller has failed - the PAL_VirtualUnwind has returned FALSE

@kunalspathak
Copy link
Member

!"SoftwareExceptionFrame::Init failed"

This means that virtual unwind from IL_Throw or IL_Rethrow to their managed caller has failed - the PAL_VirtualUnwind has returned FALSE

Is it because of enabling PAC or something else?

@janvorli
Copy link
Member

Is it because of enabling PAC or something else?

I would guess so, we haven't seen such an issue before.

Moreover, with the change from this PR, I cannot even build coreclr repo on mac M1. When crossgen2 is trying to crossgen S.P.C., it crashes:

Generating native image of System.Private.CoreLib for osx.arm64.Debug. Logging to
  /Users/janvorli/git/runtime/dotnet.sh /Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/arm64/crossgen2/crossgen2.dll -o:/Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/System.Private.CoreLib.dll -r:/Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/IL/*.dll --targetarch:arm64 --targetos:osx -O --verify-type-and-field-layout /Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/IL/System.Private.CoreLib.dll
  libc++abi: terminating due to uncaught exception of type CorInfoExceptionClass*

If I remove the -mbranch-protection=pac-ret option again, the compilation succeeds.

@am11
Copy link
Member

am11 commented Oct 26, 2024

it crashes:

Generating native image of System.Private.CoreLib for osx.arm64.Debug. Logging to
  /Users/janvorli/git/runtime/dotnet.sh /Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/arm64/crossgen2/crossgen2.dll -o:/Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/System.Private.CoreLib.dll -r:/Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/IL/*.dll --targetarch:arm64 --targetos:osx -O --verify-type-and-field-layout /Users/janvorli/git/runtime/artifacts/bin/coreclr/osx.arm64.Debug/IL/System.Private.CoreLib.dll
  libc++abi: terminating due to uncaught exception of type CorInfoExceptionClass*
Here is the callstack at the point of SIGABRT
(lldb) clrstack -f
OS Thread Id: 0x2306a2f (42)
        Child SP               IP Call Site
0000000173123930 00000001911CA600 libsystem_kernel.dylib!__pthread_kill + 8
0000000173123930 0000000191202F70 libsystem_pthread.dylib!pthread_kill + 288
0000000173123960 000000019110F908 libsystem_c.dylib!abort + 128
00000001731239A0 00000001911B944C libc++abi.dylib!__cxxabiv1::__aligned_malloc_with_fallback(unsigned long)
00000001731239E0 00000001911A7A40 libc++abi.dylib!demangling_terminate_handler() + 348
0000000173123A30 0000000190E4D3F4 libobjc.A.dylib!_objc_terminate() + 172
0000000173123A50 00000001911B8710 libc++abi.dylib!std::__terminate(void (*)()) + 16
0000000173123A60 00000001911BBCDC libc++abi.dylib!__cxa_get_exception_ptr
0000000173123A80 00000001911BBC84 libc++abi.dylib!__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*)
0000000173123AB0 00000001020E4BD8 libjitinterface_arm64.dylib!JitInterfaceWrapper::recordRelocation(void*, void*, void*, unsigned short, int) + 140
0000000173123AF0 000000012D160CC0 libclrjit_universal_arm64_arm64.dylib!emitter::emitRecordRelocationHelp(void*, void*, unsigned short, char const*, int) + 244
0000000173123B80 000000012D513058 libclrjit_universal_arm64_arm64.dylib!emitter::emitOutputInstr(insGroup*, emitter::instrDesc*, unsigned char**) + 6552
0000000173123E30 000000012D1579BC libclrjit_universal_arm64_arm64.dylib!emitter::emitIssue1Instr(insGroup*, emitter::instrDesc*, unsigned char**) + 68
0000000173123EE0 000000012D15D0CC libclrjit_universal_arm64_arm64.dylib!emitter::emitEndCodeGen(Compiler*, bool, bool, bool, unsigned int, unsigned int*, unsigned int*, void**, void**, void**, void**, void**, void**, unsigned int*) + 5000
0000000173124320 000000012D0F7A20 libclrjit_universal_arm64_arm64.dylib!CodeGen::genEmitMachineCode() + 476
00000001731243F0 000000012D10B288 libclrjit_universal_arm64_arm64.dylib!CodeGenPhase::DoPhase() + 128
0000000173124430 000000012D3F4560 libclrjit_universal_arm64_arm64.dylib!Phase::Run() + 76
0000000173124490 000000012D0F71EC libclrjit_universal_arm64_arm64.dylib!DoPhase(CodeGen*, Phases, void (CodeGen::*)()) + 100
0000000173124520 000000012D0F70CC libclrjit_universal_arm64_arm64.dylib!CodeGen::genGenerateCode(void**, unsigned int*) + 208
00000001731245B0 000000012D127DB0 libclrjit_universal_arm64_arm64.dylib!Compiler::compCompile(void**, unsigned int*, JitFlags*) + 8000
0000000173125260 000000012D12DABC libclrjit_universal_arm64_arm64.dylib!Compiler::compCompileHelper(CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*) + 3400
00000001731253C0 000000012D12BD78 libclrjit_universal_arm64_arm64.dylib!Compiler::compCompile(CORINFO_MODULE_STRUCT_*, void**, unsigned int*, JitFlags*)::$_0::operator()(Compiler::compCompile(CORINFO_MODULE_STRUCT_*, void**, unsigned int*, JitFlags*)::__JITParam*) const + 96
00000001731253F0 000000012D12B4F0 libclrjit_universal_arm64_arm64.dylib!Compiler::compCompile(CORINFO_MODULE_STRUCT_*, void**, unsigned int*, JitFlags*) + 2256
00000001731255F0 000000012D13937C libclrjit_universal_arm64_arm64.dylib!jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_1::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::'lambda'(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_1::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::__JITParam*)::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_1::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) + 404
0000000173125630 000000012D12FC9C libclrjit_universal_arm64_arm64.dylib!jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_1::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const + 112
00000001731256C0 000000012D12F960 libclrjit_universal_arm64_arm64.dylib!jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) + 400
0000000173125DA0 000000012D147630 libclrjit_universal_arm64_arm64.dylib!CILJit::compileMethod(ICorJitInfo*, CORINFO_METHOD_INFO*, unsigned int, unsigned char**, unsigned int*) + 364
0000000173125EB0 00000001020E4E38 libjitinterface_arm64.dylib!JitCompileMethod + 212
0000000173125F80 0000000106020FD4 
0000000173125FE0                  [InlinedCallFrame: 0000000173125fe0] ILCompiler.ReadyToRun.dll!Internal.JitInterface.CorInfoImpl.JitCompileMethod(IntPtr ByRef, IntPtr, IntPtr, IntPtr, Internal.JitInterface.CORINFO_METHOD_INFO ByRef, UInt32, IntPtr ByRef, UInt32 ByRef)
0000000173125FE0                  [InlinedCallFrame: 0000000173125fe0] ILCompiler.ReadyToRun.dll!Internal.JitInterface.CorInfoImpl.JitCompileMethod(IntPtr ByRef, IntPtr, IntPtr, IntPtr, Internal.JitInterface.CORINFO_METHOD_INFO ByRef, UInt32, IntPtr ByRef, UInt32 ByRef)
0000000173125F80 0000000106020FD4 ILCompiler.ReadyToRun.dll!ILStubClass.IL_STUB_PInvoke(IntPtr ByRef, IntPtr, IntPtr, IntPtr, Internal.JitInterface.CORINFO_METHOD_INFO ByRef, UInt32, IntPtr ByRef, UInt32 ByRef) + 420
0000000173126120 000000010601CEA8 ILCompiler.ReadyToRun.dll!Internal.JitInterface.CorInfoImpl.CompileMethodInternal(ILCompiler.DependencyAnalysis.IMethodNode, Internal.IL.MethodIL) + 344 [/Users/am11/projects/runtime5/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs @ 353]
00000001731263A0 00000001060183F8 ILCompiler.ReadyToRun.dll!Internal.JitInterface.CorInfoImpl.CompileMethod(ILCompiler.DependencyAnalysis.ReadyToRun.MethodWithGCInfo, ILCompiler.Logger) + 4168 [/Users/am11/projects/runtime5/src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs @ 810]
0000000173126860 0000000106013D9C ILCompiler.ReadyToRun.dll!ILCompiler.ReadyToRunCodegenCompilation+<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompileOneMethod|5(ILCompiler.DependencyAnalysisFramework.DependencyNodeCore`1<ILCompiler.DependencyAnalysis.NodeFactory>, Int32) + 1844 [/Users/am11/projects/runtime5/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs @ 899]
0000000173126B30 00000001060135F0 ILCompiler.ReadyToRun.dll!ILCompiler.ReadyToRunCodegenCompilation+<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompileOnThread|4(Int32) + 440 [/Users/am11/projects/runtime5/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs @ 833]
0000000173126BA0 00000001060133BC ILCompiler.ReadyToRun.dll!ILCompiler.ReadyToRunCodegenCompilation+<>c__DisplayClass50_0.<ComputeDependencyNodeDependencies>g__CompilationThread|3(System.Object) + 292 [/Users/am11/projects/runtime5/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilation.cs @ 811]
0000000173126BF0 0000000103382188 System.Private.CoreLib.dll!System.Threading.Thread.StartCallback() + 264
0000000173126DC8                  [DebuggerU2MCatchHandlerFrame: 0000000173126dc8] 
FFFFFFFFFFFFFFFF 000000010601CEA8 
FFFFFFFFFFFFFFFF 00000001060183F8 
FFFFFFFFFFFFFFFF 0000000106013D9C 
FFFFFFFFFFFFFFFF 00000001060135F0 
FFFFFFFFFFFFFFFF 00000001060133BC 
FFFFFFFFFFFFFFFF 0000000103382188 
FFFFFFFFFFFFFFFF 0000000100AF8210 libcoreclr.dylib!CallDescrWorkerInternal + 132
0000000173126C40 000000010097A52C libcoreclr.dylib!DispatchCallSimple(unsigned long*, unsigned int, unsigned long, unsigned int) + 264
0000000173126D20 000000010098DF3C libcoreclr.dylib!ThreadNative::KickOffThread_Worker(void*) + 148
0000000173126D90 000000010094C6E4 libcoreclr.dylib!ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 256
0000000173126EC0 000000010094CC2C libcoreclr.dylib!ManagedThreadBase::KickOff(void (*)(void*), void*) + 32
0000000173126EF0 000000010098E03C libcoreclr.dylib!ThreadNative::KickOffThread(void*) + 212
0000000173126F90 00000001008663F4 libcoreclr.dylib!CorUnix::CPalThread::ThreadEntry(void*) + 380
0000000173126FD0 00000001912032E4 libsystem_pthread.dylib!_pthread_start + 136

@a74nh
Copy link
Contributor

a74nh commented Oct 28, 2024

The Mac machines in the CI are the only boxes that currently support PAC.

So either the testing that was manually done on a Linux PAC machine was missing something compared to what is tested in the CI.
Or there is something fundamentally different between Linux and MacOS when building crossgen2 (ie exceptions being handled during the build)

This means that virtual unwind from IL_Throw or IL_Rethrow to their managed caller has failed - the PAL_VirtualUnwind has returned FALSE

If PAL_VirtualUnwind has to unwind C++ functionality, then this makes sense as it won't (yet) understand addresses that have been PAC encrypted. The fix is probably as simple as ensuring that inside PAL_VirtualUnwind, PAC codes are stripped from return addresses.

@kunalspathak
Copy link
Member

So either the testing that was manually done on a Linux PAC machine was missing something compared to what is tested in the CI.

Yes, I will be very curious to know this because I am imagining this is a basic scenario that would break even on Linux PAC machine perhaps?

@janvorli
Copy link
Member

I've spent time today to investigate the issue on macOS arm64. When the runtime is compiled with the PAC enabling option, it uses various PAC related instructions in the generated code, for example PACIASP in the function prolog. However, the signing seems to be setup in a way that it doesn't alter the return address in any way. I guess the reason is that Apple doc states that in order to enable pointer authentication, the binary needs to be compiled as "arm64e" architecture instead of "arm64" that we use.
The failure that the unw_step returns with any software exception handling when we try to unwind native frame from coreclr is -6550, which means UNW_ECROSSRASIGNING. Looking at LLVM libunwind sources, it seems that it should be returned when the _LIBUNWIND_IS_NATIVE_ONLY is not defined. Which seems to indicate that the libunwind is compiled to support unwinding of multiple platforms. Maybe that means arm64 + arm64e. The problem is that if dwarf info contains DW_CFA_AARCH64_negate_ra_state to indicate that the return address is authenticated from that program location on, the libunwind always fails to unwind.

My theory is that the PAC should not be enabled for macOS arm64 unless we change the target to arm64e where it might work then.

@a74nh
Copy link
Contributor

a74nh commented Oct 30, 2024

My theory is that the PAC should not be enabled for macOS arm64 unless we change the target to arm64e where it might work then.

This sounds reasonable. Enabling for one OS at a time would be sensible.

@SwapnilGaikwad SwapnilGaikwad marked this pull request as ready for review October 31, 2024 10:49
eng/native/configurecompiler.cmake Outdated Show resolved Hide resolved
eng/native/configurecompiler.cmake Outdated Show resolved Hide resolved
@SwapnilGaikwad
Copy link
Contributor Author

Now the patch has limited the PAC enablement to Linux only machines, we were trying to ensure it won't break on any Linux system. We used a V1 machine instances that had Ubuntu 22.04.5 LTS for the analysis reported here. However, it didn't have a libc with pac. Thus, we tested it in a Debian Trixie docker container on the same system that had libc with pac+bti enabled.

We noticed that the checked version builds successfully and passes all the tests as that of main. However, the Release version fails to build with a nuget error. The same error occurs even on the main branch so doesn't seem to be specific to the PAC change.
As the Trixie is the next major release for Debian and is in a testing mode, and unsure whether it's one of the supported ones. Just keeping you informed.

error: Error in reading profile /home/user/.nuget/packages/optimization.linux-arm64.pgo.coreclr/1.0.0-prerelease.24462.2/data/coreclr.profdata: unsupported instrumentation profile format version
  make[2]: *** [jit/CMakeFiles/clrjit.dir/build.make:77: jit/CMakeFiles/clrjit.dir/cmake_pch.hxx.pch] Error 1
  make[1]: *** [CMakeFiles/Makefile2:4038: jit/CMakeFiles/clrjit.dir/all] Error 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 area-Infrastructure-coreclr community-contribution Indicates that the PR has been added by a community member os-linux Linux OS (any supported distro)
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

9 participants