Enable for PAC while compiling coreclr (not the jitted code) #108561
The following numbers are from an Altra (N1) machine, which doesn't support PAC or BTI. On such machines, PAC and BTI instructions are executed as Execution Time:
File sizes:
|
Thank you for pushing on adoption of security mitigations in this repo! The .NET runtime has a lot of low-level code that can be affected by changes like this. For example, we use libunwind to unwind through native code that is going to be compiled with pointer authentication enabled after this change. The debugging support is typically where we find the long tail of issues from changes like this one. cc @tommcdon |
Tagging subscribers to this area: @hoyosjs |
The code produced by the JIT is more likely to be attacked since it is processing the input controlled by the attacker. It is rare for code in native runtime libraries to process input controlled by the attacker. This change is general goodness, but it does not close the most likely path for ROP attacks in .NET. |
Yes, the original plan is to support it in JIT, but we started this work to get a sense of TP/size impact of PAC. |
The native code is static and therefore known to the attacker ahead of time, which makes it a good place for the attacker to build a library of attack gadgets in advance. Anything produced by the jit, by contrast, is harder to know in advance. But, as Kunal says, the plan would be to support it in the Jit too. Jit support should be fairly straightforward assuming the following:
I'd recommend a rethink if either of those is the case, as both are done by OpenJDK, which made PAC-RET a lot more complicated and less secure. For even more security, another feature that could be added is to encrypt the stored addresses that point to the location of jitted code. This would ensure that an attacker could not change them. That would be a self-contained piece of work. I'm not aware of any other languages that are doing that today.
Getting this PR in early will help find that long tail, rather than after the full PAC implementation.
I see .NET has a copy of libunwind and llvm-libunwind. I'm not sure when either is used. Looks like llvm-libunwind has support for PAC, eg: But libunwind doesn't yet. A PR is here: |
CoreCLR does this. |
Is that because of tiered compilation? That will require the function that rewrites the stack to first confirm the old address on the stack is still valid, then encrypt the new return address before storing it on the stack. If the new return address is ever in memory before being stored on the stack, that potentially adds a weakness an attacker could exploit to write their own values to the stack. |
It is because of return address hijacking done as part of thread suspension for GC: https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/threading.md#hijacking |
Ok, this is returning to a stub. So should be fine to have an encrypted pointer to that stub which itself is kept in a fixed location. |
The llvm-libunwind is used by NativeAOT, the libunwind by coreclr except for macOS where llvm-libunwind is used too (since it is the default unwind library on macOS). |
Just to make sure I understand how this would work. When we hijack a return address, we would need to modify both the LR and the register that stores the PAC, right? |
We would keep the address of the GC stub in a global variable, meaning all accesses to it would be hardcoded with fixed addresses in the assembly. This prevents the hacker from making us look at a different variable.
To rewrite the stack, first load the existing return address from the stack and unencrypt it to make sure it's a valid value (faulting if not). Then load the stub value into a register, unencrypt it (faulting if invalid), encrypt it for the stack, then store it to the stack.
The encrypt and unencrypt instructions just require the value to encrypt, the name of the key to use (A or B), and the salt (the address on the stack). There is no PAC register as such (except for the secret keys).
A little more background: The assumption here is that the attacker has gained the ability to change writable memory in the process (possibly only the stack) and to read executable memory. They do not have the ability to change read-only memory, change the control flow or change/access register contents. The goal of the attacker is to make the program execute arbitrary code. This can be done by editing the return addresses on the stack. When the program returns, it now jumps back to code the attacker wants to run. This in itself is not that useful, as the attacker is limited to the functions available and the register contents. So, looking through all executable memory, they search for small groups of instructions directly preceding a return instruction. These are "gadgets", which simply change a register or write a bit of memory. By chaining gadgets together using return addresses, the attacker can now execute whatever they want. Tools exist to look at the executable code of a known program (or library) and build a library of gadgets (which is why I'm concerned about protection of CoreCLR code over jitted code).
PAC-RET works because the return address stays in a register, LR (which the attacker cannot access), and only goes to memory when saved to the stack, and it is encrypted before that store. When loading from the stack we unencrypt, and fault on an error. To modify the address, the hacker would need to know the secret per-process key. The hacker can't simply replace it with a different encrypted value, as the location on the stack is used as a salt in the encryption, meaning every encrypted value is pinned to that location. |
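To make the encrypt/unencrypt steps in the comment above concrete, here is a toy Python model of them. It is only a sketch under stated assumptions: real hardware uses its own cipher and key registers, whereas this model uses HMAC-SHA256 as a stand-in; the 48-bit address width, the 16-bit code, the dictionary used as a "stack", and every function name (sign, authenticate, hijack_return_address) are hypothetical and not taken from CoreCLR or the Arm architecture.

```python
import hashlib
import hmac
import os

ADDR_BITS = 48                      # assumption: 48-bit virtual addresses
ADDR_MASK = (1 << ADDR_BITS) - 1    # low bits hold the address, high bits the code


class AuthFault(Exception):
    """Stands in for the fault taken when authentication fails."""


def _code(key: bytes, addr: int, salt: int) -> int:
    # Toy 16-bit authentication code over (address, salt). HMAC-SHA256 is
    # only a stand-in for the hardware cipher.
    msg = addr.to_bytes(8, "little") + salt.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def sign(key: bytes, addr: int, salt: int) -> int:
    """'Encrypt': place the code in the unused high bits of the pointer."""
    if addr & ~ADDR_MASK:
        raise ValueError("address does not fit in ADDR_BITS")
    return (_code(key, addr, salt) << ADDR_BITS) | addr


def authenticate(key: bytes, signed: int, salt: int) -> int:
    """'Unencrypt': recompute the code and fault on a mismatch."""
    addr = signed & ADDR_MASK
    if (signed >> ADDR_BITS) != _code(key, addr, salt):
        raise AuthFault(hex(signed))
    return addr


def hijack_return_address(key, stack, slot, signed_stub, stub_salt):
    """Rewrite stack[slot] to point at the stub, following the steps above."""
    authenticate(key, stack[slot], slot)              # old value must be valid
    stub = authenticate(key, signed_stub, stub_salt)  # stub pointer must be valid
    stack[slot] = sign(key, stub, slot)               # re-sign for this slot


if __name__ == "__main__":
    key = os.urandom(16)        # per-process secret key
    slot = 0x7FFC_1000          # stack address, used as the salt
    stack = {slot: sign(key, 0x4000_1234, slot)}
    stub_salt = 0x5000          # hypothetical salt for the global stub pointer
    signed_stub = sign(key, 0x4000_9000, stub_salt)
    hijack_return_address(key, stack, slot, signed_stub, stub_salt)
    print(hex(authenticate(key, stack[slot], slot)))
```

In this model a signed value copied to a different stack slot fails authentication, because the slot address is part of the input to the code (barring a rare collision of the 16-bit toy code). That is the "pinned to that location" property described above.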
@a74nh thank you for the details. Based on reading PAC doc here, their example was using PAC to encrypt the LR using SP and store the result in another register and then pushed both the LR and that result register. That made me think that the LR is pushed unencrypted and then the computed key is used to validate LR before return. |
The typical .NET app has a lot of AOT (R2R) compiled code. It is as predictable as unmanaged runtime code, and its size is about an order of magnitude bigger than the size of the unmanaged runtime code. |
This might also break Out of Process stack walking in the DAC, although I'm not sure since the only part of coreclr we'd care about is FCall |
More things to think about: It may be possible to remove the use of the GS cookie when PAC is enabled. The jit code will need to be triggered via a boolean. (
Before enabling this by default on PAC-enabled boxes, we should ensure the CI has PAC hardware, and consider what needs to be CI tested with and without PAC. For now, enable on Linux only. Consider Windows, macOS etc. later. BTI is a separate technology that is starting to appear in the latest Arm hardware. When it is enabled, if the instruction at the target of a program branch is not one of the small BTI subset (including some older instructions), the program segfaults. Once PAC is fully enabled, we should consider BTI in a following .NET release. |
https://llsoftsec.github.io/llsoftsecbook/ - This has a good description of PAC-RET and other security issues. Arm also has GCS, but I suspect it'll be a few years before it appears in any real hardware. This is similar to CET on x64. Both provide a shadow stack for storing return addresses. The advantage here is that the same code should be reusable for both technologies. But I suspect it'll be tricky to implement. |
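For readers unfamiliar with the shadow-stack idea behind GCS and CET, a toy Python model follows. It is a sketch only: the class and method names are hypothetical, and real hardware keeps the protected copy in memory that ordinary stores cannot write, which a Python list cannot represent.

```python
class ControlFlowViolation(Exception):
    """Stands in for the fault raised when the two copies disagree."""


class ShadowStackModel:
    def __init__(self):
        self.stack = []    # ordinary stack: assumed writable by the attacker
        self.shadow = []   # protected copy: assumed not writable by the attacker

    def call(self, return_address: int) -> None:
        # On a call, the return address is recorded in both places.
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self) -> int:
        # On a return, the two copies must agree before control transfers.
        addr = self.stack.pop()
        expected = self.shadow.pop()
        if addr != expected:
            raise ControlFlowViolation(f"{addr:#x} != {expected:#x}")
        return addr


if __name__ == "__main__":
    model = ShadowStackModel()
    model.call(0x4000_1234)
    model.stack[-1] = 0xDEAD_BEEF      # simulated overwrite of the return address
    try:
        model.ret()
    except ControlFlowViolation as fault:
        print("fault:", fault)
```

In this model, any runtime code that deliberately rewrites a return address (such as the hijacking discussed earlier in the thread) would have to update the protected copy too, which may be part of why an implementation is expected to be tricky.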
#47309 - Support for Intel CET |
@janvorli - is there anything blocking this PR from going in? From my side I'd like to create a story containing all the work items that need doing for PAC-RET including extracting all the relevant comments from this PR. |
Figuring out the testing story. Do our existing Arm64 CI machines have PAC support? We want to avoid a situation where .NET is broken in various ways on machines with PAC support. |
With the current CI, it's testing that building with pac-ret doesn't break on non-PAC hardware. The CI is missing PAC hardware, so we're missing testing on that. Cobalt 100 supports PAC, so sounds like we need to wait until that is in the CI. |
@SwapnilGaikwad - can you please send a dummy change in the JIT folder so that the superpmi-diff pipeline runs and we can confirm there is no TP impact of running this on non-PAC machines? |
As expected, no TP diffs. I do see test failures on osx/arm64.
Could it be related, since you merged main? |
This means that virtual unwind from |
Is it because of enabling PAC or something else? |
I would guess so, we haven't seen such an issue before. Moreover, with the change from this PR, I cannot even build the coreclr repo on a Mac M1. When crossgen2 is trying to crossgen S.P.C., it crashes:
If I remove the |
Here is the callstack at the point of
|
The Mac machines in the CI are the only boxes that currently support PAC. So either the testing that was manually done on a Linux PAC machine was missing something compared to what is tested in the CI.
If |
Yes, I will be very curious to know this because I am imagining this is a basic scenario that would break even on Linux PAC machine perhaps? |
I've spent time today to investigate the issue on macOS arm64. When the runtime is compiled with the PAC enabling option, it uses various PAC related instructions in the generated code, for example My theory is that the PAC should not be enabled for macOS arm64 unless we change the target to arm64e where it might work then. |
This sounds reasonable. Enabling for one OS at a time would be sensible. |
Co-authored-by: Adeel Mujahid <[email protected]>
Now that the patch limits PAC enablement to Linux-only machines, we tried to ensure it won't break on any Linux system. We used a V1 machine instance running Ubuntu 22.04.5 LTS for the analysis reported here. However, it didn't have a libc with PAC. Thus, we tested it in a Debian Trixie Docker container on the same system that had libc with We noticed that the Checked version builds successfully and passes the same tests as main. However, the Release version fails to build with a NuGet error. The same error occurs on the main branch too, so it doesn't seem to be specific to the PAC change.
|
Arm64 added a Pointer Authentication Code (PAC) extension in ARMv8.3. One of its aims is to mitigate certain attacks that rely on corrupting the return address on the stack, such as return-oriented programming (ROP). Clang and GCC added support for PAC-RET, which can be enabled using the compilation flag -mbranch-protection=pac-ret.
This patch enables PAC-RET for the libraries and binaries in CoreCLR compiled using clang on UNIX-based platforms.
It is important to note that the patch enables PAC-RET for the JIT itself but not for the code emitted by the JIT. Thus, the emitted code should remain unchanged by this change. However, adding PAC support leads to additional instructions in the compiled CoreCLR code for signing and authenticating return addresses and function pointers. It increases the size of compiled files slightly. For libclrjit.so we noted an increase in size of 1.4%. Executing the additional instructions may also increase the compilation time. We saw compilation time increase by 1.6% (+/- 0.37%) while compiling all the methods in benchmarks.run_pgo.linux.arm64.checked.mch using superpmi on a Graviton 3 system.
Please find the details below. They include values with Branch Target Identification (BTI) enabled. As there are no BTI-enabled machines available for testing, this patch does not enable it.
Execution Time:
Setup
bdcfb10eec930): -mbranch-protection=pac-ret and -mbranch-protection=standard flags respectively. Then dropped libclrjit.so* and superpmi (binary) in $CORE_ROOT
File sizes:
-rwxr-xr-x 1 user user 2942776 Oct 1 13:58 base/libclrjit.so
-rwxr-xr-x 1 user user 2985064 Oct 1 10:07 pac/libclrjit.so
-rwxr-xr-x 1 user user 3008024 Oct 1 10:07 pac+bti/libclrjit.so
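As a quick check of the 1.4% figure quoted above, the relative size increase can be recomputed from this listing. A minimal Python sketch; the byte counts are copied from the listing, while the dictionary and function names are made up for illustration:

```python
# Byte sizes of libclrjit.so copied from the listing above.
sizes = {"base": 2942776, "pac": 2985064, "pac+bti": 3008024}


def percent_increase(base: int, new: int) -> float:
    """Relative growth of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    for name in ("pac", "pac+bti"):
        delta = sizes[name] - sizes["base"]
        pct = percent_increase(sizes["base"], sizes[name])
        print(f"{name}: +{delta} bytes ({pct:.2f}%)")
```

This gives roughly 1.44% for pac (42288 bytes) and 2.22% for pac+bti (65248 bytes), consistent with the 1.4% reported for PAC alone.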
Execution methodology
benchmarks.run.linux.arm64.checked.mch, benchmarks.run_pgo.linux.arm64.checked.mch and realworld.run.linux.arm64.checked.mch using superpmi.
@kunalspathak @a74nh @dotnet/arm64-contrib