Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVM: fillAddressSpace broken when physical address space is greater than 46 bits #208

Closed
prattmic opened this issue Apr 22, 2019 · 6 comments
Assignees
Labels
platform: kvm Issue related to the kvm platform priority: p2 Normal priority type: bug Something isn't working

Comments

@prattmic
Copy link
Member

kvm.fillAddressSpace effectively assumes that ring0.PhysicalAddressBits() returns 46 or less, as VirtualAddressBits is (effectively) always 48, which we then cut in half for the kernel address space.

A 48-bit physical address space is at least possible on AMD Naples cores:

$ cat /proc/cpuinfo
...
address sizes	: 48 bits physical, 48 bits virtual
@prattmic prattmic added type: bug Something isn't working platform: kvm Issue related to the kvm platform priority: p2 Normal priority labels Apr 22, 2019
@codomania
Copy link

FYI, If SME/SEV is enabled on AMD Naples system then physical address space is reduced by a few bits.

$ cat /proc/cpuinfo
.....
address sizes : 43 bits physical, 48 bits virtual
...
I have tried Gvisor KVM platform on AMD Naples with SME enabled/disable and seeing the failure.

@codomania
Copy link

codomania commented Apr 25, 2019

I looked at the code today. Currently, PhysicalAddressBits() uses CPUID 0x80000008 to query the number of bits in physical address. On AMD Zen core, if memory encryption is enabled then the physical address is reduced. The number of bits reduced from physical address space is available via CPUID 0x8000_001F. So, we first need to check if leaf 8000_001F exist, if so check whether SME is enabled. When enabled CPUID 0x8000_001F.EBX[5:0] can be used to get the number of reduced physical bits.

Here is my naive attempt to fix it. But this is causing compilation error
"
/root/.cache/bazel/_bazel_root/b4a6b971b553ff6e5ffe7760c9348cdd/sandbox/linux-sandbox/1/execroot/main/bazel-out/host/bin/pkg/sentry/platform/ring0/gen_offsets/defs_impl.go:411:6: undefined: rdmsr
"
NOTE: If I hardcode the values then I am able to launch the contain just fine. Maybe someone who is more familiar with code can make the fix in the right place.

diff --git a/pkg/sentry/platform/ring0/x86.go b/pkg/sentry/platform/ring0/x86.go
index 7c88010..9786580 100644
--- a/pkg/sentry/platform/ring0/x86.go
+++ b/pkg/sentry/platform/ring0/x86.go
@@ -54,10 +54,12 @@ const (
        _MSR_LSTAR         = 0xc0000082
        _MSR_CSTAR         = 0xc0000083
        _MSR_SYSCALL_MASK  = 0xc0000084
+       _MSR_K8_SYSCFG     = 0xc0010010
        _MSR_PLATFORM_INFO = 0xce
        _MSR_MISC_FEATURES = 0x140
 
        _PLATFORM_INFO_CPUID_FAULT = 1 << 31
+       _PLATFORM_MSR_K8_SYSCFG_MEM_ENCRYPT = 1 << 23
 
        _MISC_FEATURE_CPUID_TRAP = 0x1
 )
@@ -122,12 +124,35 @@ func VirtualAddressBits() uint32 {
        return (ax >> 8) & 0xff
 }
 
+//
+// On AMD EPYC, if Secure Memory Encryption (SME) is enabled then physical
+// address bit is reduced.
+//
+func ReducedPhysicalAddressBits() uint32 {
+       // Query the max CPUID leaf
+       ax, _, _, _ := cpuid.HostID(0x80000000, 0)
+
+       // If leaf 8000_001F and higher exist that means its processor may
+       // supports AMD SME features.
+       if ax >= 0x8000001F  {
+               // Check whether SME is enabled.
+               if rdmsr(_MSR_K8_SYSCFG) & _PLATFORM_MSR_K8_SYSCFG_MEM_ENCRYPT != 0 {
+                       // If SME is enabled then return the number of bits reduction
+                       // in physical address space.
+                       _, bx, _, _ := cpuid.HostID(0x8000001F, 0)
+                       return (bx >> 6) & 0x3f
+               }
+       }
+
+       return 0
+}
+
 // PhysicalAddressBits returns the number of bits available for physical addresses.
 //
 // FIXME: This should use the cpuid passed to Init.
 func PhysicalAddressBits() uint32 {
        ax, _, _, _ := cpuid.HostID(0x80000008, 0)
-       return ax & 0xff
+       return (ax & 0xff) - ReducedPhysicalAddressBits()
 }
 
 // Selector is a segment Selector.

@codomania
Copy link

copy/paste the patch in reply is messed up, you can see the patch here: https://gist.github.com/codomania/ec7080b82c56d9063bdc3c6ab398c0f4

@codomania
Copy link

You can find more information about the CPUID in AMD APM or kernel doc https://www.kernel.org/doc/Documentation/x86/amd-memory-encryption.txt

@amscanne
Copy link
Contributor

Thanks, that's very helpful. I don't think rdmsr will work in userspace -- do you know how else this information is exposed to a regular userspace process?

I suspect that we may need to shrink the address space regardless of whether it's currently enabled or not. E.g., if the leaf exists then just shrink by (bx >> 6) & 0x3f.

But wait -- does SEV affect the guest physical address size? That's what we care about here. The kernel docs seem to indicate no.

@codomania
Copy link

Right, rdmsr will not work from userspace. So far there was no need for userspace applications to know whether SME is enabled. The SME is transparently handled inside the kernel. userspace does not need to know about it. Having said so, there was some discussion about providing a sysfs for SME which can expose the information like this. Maybe we can take this opportunity to restart the discussion on LKML about providing a sysfs.

IMO, it's safe to shrink the physical address bit regardless of SME state.

Yes, the physical address size reduction is not applicable in SEV guest.

tonistiigi pushed a commit to tonistiigi/gvisor that referenced this issue May 3, 2019
Apparently some platforms don't have pSize < vSize.

Fixes google#208

PiperOrigin-RevId: 245480998
Change-Id: I2a98229912f4ccbfcd8e79dfa355104f14275a9c
Upstream-commit: 5749f64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform: kvm Issue related to the kvm platform priority: p2 Normal priority type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants