Skip to content
This repository has been archived by the owner on Apr 3, 2018. It is now read-only.

[RFC] qemu: set maxmem to host memory size #336

Merged
merged 1 commit into from
Aug 15, 2017
Merged

[RFC] qemu: set maxmem to host memory size #336

merged 1 commit into from
Aug 15, 2017

Conversation

wcwxyz
Copy link
Contributor

@wcwxyz wcwxyz commented Aug 9, 2017

Setting maxmem to host memory size wouldn't change memory footprint. It
only defines the upper bounds for memory hotplug.

Currently we set maxmem to mem*1.5, which is not enough for running a
small VM (eg. 128M) considering nvdimm device consumes memory address
space.

This will also allow us to do memory hotplug (maybe in the future).

Signed-off-by: WANG Chao [email protected]

@wcwxyz
Copy link
Contributor Author

wcwxyz commented Aug 9, 2017

CI failed because I didn't vendor github.com/moby/moby/pkg/system.

Can you guys please show me how to vendor a new go project using dep?

Or should I write a new function (look up /proc/meminfo) to get host memory instead import a new pkg?

@wcwxyz wcwxyz changed the title qemu: set maxmem to host memory size [RFC] qemu: set maxmem to host memory size Aug 9, 2017
@sboeuf
Copy link
Collaborator

sboeuf commented Aug 9, 2017

Hi @wcwxyz, thanks again for this new patch :)
I have just looked at it, and I would prefer that you avoid to import this moby package since what we need is very simple. Moreover, we are not trying to support Windows here, so you can directly create our own function to read from /proc/meminfo. This way, your problems are solved ;)

@coveralls
Copy link

coveralls commented Aug 9, 2017

Coverage Status

Coverage decreased (-0.05%) to 66.449% when pulling 5dbade4 on wcwxyz:qemu-set-maxmem into 59004f2 on containers:master.

@wcwxyz
Copy link
Contributor Author

wcwxyz commented Aug 9, 2017

I've updated the patch. I noticed coverage drop down a bit. That's because some error case is not tested (but should be ok).

@coveralls
Copy link

coveralls commented Aug 9, 2017

Coverage Status

Coverage decreased (-0.05%) to 66.449% when pulling e73fdb3 on wcwxyz:qemu-set-maxmem into 59004f2 on containers:master.

qemu.go Outdated
return ciaoQemu.Memory{}, fmt.Errorf("Unable to read memory info: %s", err)
}
// add 1G memory space for nvdimm device (vm guest image)
memMax := fmt.Sprintf("%dM", int(float64(hostMem)/1024/1024)+1024)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the spirit of @jodh-intel , it might be nice if the variable names reflected the units they were holding, maybe like memMaxMegs or similar. not a blocker :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm... the code itself is clear enough. Since memMax is a string, I think I'm gonna change hostMem to hostMemBytes to be clearer.

@grahamwhaley
Copy link
Contributor

lgtm
let's add a:
Fixes #337

qemu.go Outdated
return memory, nil
}

func getHostMemorySize() (uint64, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this was called getHostMemorySizeBytes() to avoid any confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

qemu.go Outdated
}

func getHostMemorySize() (uint64, error) {
f, err := os.Open("/proc/meminfo")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function really need a test so I'd either pass this filename as an argument, or use a global that the tests can modify. For an almost identical example (feel free to copy):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

qemu.go Outdated
return 0, err
}

return 0, fmt.Errorf("unable to get MemTotal in /proc/meminfo")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this message can reference the variable for /proc/meminfo.

qemu_test.go Outdated
@@ -395,10 +395,16 @@ func TestQemuSetMemoryResources(t *testing.T) {

q := &qemu{}

hostMem, err := getHostMemorySize()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/hostMem/hostMemBytes/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thanks.

qemu_test.go Outdated
if err != nil {
t.Fatal(err)
}
memMax := fmt.Sprintf("%dM", int(float64(hostMem)/1024/1024)+1024)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a paranoid check to ensure that hostMemBytes > 0 before the division. It would also be clearer potentially to define variables for the division + addition with comments explaining why those values were chosen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right. Will do that.

@coveralls
Copy link

coveralls commented Aug 10, 2017

Coverage Status

Coverage increased (+0.04%) to 66.542% when pulling 524f0d3 on wcwxyz:qemu-set-maxmem into 59004f2 on containers:master.

@coveralls
Copy link

coveralls commented Aug 10, 2017

Coverage Status

Coverage increased (+0.04%) to 66.542% when pulling ba7761a on wcwxyz:qemu-set-maxmem into 59004f2 on containers:master.

Copy link
Collaborator

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but I'd like @grahamwhaley to review as well.

qemu_test.go Outdated
file := filepath.Join(dir, "meminfo")
// file doesn't exist
if _, err := getHostMemorySizeBytes(file); err == nil {
t.Fatal()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You could remove the comment above and make this:

t.Fatalf("expected failure as file %q does not exist", file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, sure.

qemu.go Outdated
@@ -106,6 +108,11 @@ const (
maxDevIDSize = 31
)

const (
// NVDIMM device needs memory space (1024MB is suggested by @grahamwhaley)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the comment but please can you add a link to an issue/PR rather than mentioning individuals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the full url, github seems automatically format that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks correct to me :-) (yes, github nicely hides the full URL where it can :-) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jodh-intel, you should not mention @grahamwhaley directly into the source code. And please refer only the isssue, not the very specific comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@wcwxyz
Copy link
Contributor Author

wcwxyz commented Aug 10, 2017

@jodh-intel thanks for review. I'll update the code after @grahamwhaley's review.

@wcwxyz
Copy link
Contributor Author

wcwxyz commented Aug 10, 2017

I take @grahamwhaley is ok with the rest of the patch.
Re-pushed. Please take a look.

@coveralls
Copy link

coveralls commented Aug 10, 2017

Coverage Status

Coverage increased (+0.04%) to 66.542% when pulling 23748cc on wcwxyz:qemu-set-maxmem into 59004f2 on containers:master.

@grahamwhaley
Copy link
Contributor

thanks @wcwxyz that:
lgtm
@sboeuf can you review, and if happy, merge?

@sboeuf
Copy link
Collaborator

sboeuf commented Aug 11, 2017

I'll review tomorrow, please don't merge until I have reviewed this one.

@grahamwhaley
Copy link
Contributor

adding DNM label until @sboeuf gets a chance to review etc.

qemu.go Outdated
@@ -106,6 +108,11 @@ const (
maxDevIDSize = 31
)

const (
// NVDIMM device needs memory space (1024MB is suggested by @grahamwhaley)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jodh-intel, you should not mention @grahamwhaley directly into the source code. And please refer only the isssue, not the very specific comment.

qemu.go Outdated
const (
// NVDIMM device needs memory space (1024MB is suggested by @grahamwhaley)
// See https://github.com/clearcontainers/runtime/issues/380#issuecomment-321216520
defaultNvdimmMemory = 1024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this constant maxMemoryOffset because that's basically what it is, and we don't need to specify the reason in the variable name. As long as we have your comment, a generic name sounds better to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

qemu.go Outdated
@@ -468,13 +476,20 @@ func (q *qemu) setCPUResources(podConfig PodConfig) ciaoQemu.SMP {
return smp
}

func (q *qemu) setMemoryResources(podConfig PodConfig) ciaoQemu.Memory {
func (q *qemu) setMemoryResources(podConfig PodConfig) (ciaoQemu.Memory, error) {
hostMemBytes, err := getHostMemorySizeBytes("/proc/meminfo")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/proc/meminfo is a very well known path for Linux. Please define this as a const and don't provide it through this call, instead use it directly from the function getHostMemorySizeBytes().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

qemu.go Outdated
}

// add 1G memory space for nvdimm device (vm guest image)
memMax := fmt.Sprintf("%dM", int(float64(hostMemBytes)/1024/1024)+defaultNvdimmMemory)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong to do that, isn't it ? I mean you're clearly asking to run a VM with a memory max greater than what the system has to offer ? Or am I reading the code wrong ?
Instead you should check that the system can allow at least the expected memory to be set and this would be a nice error to throw. IMO, we don't want to continue if our host cannot provide the amount of memory requested, because the user could end up in not being able to run the workload he expects.

And I think that you should validate that the expected amount of memory + the offset can fit in the host memory. But please don't consider this case as throwing an error, just log a warning, because the offset might be too much in some cases that could end up functioning very well. Indeed, in case you have the NVDIMM using 200MiB and you ask 1.5GiB on a 2GiB host, you really need only 1.7 and this will work great. But if we check with our offset of 1GiB, we re gonna fail, that's why we don't want to generate an error in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. maxmem is used to define upper bounds for memory hotplug. You request memory via qemu -m MEMORY.

Say run qemu -m 2048M,slots=2,maxmem=30000M, you got 2048M for on your VM and you can hot plug in 2 rom device (pc-dimm) into VM, the total memory can not exceed 30000M. In case your host memory is less than 30000M, hot plug will fail when your ask for memory when host doesn't have available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what you're saying, in case the host memory is less than 30G, the hotplug will fail. That's why I think we should try to set a maxmem higher than the total RAM available on the host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I understand what you're trying to say.

You're wrong about "offset". It is not consuming real host memory. Nvdimm needs "offset" to work on qemu. It takes memory space, not the real memory. (I'm not familiar with nvdimm device, but a quick test suggests that).

For maxmem, this is just the memory upper bounds for qemu. Setting maxmem to host_total_memory+offset is not asking qemu to run a VM with that much memory. Qemu will accept or reject to proceed any memory hotplug requests based on the maxmem. Whether hot plug works or not, depends kernel memory management.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 'offset', allocating that extra space and the extra slot does not take any host memory at startup. Then later we plug the NVDIMM into that space. I think we are probably safe not calculating that into the 'total host memory' equation.

For 'mem_request > host_memory' - whether the host allows or denies a VM (CC-container) to be created if it requests more memory than the host has either available or system total I believe actually comes down to how the host kernel is configured for over-commit. If the system owner has knowingly configured the system to allow VM overcommit then we should probably not error. I think we could be more informative though with one or both of the following options maybe:

  • warn if the requested memory is > the current system free memory (the default kernel non-overcommit failure case I believe)
  • warn if the requested memory is > the system memory total (because you are likely going to run into performance issues if your container uses all that memory)

For more information on kernel overcommit settings etc. see:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/Documentation/sysctl/vm.txt
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/Documentation/vm/overcommit-accounting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @grahamwhaley for elaborating on memory overcommit.

I don't think we should do anything about user requested memory. Runtime should not log error/warn solely based on overcommit or free memory. When QEMU fails to allocate memory, it errors out. There's enough information for user to see.

Kernel memory management is tricky. I really don't think we should make assumptions about memory is not enough for VM under any condition. Because we can only know memory actually runs out when there's allocation failure at the moment.

@grahamwhaley @sboeuf What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with that theory - we cannot tell when a VM launch will fail or not, as we don't have the intimate knowledge of the system.
Later if we find that the qemu failure is non-obvious from the logs, then we can maybe work on improving how the error/logs are gathered and pushed up through the runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I believe this part of code is ok, if @sboeuf doesn't have any objections.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok you convinced me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review @sboeuf @grahamwhaley. I'll post v2.

qemu.go Outdated
if err != nil {
continue
}
bytes := uint64(size) * 1024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we converting to bytes ? I mean we're basically multiplying by 1024 here to divide twice from the caller. I would not bother doing this and instead name the function getHostMemorySizeKb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change to KB.

@coveralls
Copy link

coveralls commented Aug 15, 2017

Coverage Status

Coverage increased (+0.02%) to 66.527% when pulling d77b1cf on wcwxyz:qemu-set-maxmem into 58ebe95 on containers:master.

Copy link
Collaborator

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this v2, the code itself looks good to me, but I would like to see those generic functions/variables as part of hypervisor.go so that they could be reused for other hypervisor implementations in the future.
I am looking forward to the v3 ;)

qemu.go Outdated
maxMemoryOffset = 1024
)

const procMemInfo = "/proc/meminfo"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to hypervisor.go since it can be reused across different hypervisor implementations.

qemu.go Outdated
return memory, nil
}

func getHostMemorySizeKb(memInfoPath string) (uint64, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to hypervisor.go since it can be reused across different hypervisor implementations.

Setting maxmem to host memory size wouldn't change memory footprint. It
only defines the upper bounds for memory hotplug.

Currently we set maxmem to mem*1.5, which is not enough for running a
small VM (eg. 128M) considering nvdimm device consumes memory address
space.

This will also allow us to do memory hotplug (maybe in the future).

Fixes #337

Signed-off-by: WANG Chao <[email protected]>
@wcwxyz
Copy link
Contributor Author

wcwxyz commented Aug 15, 2017

Here comes v3. Thanks.

@coveralls
Copy link

coveralls commented Aug 15, 2017

Coverage Status

Coverage increased (+0.02%) to 66.527% when pulling d0bb122 on wcwxyz:qemu-set-maxmem into 58ebe95 on containers:master.

@sboeuf sboeuf merged commit af35aeb into containers:master Aug 15, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants