Reduce the overhead of shutting down the Firecracker process #1164

Closed
wkozaczuk opened this issue Jul 9, 2019 · 4 comments
@wkozaczuk

Based on some casual measurements, it takes 40-60 ms from the moment exit() is called to the moment the Linux host considers Firecracker dead. For more details, please see this and that comment from a somewhat related issue.
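For reference, here is a minimal sketch (not from this issue) of one way to approximate that gap: fork a child that calls _exit() immediately and time how long the parent spends before waitpid() returns. With nothing to tear down the gap is tiny; for a process that has set up KVM state or namespaces it grows. This assumes a Rust binary with the libc crate as a dependency.

// Hedged sketch, not Firecracker code: approximate the time between a child
// calling _exit() and the parent being able to reap it.
use std::time::Instant;

fn main() {
    unsafe {
        let pid = libc::fork();
        assert!(pid >= 0, "fork failed");
        if pid == 0 {
            // Child: skip all user-space cleanup and exit right away, so the
            // measured time is dominated by kernel-side teardown.
            libc::_exit(0);
        }
        // Parent: time how long it takes until the child can be reaped.
        let start = Instant::now();
        let mut status: libc::c_int = 0;
        assert_eq!(libc::waitpid(pid, &mut status, 0), pid, "waitpid failed");
        println!("child reaped after {:?}", start.elapsed());
    }
}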

@andreeaflorescu added the Type: Bug label Jul 15, 2019
@alexandruag added the Priority: Low label Jul 15, 2019
@renzhengeek

Hi @wkozaczuk

First, I'm sorry for asking an out-of-scope question here, but your post comparing boot times between the QEMU microvm and Firecracker really interested me and guided me here.

I'm also interested in evaluating their boot/startup times in a way that is fair to both. What I've come up with so far:

  • use the same rootfs image
  • use the same kernel
  • build a minimally configured QEMU, and a default build of Firecracker

However, the specific problems below come up, say, if I want to measure the time to run a hello-world binary:

  • I saw in your post that the test runs with OSv. Could you please share a test version without OSv, so that people without OSv knowledge can easily reproduce the measurement? :-)
  • no matter whether I boot with QEMU or Firecracker, I always get an interactive login prompt, and only after logging in can I run the binary; is there a way to run the test in batch mode so that we can easily get the overall time?

I see you gave a very detailed time breakdown elsewhere, which is very good. I would much appreciate it if you could share some of your experience. :-)

@serban300
Contributor

I've been experimenting a bit with this issue, adding unsafe {libc::_exit(0)}; at different places during the VM startup process in order to compare the shutdown overhead.

Adding unsafe {libc::_exit(0)}; before setup_interrupt_controller(&mut vm)?; leads to almost no overhead:

17:41:43.955618 exit_group(0)           = ?
17:41:43.958232 +++ exited with 0 +++

Adding unsafe {libc::_exit(0)}; after setup_interrupt_controller(&mut vm)?; leads to an overhead of about 20ms:

17:48:01.881943 exit_group(0)           = ?
17:48:01.900969 +++ exited with 0 +++

Next, injecting unsafe {libc::_exit(0)}; after

create_pio_dev_manager_with_legacy_devices(&vm, serial_device, reset_evt)
            .map_err(Internal)?

leads to a 30ms overhead:

17:58:44.942634 exit_group(0)           = ?
17:58:44.976892 +++ exited with 0 +++

I guess it has something to do with the cost of destroying these objects. But we would have to dive deeper.
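For illustration, the probe used above could be wrapped in something like the hypothetical helper below (exit_probe and the EXIT_AFTER environment variable are made up for this sketch, not part of Firecracker), so the exit point can be selected at run time once calls are sprinkled at the candidate steps:

// Hypothetical helper, not part of the Firecracker codebase: abort the process
// right after a chosen setup step, so the remaining time reported by strace
// can be attributed to tearing down whatever was created up to that point.
fn exit_probe(step: &str) {
    // e.g. EXIT_AFTER=setup_interrupt_controller ./firecracker ...
    if std::env::var("EXIT_AFTER").map(|v| v == step).unwrap_or(false) {
        // _exit() skips Rust destructors; the latency that remains is the
        // kernel cleaning up KVM state, memory mappings, namespaces, etc.
        unsafe { libc::_exit(0) };
    }
}

Calling exit_probe("setup_interrupt_controller") right after that step, for example, should reproduce the second measurement above without adding a new call site for every experiment.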

@serban300
Contributor

Hi @wkozaczuk,

I did a bit more research on this.

As far as I understand, the process indeed counts as fully destroyed only when the parent fetches the exit status via wait4 here. But this seems correct, since from what I understand wait4 returns only after kvm_destroy_vm() has finished executing, and that function is what cleans up the old VM.

As for the added latency, I could track a big chunk of it (about 20 ms) down to the following call chain:

kvm_destroy_vm()
-> kvm_arch_sync_events(kvm)
-> kvm_free_pit(kvm)
-> kvm_pit_set_reinject(pit, false)
-> kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier)
-> synchronize_srcu(&kvm->irq_srcu)

As far as I know, this is the expected RCU behavior and synchronize_srcu() ends up waiting for a full grace period. We stumble upon a similar issue here.

I will close this issue since it doesn't seem related to Firecracker. Feel free to reopen it if needed.

@serban300 serban300 self-assigned this Jan 11, 2021
@Nugine commented Mar 15, 2021

I ran into the same problem elsewhere. Here is what I have found. Maybe it will help you.

The program below spawns a child process, optionally in a new IPC namespace; the child then replaces itself with /bin/ls.

argv[1] controls whether to create a new IPC namespace.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define err_exit(msg)                                                          \
    do {                                                                       \
        perror(msg);                                                           \
        exit(EXIT_FAILURE);                                                    \
    } while (0)

/* Child entry point: replace the child with /bin/ls. */
static int child_fn(void *arg) {
    (void)arg; /* unused */
    char *const argv[] = {"/bin/ls", NULL};
    char *const env[] = {NULL};
    execve(argv[0], argv, env);
    err_exit("execve"); /* only reached if execve fails */
}

#define STACK_SIZE (1024 * 1024)

int main(int argc, char *argv[]) {
    /* Pass "NEWIPC" as argv[1] to put the child in a new IPC namespace. */
    int new_ipc = argc == 2 && strcmp(argv[1], "NEWIPC") == 0;

    char *stack;
    char *stack_top;
    pid_t pid;

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        err_exit("malloc");
    stack_top = stack + STACK_SIZE; /* clone() expects the top of the stack */

    int flags = SIGCHLD;

    if (new_ipc) {
        flags |= CLONE_NEWIPC;
    }

    pid = clone(child_fn, stack_top, flags, argv[1]);
    if (pid == -1)
        err_exit("clone");
    printf("clone() returned %ld\n", (long)pid);

    /* The time spent here includes the kernel tearing the child down. */
    if (waitpid(pid, NULL, 0) == -1)
        err_exit("waitpid");
    printf("child has terminated\n");

    return 0;
}

root@Nugine-Ub2:~/tests/ipc_ns_delay# ls
ipc.c
root@Nugine-Ub2:~/tests/ipc_ns_delay# gcc ipc.c

Execute ./a.out with strace.

root@Nugine-Ub2:~/tests/ipc_ns_delay# strace -tt -f -e wait4,execve,exit_group,exit,clone ./a.out
19:32:58.555131 execve("./a.out", ["./a.out"], 0x7ffdf36ef710 /* 42 vars */) = 0
19:32:58.558474 clone(child_stack=0x7f4099218000, flags=SIGCHLD) = 30483
strace: Process 30483 attached
clone() returned 30483
[pid 30483] 19:32:58.558900 execve("/bin/ls", ["/bin/ls"], 0x7f4099217fd8 /* 0 vars */ <unfinished ...>
[pid 30482] 19:32:58.559053 wait4(30483,  <unfinished ...>
[pid 30483] 19:32:58.559418 <... execve resumed> ) = 0
a.out  ipc.c
[pid 30483] 19:32:58.567093 exit_group(0) = ?
[pid 30483] 19:32:58.567568 +++ exited with 0 +++
19:32:58.567625 <... wait4 resumed> NULL, 0, NULL) = 30483
19:32:58.567693 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30483, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
child has terminated
19:32:58.567931 exit_group(0)           = ?
19:32:58.568292 +++ exited with 0 +++

[pid 30483] The duration between exit_group and exited is 567568-567093=475 us

Execute ./a.out NEWIPC with strace.

root@Nugine-Ub2:~/tests/ipc_ns_delay# strace -tt -f -e wait4,execve,exit_group,exit,clone ./a.out NEWIPC
19:33:01.580376 execve("./a.out", ["./a.out", "NEWIPC"], 0x7ffcba451678 /* 42 vars */) = 0
19:33:01.583591 clone(strace: Process 30487 attached
child_stack=0x7f230c3d6000, flags=CLONE_NEWIPC|SIGCHLD) = 30487
[pid 30487] 19:33:01.584165 execve("/bin/ls", ["/bin/ls"], 0x7f230c3d5fd8 /* 0 vars */clone() returned 30487
 <unfinished ...>
[pid 30486] 19:33:01.584685 wait4(30487,  <unfinished ...>
[pid 30487] 19:33:01.584801 <... execve resumed> ) = 0
a.out  ipc.c
[pid 30487] 19:33:01.592219 exit_group(0) = ?
[pid 30487] 19:33:01.617658 +++ exited with 0 +++
19:33:01.617759 <... wait4 resumed> NULL, 0, NULL) = 30487
19:33:01.617898 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30487, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
child has terminated
19:33:01.618275 exit_group(0)           = ?
19:33:01.618712 +++ exited with 0 +++
root@Nugine-Ub2:~/tests/ipc_ns_delay# 

[pid 30487] The duration between exit_group and exited is 617658-592219=25439 us

It seems that the IPC namespace is what adds the shutdown overhead (about 25 ms).

Is it a kernel bug?
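For what it's worth, the clone() path should not be essential here: under the assumption that the cost lies in tearing down the IPC namespace when its last member exits, a child that merely calls unshare(CLONE_NEWIPC) and then exits should show a similar gap. A hedged Rust sketch (libc crate, run as root; not from this thread):

// Hedged sketch: time the reap of a child that unshares its IPC namespace and
// exits immediately. If the ~25 ms comes from IPC namespace teardown at
// process exit, the reported duration should show a similar gap.
use std::time::Instant;

fn main() {
    unsafe {
        let pid = libc::fork();
        assert!(pid >= 0, "fork failed");
        if pid == 0 {
            // Child: become the only member of a fresh IPC namespace
            // (needs CAP_SYS_ADMIN), then exit, forcing the kernel to
            // destroy the namespace while the parent waits.
            if libc::unshare(libc::CLONE_NEWIPC) != 0 {
                libc::_exit(1);
            }
            libc::_exit(0);
        }
        let start = Instant::now();
        let mut status: libc::c_int = 0;
        libc::waitpid(pid, &mut status, 0);
        println!("child reaped after {:?}", start.elapsed());
    }
}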
