-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Dump/restore syscall user dispatch via ptrace #2423
base: criu-dev
Are you sure you want to change the base?
[RFC] Dump/restore syscall user dispatch via ptrace #2423
Conversation
Change made through this commit: - Include copy of flog as a seperate tree. - Modify the makefile to add and compile flog code. Signed-off-by: prakritigoyal19 <[email protected]>
CID 302713 (checkpoint-restore#1 of 1): Missing varargs init or cleanup (VARARGS) va_end was not called for argptr. Signed-off-by: Adrian Reber <[email protected]>
Separate commit for easier criu-dev <-> master transfer. Acked-by: Mike Rapoport <[email protected]> Signed-off-by: Adrian Reber <[email protected]>
It is mapped, not maped. Same applies for mmap I guess. Found by codespell, except it wants to change it to mapped, which will make it less specific. Signed-off-by: Kir Kolyshkin <[email protected]>
Brought to you by codespell -w (using codespell v2.1.0). [v2: use "make indent" on the result] Signed-off-by: Kir Kolyshkin <[email protected]>
Fixes: checkpoint-restore#2121 Signed-off-by: Pengda Yang <[email protected]>
The TOS(type of service) field in the ip header allows you specify the priority of the socket data. Signed-off-by: Suraj Shirvankar <[email protected]>
Signed-off-by: Suraj Shirvankar <[email protected]>
The pipe_size type is unsigned int, when the fcntl call fails and return -1, it will cause a negative rollover problem. Signed-off-by: zhoujie <[email protected]>
Newer Intel CPUs (Sapphire Rapids) have a much larger xsave area than before. Looking at older CPUs I see 2440 bytes. # cpuid -1 -l 0xd -s 0 ... bytes required by XSAVE/XRSTOR area = 0x00000988 (2440) On newer CPUs (Sapphire Rapids) it grows to 11008 bytes. # cpuid -1 -l 0xd -s 0 ... bytes required by XSAVE/XRSTOR area = 0x00002b00 (11008) This increase the xsave area from one page to four pages. Without this patch the fpu03 test fails, with this patch it works again. Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Adrian Reber <[email protected]>
Signed-off-by: Adrian Reber <[email protected]>
Using the fact that we know criu_pid and criu is a parent of restored process we can create pidfile with pid on caller pidns level. We need to move mount namespace creation to child so that criu-ns can see caller pidns proc. Signed-off-by: Pavel Tikhomirov <[email protected]>
By default, the file name 'amdgpu_plugin.txt' is used also as the name for the corresponding man page (`man amdgpu_plugin`). However, when this man page is installed system-wide it would be more appropriate to have a prefix 'criu-' (e.g., `man criu-amdgpu-plugin`). Signed-off-by: Radostin Stoyanov <[email protected]>
crun wants to set empty_ns and this interface is missing from the library. This adds it to libcriu. Signed-off-by: Adrian Reber <[email protected]>
--criu-binary argument provides a way to supply the CRIU binary location to run_criu(). Related to: checkpoint-restore#1909 Signed-off-by: Dhanuka Warusadura <[email protected]>
These changes remove and update the changes introduced in 7177938 in favor of the Python version in CI. os.waitstatus_to_exitcode() function appeared in Python 3.9 Related to: checkpoint-restore#1909 Signed-off-by: Dhanuka Warusadura <[email protected]>
These changes add test implementations for criu-ns script. Fixes: checkpoint-restore#1909 Signed-off-by: Dhanuka Warusadura <[email protected]>
These changes fix the `ImportError: No module named pathlib` error when executing criu-ns tests located at criu/test/others/criu-ns Signed-off-by: Dhanuka Warusadura <[email protected]>
CentOS 7 CI environment uses Python 2. To execute criu-ns script in CentOS 7 changing the current shebang line to python is required. This reverse the changes made in a15a63f Signed-off-by: Dhanuka Warusadura <[email protected]>
This is a patch proposed by Thomas here: https://lore.kernel.org/all/87ilczc7d9.ffs@tglx/ It removes (created id > desired id) "sanity" check and adds proper checking that ids start at zero and increment by one each time when we create/delete a posix timer. First purpose of it is to fix infinite looping in create_posix_timers on old pre 3.11 kernels. Second purpose is to allow kernel interface of creating posix timers with desired id change from iterating with predictable next id to just setting next id directly. And at the same time removing predictable next id so that criu with this patch would not get to infinite loop in create_posix_timers if this happens. Thanks a lot to Thomas! Signed-off-by: Pavel Tikhomirov <[email protected]>
This hook allows to start image streamer process from an action script. Signed-off-by: Radostin Stoyanov <[email protected]>
…tions This does cgroup namespace creation separately from joining task cgroups. This makes the code more logical, because creating cgroup namespace also involves joining cgroups but these cgroups can be different to task's cgroups as they are cgroup namespace roots (cgns_prefix), and mixing all of them together may lead to misunderstanding. Another positive thing is that we consolidate !item->parent checks in one place in restore_task_with_children. Signed-off-by: Valeriy Vdovin <[email protected]> Signed-off-by: Pavel Tikhomirov <[email protected]>
4.15-based kernels don't allow F_*SEAL for memfds created with MFD_HUGETLB. Since seals are not possible in this case, fake F_GETSEALS result as if it was queried for a non-sealing-enabled memfd. Signed-off-by: Michał Mirosław <[email protected]>
Linux 4.15 doesn't like empty string for cgroup2 mount options. Pass NULL then to satisfy the kernel check. Log the options for easier debugging. Signed-off-by: Michał Mirosław <[email protected]>
The original commit added saving THP_DISABLED flag value, but missed restoring it. There is restoring code, but used only when --lazy_pages mode is enabled. Restore the prctl flag always. While at it, rename the `has_thp_enabled` -> `!thp_disabled` for consistency. Fixes: bbbd597 (2017-06-28 "mem: add dump state of THP_DISABLED prctl") Signed-off-by: Michał Mirosław <[email protected]>
If prctl(SET_THP_DISABLE) is not used due to bad semantics, log it for easier debugging. Signed-off-by: Michał Mirosław <[email protected]>
While at it, don't carry over stale errno to the fail() message. Signed-off-by: Michał Mirosław <[email protected]>
Signed-off-by: Michał Mirosław <[email protected]>
Add a sanity check for THP_DISABLE. This discovered a broken commit in Google's kernel tree. Signed-off-by: Michał Mirosław <[email protected]>
During restore, CRIU prints "Enqueue page-read" messages for each page-read request [1]. However, this message does not provide useful information, increases performance overhead during restore and the size of log file. $ ./zdtm.py run -t zdtm/static/maps06 -f h -k always $ grep 'Enqueue page-read' dump/zdtm/static/maps06/56/1/restore.log | wc -l 20493 This commit replaces these log messages with a single message that shows the number of enqueued page-read requests. $ grep 'enqueued' dump/zdtm/static/maps06/56/1/restore.log (00.061449) 56: nr_enqueued: 20493 [1] checkpoint-restore@91388fc Signed-off-by: Radostin Stoyanov <[email protected]>
1) In dump_tcp_conn_state, if return from libsoccr_save is >=0, we check that sizeof(struct libsoccr_sk_data) returned from libsoccr_save is equal to sizeof(struct libsoccr_sk_data) we see in dump_tcp_conn_state (probably to check if we use the right library version). And if sizes are different we go to err_r, which just returns ret, which can teoretically be 0 (if size in library is zero) and that would lead dump_one_tcp treat this as success though it is obvious error. 2) In case of dump_opt or open_image fails we don't explicitly set ret and rely that sizeof(struct libsoccr_sk_data) previously set to ret is not 0, I don't really like it, it makes reading code too complex. 3) We have a lot of err_* labels which do exactly the same thing, there is no point in having all of them, also it is better to choose the name of the label based on what it really does. So let's refactor error handling to avoid these inconsistencies. Signed-off-by: Pavel Tikhomirov <[email protected]>
@svetly-todorov Thank you for the pull request! Would you be able to add tests for this feature? |
|
||
/* Setup SUD-disable struct */ | ||
memset(&disable, 0, sizeof(disable)); | ||
disable.mode = PR_SYS_DISPATCH_OFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#10 2.352 compel/src/lib/ptrace.c:54:17: error: ‘PR_SYS_DISPATCH_OFF’ undeclared (first use in this function)
#10 2.352 54 | disable.mode = PR_SYS_DISPATCH_OFF;
#10 2.352 | ^~~~~~~~~~~~~~~~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a case of kernel headers not always being present. I'll go back and explicitly add the #ifdefs to sud.h.
While I'm at it, I'll probably also add some wrappers around the entire SUD architecture, since this only works with newer versions of the kernel, etc.
criu/sud.c
Outdated
struct sys_dispatch_entry *entry; | ||
sud_config_t config; | ||
|
||
entry = sud_lookup(tid_real, true, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation is also mixed here.
#undef LOG_PREFIX | ||
#define LOG_PREFIX "sys-dispatch: " | ||
|
||
static struct rb_root sud_tid_rb_root = RB_ROOT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add more info in the commit message to describe what is going on here.
CRIU injects a parasite blob while a process is being dumped. How do you handle where to place a parasite blob? |
We don't worry about the parasite blob during dump because we collect and disable SUD during the compel_wait_task that precedes the infection. It's been a while since I wrote this code, but if I'm remembering correctly, collection of SUD entries happens in this callback. This callback triggers here in compel_wait_task. The callback is necessary for SUD, but not necessary for seccomp, because ptrace can disable seccomp without overwriting its settings. So for seccomp, you can SUSPEND_SECCOMP during compel_wait_task and then read its settings later. But for SUD, you can only read its settings while it is active; ptrace only lets you enable/disable it without a graceful suspend. |
Use the PTRACE_GET/SET_SYSCALL_USER_DISPATCH_CONFIG flags to interface with a tracee's SUD settings. This is scaffolding for infect and seize, which will use ptrace to collect and suspend the SUD status of a seized task. Signed-off-by: Svetly Todorov <[email protected]> Signed-off-by: Gregory Price <[email protected]>
Use the new ptrace wrappers to get the SUD mode of the seized process. If activated, use ptrace_suspend_sud to disable it before proceeding with infection. Signed-off-by: Svetly Todorov <[email protected]> Signed-off-by: Gregory Price <[email protected]>
Introduce function signatures for SUD image generation, dump, and restore. These mirror the functions in seccomp.h/.c. Signed-off-by: Svetly Todorov <[email protected]> Signed-off-by: Gregory Price <[email protected]>
Add the necessary protobuf descriptions, image descriptions, and magic number for saving SUD data. Signed-off-by: Svetly Todorov <[email protected]> Signed-off-by: Gregory Price <[email protected]>
Add hooks for saving SUD status upon infection. Signed-off-by: Svetly Todorov <[email protected]> Signed-off-by: Gregory Price <[email protected]>
Add hooks for dumping SUD image to disk during cr-dump.c. Signed-off-by: Svetly Todorov <[email protected]> Signed-off-by: Gregory Price <[email protected]>
Add hooks for restoring SUD settings throughout cr-restore.c. The implementation is a little unorthodox. Unlike seccomp, SUD isn't suspended while a task is under ptrace. So the parasite code cannot restore the SUD settings because SIGSYS may be triggered when the restorer blob is unmapped. Instead, we opt to reopen the per-core data right before PTRACE_DETACH, and restore the SUD settings then. This way we don't risk triggering SIGSYS via a remote syscall. Signed-off-by: Svetly Todorov <[email protected]> Signed-off-by: Gregory Price <[email protected]>
Signed-off-by: Svetly Todorov <[email protected]>
d4b0cc1
to
c4f248f
Compare
I have pushed up a kind of v2 for this RFC containing some fixes and the beginnings of a test case. CRIU itself should build with the new changes. But the test does not work quite yet. I have to drop development on this for now, because of other work obligations. I may circle back to this in the future. Forgive me! |
A friendly reminder that this PR had no activity for 30 days. |
A friendly reminder that this PR had no activity for 30 days. |
Following up on issue #2037.
Use the PTRACE_GET/SET_SYSCALL_USER_DISPATCH flags upstreamed in kernel 6.4 to preserve SUD settings for a target process.
The majority of the implementation mirrors that of the seccomp dump/restore. The most significant difference is that, unlike seccomp, there's no flag for disabling SUD while a target is being ptraced. Therefore restoring it in the parasite blob is dangerous, because remote syscalls can generate a SIGSYS. Instead, restoration happens right before the final PTRACE_DETACH, and not in the parasite code.