[centos7] not ok 15 checkpoint --lazy-pages and restore #2760
Comments
TODO: make it run multiple times, show full log, cross my fingers and hope it will fail.
Got another failure (https://github.com/opencontainers/runc/pull/2087/checks?check_run_id=1841731496), this time with slightly different output:
It is now failing with
Another failure in #2768, same diagnostics; for details see https://github.com/opencontainers/runc/pull/2768/checks?check_run_id=1975173190
Another failure, this time from #2835. From https://github.com/opencontainers/runc/pull/2835/checks?check_run_id=2060556750:
OK, we got a full log now. The error comes from this code: runc/tests/integration/checkpoint.bats, lines 157 to 191 in 249bca0
The log (from https://github.com/opencontainers/runc/pull/2852/checks?check_run_id=2122290178) is here:
Alas, it's still unclear what is going on :( //cc @adrianreber @avagin
One more failure, from this run: https://github.com/opencontainers/runc/pull/2836/checks?check_run_id=2299143195
At this point I am inclined to disable this test for CentOS 7 :-(
I agree. As far as I know, there are no users of runc's lazy migration support at any higher level, and CentOS 7 is now really old. If we got CRIU bug reports on CentOS 7, I would also recommend upgrading to something newer.
When doing a lazy checkpoint/restore, we should not restore into the same cgroup; otherwise there is a race which results in occasional killing of the restored container (GH opencontainers#2760, opencontainers#2924). The fix is to use --manage-cgroups-mode=ignore, which allows restoring into a different cgroup. Note that since cgroupsPath is not set in config.json, the cgroup is derived from the container name, so calling set_cgroups_path is not needed. For the previous (unsuccessful) attempt to fix this, as well as a detailed (and apparently correct) analysis, see commit 36fe3cc. Signed-off-by: Kir Kolyshkin <[email protected]>
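A rough sketch of what the restore side of that fix could look like on the command line. This is not the actual test code from checkpoint.bats: the helper `lazy_restore_cmd`, the `./image-dir` path and the container name `test_busybox_restore` are assumptions made for illustration. The helper only prints the command instead of running it, so it can be inspected without runc or CRIU installed. Note that runc spells the flag `--manage-cgroups-mode`.

```shell
#!/bin/sh
# Hypothetical helper: print the runc command line for a lazy-pages
# restore under a NEW container name. Since cgroupsPath is not set in
# config.json, the new name also yields a new cgroup, and
# --manage-cgroups-mode ignore asks runc/CRIU not to restore the
# cgroup recorded in the checkpoint image.
lazy_restore_cmd() {
    image_dir=$1   # directory holding the checkpoint image (assumed)
    new_name=$2    # name for the restored container (assumed)

    echo runc restore -d \
        --work-path "$image_dir" \
        --image-path "$image_dir" \
        --lazy-pages \
        --manage-cgroups-mode ignore \
        "$new_name"
}

lazy_restore_cmd ./image-dir test_busybox_restore
```

Restoring under a different name than the checkpointed container is what keeps the two out of the same cgroup, which is the race the commit message describes.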
After accidentally enabling checkpoint/restore tests on CentOS 7 (in a15ebe5, PR #2757), I found out that the lazy-pages test sometimes fails without providing any decent error:
The command that fails is: