Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct: wrap more unix errors #3270

Merged
merged 1 commit into from
Nov 14, 2021
Merged

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Nov 11, 2021

When I tried to start a rootless container under a different/wrong user,
I got:

$ ../runc/runc --systemd-cgroup --root /tmp/runc.$$ run 445
ERRO[0000] runc run failed: operation not permitted

This is obviously not good enough. With this commit, the error is:

ERRO[0000] runc run failed: fchown fd 9: operation not permitted

Alas, there are still some code that returns unwrapped errnos from
various unix calls.

This is a followup to commit d8ba412 (part of PR #3011) which wrapped
many, but not all, bare unix errors. Do wrap some more, using either os.PathError
or os.SyscallError.

While at it,

  • use os.SyscallError instead of os.NewSyscallError;
  • use errors.Is(err, os.ErrXxx) instead of os.IsXxx(err).

Signed-off-by: Kir Kolyshkin [email protected]

AkihiroSuda
AkihiroSuda previously approved these changes Nov 11, 2021
@kolyshkin
Copy link
Contributor Author

Rebased on git tip to fix 🔴 CI

When I tried to start a rootless container under a different/wrong user,
I got:

	$ ../runc/runc --systemd-cgroup --root /tmp/runc.$$ run 445
	ERRO[0000] runc run failed: operation not permitted

This is obviously not good enough. With this commit, the error is:

	ERRO[0000] runc run failed: fchown fd 9: operation not permitted

Alas, there are still some code that returns unwrapped errnos from
various unix calls.

This is a followup to commit d8ba412 which wrapped many, but not
all, bare unix errors. Do wrap some more, using either os.PathError or
os.SyscallError.

While at it,
 - use os.SyscallError instead of os.NewSyscallError;
 - use errors.Is(err, os.ErrXxx) instead of os.IsXxx(err).

Signed-off-by: Kir Kolyshkin <[email protected]>
@thaJeztah
Copy link
Member

integration_systemd is failing 😞 ; not immediately obvious why (but perhaps the issue hasn't been fixed?) https://github.com/opencontainers/runc/pull/3270/checks?check_run_id=4187551952

ok 159 userns with inaccessible mount + exec
ok 160 runc version
make: *** [Makefile:90: localintegration] Error 1
make: Leaving directory '/vagrant'
Connection to 192.168.121.38 closed.

Exit status: 2

@AkihiroSuda
Copy link
Member

integration_systemd is failing 😞 ; not immediately obvious why (but perhaps the issue hasn't been fixed?)

Probably unrelated

not ok 24 checkpoint --lazy-pages and restore
# (from function `fail' in file tests/integration/helpers.bash, line 338,
#  from function `runc_restore_with_pipes' in file tests/integration/checkpoint.bats, line 94,
#  in test file tests/integration/checkpoint.bats, line 256)
#   `runc_restore_with_pipes ./image-dir test_busybox_restore --lazy-pages' failed
# runc spec (status=0):
# 
# uffd-noncoop is supported
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 26121,
#   "status": "running",
#   "bundle": "/tmp/bats-run-5wCTTi/runc.g8U5xW/bundle",
#   "rootfs": "/tmp/bats-run-5wCTTi/runc.g8U5xW/bundle/rootfs",
#   "created": "2021-11-12T08:51:09.980655157Z",
#   "owner": ""
# }
# __runc restore test_busybox_restore failed (status: 1)
# time="2021-11-12T08:51:10Z" level=error msg="criu failed: type NOTIFY errno 0\nlog file: image-dir/restore.log"
# CRIU log errors (if any):
# ./image-dir/restore.log-(00.058769) 26191 (native) is going to execute the syscall 11, required is 15
# ./image-dir/restore.log-(00.058809) 26191 was trapped
# ./image-dir/restore.log-(00.058812) `- Expecting exit
# ./image-dir/restore.log-(00.058853) 26191 was trapped
# ./image-dir/restore.log-(00.058857) 26191 (native) is going to execute the syscall 18446744073709551615, required is 15
# ./image-dir/restore.log:(00.058898) Error (compel/src/lib/infect.c:1543): Task 26191 is in unexpected state: f7f
# ./image-dir/restore.log:(00.058909) Error (compel/src/lib/infect.c:1549): Task stopped with 15: Terminated
# ./image-dir/restore.log:(00.058911) Error (criu/cr-restore.c:2383): Can't stop all tasks on rt_sigreturn
# ./image-dir/restore.log:(00.058912) Error (criu/cr-restore.c:2420): Killing processes because of failure on restore.
# ./image-dir/restore.log-The Network was unlocked so some data or a connection may have been lost.
# ./image-dir/restore.log:(00.062714) Error (criu/mount.c:3385): mnt: Can't remove the directory /tmp/.criu.mntns.4PLAEH: No such file or directory
# ./image-dir/restore.log:(00.062720) Error (criu/cr-restore.c:2447): Restoring FAILED.
# --
# ./image-dir/restore.log-(00.058769) 26191 (native) is going to execute the syscall 11, required is 15
# ./image-dir/restore.log-(00.058809) 26191 was trapped
# ./image-dir/restore.log-(00.058812) `- Expecting exit
# ./image-dir/restore.log-(00.058853) 26191 was trapped
# ./image-dir/restore.log-(00.058857) 26191 (native) is going to execute the syscall 18446744073709551615, required is 15
# ./image-dir/restore.log:(00.058898) Error (compel/src/lib/infect.c:1543): Task 26191 is in unexpected state: f7f
# ./image-dir/restore.log:(00.058909) Error (compel/src/lib/infect.c:1549): Task stopped with 15: Terminated
# ./image-dir/restore.log:(00.058911) Error (criu/cr-restore.c:2383): Can't stop all tasks on rt_sigreturn
# ./image-dir/restore.log:(00.058912) Error (criu/cr-restore.c:2420): Killing processes because of failure on restore.
# ./image-dir/restore.log-The Network was unlocked so some data or a connection may have been lost.
# ./image-dir/restore.log:(00.062714) Error (criu/mount.c:3385): mnt: Can't remove the directory /tmp/.criu.mntns.4PLAEH: No such file or directory
# ./image-dir/restore.log:(00.062720) Error (criu/cr-restore.c:2447): Restoring FAILED.
# runc restore failed

https://cirrus-ci.com/task/6279543066460160

@thaJeztah
Copy link
Member

I tried restarting, but for some reason that doesn't appear to work (clicked "re-run job" as well as "re-run failed jobs" from the dropdown 🤔

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Nov 12, 2021

I tried restarting, but for some reason that doesn't appear to work (clicked "re-run job" as well as "re-run failed jobs" from the dropdown 🤔

Cirrus CI is actually working (https://cirrus-ci.com/task/4825507825975296), but the result won't be synced back to GitHub web UI until it completes

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"LGTM" (❤️ the improved errors); but left a suggestion / comment

@@ -412,7 +413,7 @@ func fixStdioPermissions(config *initConfig, u *user.ExecUser) error {
} {
var s unix.Stat_t
if err := unix.Fstat(int(fd), &s); err != nil {
return err
return &os.PathError{Op: "fstat", Path: "fd " + strconv.Itoa(int(fd)), Err: err}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably mostly a nit, but wondering, given that fd is a uintptr, if FormatUint() would be a closer fit?

Suggested change
return &os.PathError{Op: "fstat", Path: "fd " + strconv.Itoa(int(fd)), Err: err}
return &os.PathError{Op: "fstat", Path: "fd " + strconv.FormatUint(uint64(fd), 10), Err: err}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I should propose a strconv.UItoa() function in Golang 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably mostly a nit, but wondering, given that fd is a uintptr, if FormatUint() would be a closer fit?

It still needs a typecast, and adds additional argument (10). I like Atoi because it's a tad shorter.

The difference between FormatInt and FormatUint is latter do not check if the argument is negative.

All in all, it's all the same here, I don't see how one way is better than the other.

I'm even fine with fmt.Sprintf("fd %d", fs) -- it is slower but we don't care since it's an error path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like Atoi because it's a tad shorter.

Yes, I wish there was a strconv.UItoa()

All in all, it's all the same here, I don't see how one way is better than the other.

It was mostly a nit; wondering if gosec will complain about it once we enabled it 🤔

@kolyshkin
Copy link
Contributor Author

not ok 24 checkpoint --lazy-pages and restore

This is something I'm tackling for quite a long time (#2760, #2924). Seems to be related to systemd.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah merged commit 7c219d8 into opencontainers:master Nov 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants