-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS] Add O_APPEND
emulation for single process
#1935
base: master
Are you sure you want to change the base?
Conversation
Note that Gramine doesn't support multi-process operations on the same file anyhow, in particular, the file position is not synchronized among multiple processes. This commit doesn't change this limitation of Gramine, thus `O_APPEND` writes don't work correctly (may overwrite file contents). Note that Linux has a bug in `pwrite()`. Gramine emulates this bug in the same way. A new FS test is added. One LTP test is uncommented. Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
a2a9b6b
to
4950661
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
a discussion (no related file):
FYI: implementation of O_APPEND
in the Linux kernel is very simple and self-contained. I searched for APPEND
keyword and these are the main sources:
- https://elixir.bootlin.com/linux/v6.9/source/fs/fcntl.c
- https://elixir.bootlin.com/linux/v6.9/source/fs/namei.c
- https://elixir.bootlin.com/linux/v6.9/source/fs/open.c
- https://elixir.bootlin.com/linux/v6.9/source/fs/read_write.c
I hope I covered all Linux logic on O_APPEND
in this PR. (Note that some logic was already implemented in Gramine.)
libos/src/fs/chroot/encrypted.c
line 480 at r2 (raw file):
lock(&hdl->inode->lock); file_off_t actual_pos = (hdl->flags & O_APPEND) ? hdl->inode->size : *pos;
I admit that I don't understand the locking in these handlers... Technically, O_APPEND
flag may be changed at runtime via fcntl(SET_FL)
, so there must be smth like lock(&hdl->lock)
. At the same time, I don't observe such locking in other (similar) places, and it's also confusing what needs to be locked where.
So for now I do not perform additional locking. I even wonder if hdl->flags
can be considered thread-safe for well-behaved apps, because if some app changes the flags in one thread and e.g. writes in another thread, that's a data race and a bug in the app.
libos/src/sys/libos_file.c
line 445 at r2 (raw file):
} if (!(in_hdl->acc_mode & MAY_READ) || !(out_hdl->acc_mode & MAY_WRITE)
FYI: this change is orthogonal to this PR. Surprisingly, there was already handling of O_APPEND
in this sendfile emulation, though it returned an incorrect error code. So I fixed this and also added two other checks. This could be moved out to a separate PR if needed.
libos/test/fs/test_fs.py
line 136 at r2 (raw file):
self.assertIn('TEST 1 (' + file_path + ')', stdout) self.assertIn('TEST 2 (' + file_path + ')', stdout) self.assertIn('TEST 3 (' + file_path + ')', stdout)
FYI: Other FS tests also do things like self.assertIn(close(...))
, but there are so many lines in the log, that I found it stupid to check each line. So just checking that sub-tests are at least started and printed out. And of course the whole test must run to successful completion.
Jenkins, test this please (Jenkins was dead when this PR was created) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
libos/src/sys/libos_file.c
line 445 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
FYI: this change is orthogonal to this PR. Surprisingly, there was already handling of
O_APPEND
in this sendfile emulation, though it returned an incorrect error code. So I fixed this and also added two other checks. This could be moved out to a separate PR if needed.
According to the sendfile man page, it appears that the original error code -EINVAL is intended to be returned when out_hdl has the O_APPEND flag set.
"EINVAL out_fd has the O_APPEND flag set. This is not currently
supported by sendfile()."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r1, 15 of 17 files at r2, all commit messages.
Reviewable status: 16 of 18 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @efu39)
libos/src/sys/libos_file.c
line 445 at r2 (raw file):
Previously, efu39 (Erica Fu) wrote…
According to the sendfile man page, it appears that the original error code -EINVAL is intended to be returned when out_hdl has the O_APPEND flag set.
"EINVAL out_fd has the O_APPEND flag set. This is not currently
supported by sendfile()."
+1
Why change the previous check on outfd w/ O_APPEND
to return -EBADF
(which should return -EINVAL
)?
libos/src/sys/libos_open.c
line 335 at r2 (raw file):
/* * Note that Linux has the following bug: POSIX requires that opening a file with the O_APPEND * flag should have no affect on the location at which pwrite() writes data. However, on Linux,
-> effect
Code quote:
affect
libos/test/fs/read_append.c
line 36 at r2 (raw file):
printf("close(%s) OK\n", file_path); if (file_size(file_path) != size) fatal_error("File size is wrong (expected sizeof(buf1))\n");
This looks a bit weird -- shouldn't we print the expected and got bytes?
Code quote:
expected sizeof(buf1)
libos/test/fs/read_append.c
line 73 at r2 (raw file):
printf("close(%s) OK\n", file_path); if (file_size(file_path) != size * 2) fatal_error("File size is wrong (expected sizeof(buf1) * 2)\n");
ditto
Code quote:
expected sizeof(buf1) * 2
libos/test/fs/read_append.c
line 105 at r2 (raw file):
printf("close(%s) OK\n", file_path); if (file_size(file_path) != size * 3) fatal_error("File size is wrong (expected sizeof(buf1) * 3)\n");
ditto
Code quote:
expected sizeof(buf1) * 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't change this limitation of Gramine, thus
O_APPEND
writes don't work correctly (may overwrite file contents).
I think we want to log some warning when e.g., child process tries to write an O_APPEND
fd created by parent process?
Reviewable status: 16 of 18 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv and @efu39)
Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 13 of 18 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)
a discussion (no related file):
@kailun-qin wrote:
I think we want to log some warning when e.g., child process tries to write an
O_APPEND
fd created by parent process?
Yes, for some reason I thought I implemented that... But now I see that I didn't add this logic. Unfortunately, this is not that trivial to add here, because I'm not sure to which file types it should be applied, and whether we should keep track of the offset (like, the file was opened in APPEND mode but was never written to, and then fork()
happens, in this case the child would be the only process that uses this file and it's actually safe).
So I decided not to implement it in this PR, as we won't have time to finalize this before I'm gone.
(For info, we mean the trick like used in secure eventfd: 51e99f9)
libos/src/sys/libos_file.c
line 445 at r2 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
+1
Why change the previous check on outfd w/
O_APPEND
to return-EBADF
(which should return-EINVAL
)?
Interesting, you're right after some quick testing on Linux.
I was confused by this Linux code: https://elixir.bootlin.com/linux/v6.11.5/source/fs/read_write.c#L1735. But apparently this code path is not used in sendfile()
.
Done.
libos/src/sys/libos_open.c
line 335 at r2 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
-> effect
Done.
libos/test/fs/read_append.c
line 36 at r2 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
This looks a bit weird -- shouldn't we print the expected and got bytes?
Done.
libos/test/fs/read_append.c
line 73 at r2 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
ditto
Done.
libos/test/fs/read_append.c
line 105 at r2 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
ditto
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: 16 of 18 files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
@kailun-qin wrote:
I think we want to log some warning when e.g., child process tries to write an
O_APPEND
fd created by parent process?Yes, for some reason I thought I implemented that... But now I see that I didn't add this logic. Unfortunately, this is not that trivial to add here, because I'm not sure to which file types it should be applied, and whether we should keep track of the offset (like, the file was opened in APPEND mode but was never written to, and then
fork()
happens, in this case the child would be the only process that uses this file and it's actually safe).So I decided not to implement it in this PR, as we won't have time to finalize this before I'm gone.
(For info, we mean the trick like used in secure eventfd: 51e99f9)
Yeah, fair enough. I'm not blocking as we mentioned this single-process limitation in the doc and Gramine doesn't support multi-process operations on the same file anyhow (w/o warning logs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 18 files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)
libos/src/sys/libos_file.c
line 445 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Interesting, you're right after some quick testing on Linux.
I was confused by this Linux code: https://elixir.bootlin.com/linux/v6.11.5/source/fs/read_write.c#L1735. But apparently this code path is not used in
sendfile()
.Done.
Yeah, it looks to be checked here: https://elixir.bootlin.com/linux/v6.11.5/source/fs/splice.c#L1204.
Description of the changes
This PR adds
O_APPEND
emulation, but only within a single process.Note that Gramine doesn't support multi-process operations on the same file anyhow, in particular, the file position is not synchronized among multiple processes. This PR doesn't change this limitation of Gramine, thus
O_APPEND
writes don't work correctly (may overwrite file contents).Note that Linux has a bug in
pwrite()
. Gramine emulates this bug in the same way.Fixes #1921.
See also:
O_APPEND
duringopen()
is not honored #1152How to test this PR?
A new FS test is added. One LTP test is uncommented.
This change is