Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal implementation of memfd_create() and file sealing #1224

Open
wkozaczuk opened this issue Apr 6, 2023 · 1 comment
Open

Minimal implementation of memfd_create() and file sealing #1224

wkozaczuk opened this issue Apr 6, 2023 · 1 comment

Comments

@wkozaczuk
Copy link
Collaborator

As I was trying to get the libsql with webassembly running on OSv per example described in this article, I discovered that the wasmtime-jit used by libsql calls memfd_create() and then fcntl() to seal the webassembly code for security reasons I guess. For details, please see these search results in wasmtime.

Based on my understanding of memfd_create and file sealing it is intended to be used in combination as a security feature for example protecting modification to a file in memory created by memfd_create() between two or more processes (see this article - https://lwn.net/Articles/593918/). Given this how much of this makes sense on OSv?

I actually managed to provide a minimal implementation of memfd_create() based on the assumption that /tmp is a mount to the RAMFS filesystem like with ROFS images. I also stubbed F_ADD_SEALS. This was enough to get the examples from the libsql article to run.

diff --git a/fs/vfs/main.cc b/fs/vfs/main.cc
index df2becbe..4ede5231 100644
--- a/fs/vfs/main.cc
+++ b/fs/vfs/main.cc
@@ -1676,6 +1676,9 @@ TRACEPOINT(trace_vfs_fcntl, "%d %d 0x%x", int, int, int);
 TRACEPOINT(trace_vfs_fcntl_ret, "\"%s\"", int);
 TRACEPOINT(trace_vfs_fcntl_err, "%d", int);
 
+#define F_ADD_SEALS    1033
+
 extern "C" OSV_LIBC_API
 int fcntl(int fd, int cmd, int arg)
 {
@@ -1745,6 +1748,12 @@ int fcntl(int fd, int cmd, int arg)
     case F_SETOWN:
         WARN_ONCE("fcntl(F_SETOWN) stubbed\n");
         break;
+    case F_ADD_SEALS:
+        WARN_ONCE("fcntl(F_ADD_SEALS) stubbed\n");
+        break;
     default:
         kprintf("unsupported fcntl cmd 0x%x\n", cmd);
         error = EINVAL;
diff --git a/linux.cc b/linux.cc
index 1fa01326..ccc5f8ec 100644
--- a/linux.cc
+++ b/linux.cc
@@ -427,6 +427,29 @@ static int tgkill(int tgid, int tid, int sig)
 #define __NR_sys_getdents64 __NR_getdents64
 extern "C" ssize_t sys_getdents64(int fd, void *dirp, size_t count);
 
+#define __NR_sys_memfd_create SYS_memfd_create
+static int sys_memfd_create(const char *name, unsigned int flags)
+{
+    auto *file = tmpfile();
+    if (file) {
+        auto fd = fileno(file);
+        if (fd) {
+            auto fd2 = dup(fd);
+            fclose(file);
+            if (fd2) {
+               return fd2;
+            } else {
+               return -1;
+            }
+        } else {
+            fclose(file);
+            return -1;
+        }
+    } else {
+        return -1;
+    }
+}
+
 OSV_LIBC_API long syscall(long number, ...)
 {
     // Save FPU state and restore it at the end of this function
@@ -517,6 +540,8 @@ OSV_LIBC_API long syscall(long number, ...)
     SYSCALL3(symlinkat, const char *, int, const char *);
     SYSCALL3(sys_getdents64, int, void *, size_t);
     SYSCALL4(renameat, int, const char *, int, const char *);
+    SYSCALL3(mprotect, void *, size_t, int);
+    SYSCALL2(sys_memfd_create, const char *, unsigned int);
     }
 
     debug_always("syscall(): unimplemented system call %d\n", number);

I wonder what a better implementation of it should look like.

@nyh
Copy link
Contributor

nyh commented Apr 9, 2023

I agree with you that the entire raison d'etre of the "file sealing" stuff is security between two untrusting applications, which is NOT a concern on OSv so we can stub the sealing fcntl and leave it stub.

I also agree with you that memfd_create (by the way, please create such a function, not just a system call - why did you need the system call and not the function?) can simply create a file on /tmp and return that.

The fact that your implementation uses tmpfile() which returns a FILE* instead of file descriptor and you need to use dup and fclose is a little funky, but I agree that it does exactly the right thing and saves you duplicating a few (but not many!) lines of code - calling the random name generation function and unlink().

Please consider adding F_ADD_SEALS the the appropriate header file instead of the source file (maybe include/api/fcntl.h?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants