Embed filesystem metadata as a tar entry
georgestagg committed Sep 10, 2024
1 parent 7bc5c52 commit ea40804
Showing 3 changed files with 69 additions and 16 deletions.
52 changes: 45 additions & 7 deletions R/tar.R
@@ -57,14 +57,19 @@ add_tar_index <- function(file, strip = 0) {
     remote_package_size = length(data)
   )

-  # Append metadata to .tar data
-  json <- charToRaw(jsonlite::toJSON(metadata, auto_unbox = TRUE))
+  # Add metadata as additional .tar entry
+  entry <- create_metadata_entry(metadata)
+  json_block <- as.integer(tar_end / 512) + 1L
+
   # Append additional metadata hint for webR
   magic <- charToRaw('webR')
   reserved <- raw(4) # reserved for future use
-  block <- writeBin(tar_end / 512, "integer", size = 4, endian = "big")
-  len <- writeBin(length(json), "integer", size = 4, endian = "big")
-  length(json) <- 4 * ceiling(length(json) / 4) # pad to 4 byte boundary
-  data <- c(data[1:tar_end], json, magic, reserved, block, len)
+  block <- writeBin(json_block, raw(), size = 4, endian = "big")
+  len <- writeBin(entry$length, raw(), size = 4, endian = "big")
+  hint <- c(magic, reserved, block, len)
+
+  # Build new .tar archive data
+  data <- c(data[1:tar_end], entry$data, raw(1024), hint)
+
   # Write output and move into place
   out <- tempfile()
@@ -78,6 +83,38 @@ add_tar_index <- function(file, strip = 0) {
   fs::file_copy(out, file, overwrite = TRUE)
 }

+create_metadata_entry <- function(metadata) {
+  # metadata contents
+  json <- charToRaw(jsonlite::toJSON(metadata, auto_unbox = TRUE))
+  len <- length(json)
+  blocks <- ceiling(len/512)
+  length(json) <- 512 * blocks
+
+  # entry header
+  timestamp <- as.integer(Sys.time())
+  header <- raw(512)
+  header[1:15] <- charToRaw('.vfs-index.json') # filename
+  header[101:108] <- charToRaw('0000644 ') # mode
+  header[109:116] <- charToRaw('0000000 ') # uid
+  header[117:124] <- charToRaw('0000000 ') # gid
+  header[125:136] <- charToRaw(sprintf("%011o ", len)) # length
+  header[137:148] <- charToRaw(sprintf("%011o ", timestamp)) # timestamp
+  header[149:156] <- charToRaw(' ') # placeholder
+  header[157:157] <- charToRaw('0') # type
+  header[258:262] <- charToRaw('ustar') # ustar magic
+  header[264:265] <- charToRaw('00') # ustar version
+  header[266:269] <- charToRaw('root') # user
+  header[298:302] <- charToRaw('wheel') # group
+
+  # populate checksum field
+  checksum <- raw(8)
+  checksum[1:6] <- charToRaw(sprintf("%06o", sum(as.integer(header))))
+  checksum[8] <- charToRaw(' ')
+  header[149:156] <- checksum
+
+  list(data = c(header, json), length = len)
+}
+
 read_tar_offsets <- function(con, strip) {
   entries <- list()
   next_filename <- NULL
@@ -88,7 +125,8 @@ read_tar_offsets <- function(con, strip) {

     # Empty header indicates end of archive
     if (all(header == 0)) {
-      seek(con, 512, origin = "current")
+      # Return connection position to just before this header
+      seek(con, -512, origin = "current")
       break
     }

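The checksum step in `create_metadata_entry()` follows the usual tar convention: the header bytes are summed while the checksum field holds eight spaces, and the result is stored as six octal digits followed by a NUL and a space. As a rough illustration only, the sketch below recomputes that sum for a header built the same way. The helper name `verify_tar_checksum` is hypothetical and not part of this commit, and it assumes the six-digit layout written above rather than every layout found in real archives.

```r
# Hypothetical helper (not part of the commit): check a 512-byte tar header
# whose checksum field was written as six octal digits, a NUL and a space.
verify_tar_checksum <- function(header) {
  stopifnot(is.raw(header), length(header) == 512)
  # Stored value: octal digits in bytes 149-154 (1-based) of the header
  stored <- strtoi(rawToChar(header[149:154]), base = 8L)
  # Recompute the sum with the checksum field replaced by eight spaces
  header[149:156] <- rep(charToRaw(" "), 8)
  computed <- sum(as.integer(header))
  identical(stored, computed)
}

# Build a minimal header using the same steps as create_metadata_entry()
header <- raw(512)
header[1:15] <- charToRaw(".vfs-index.json")   # filename
header[149:156] <- rep(charToRaw(" "), 8)      # checksum placeholder
header[157] <- charToRaw("0")                  # type
checksum <- raw(8)
checksum[1:6] <- charToRaw(sprintf("%06o", sum(as.integer(header))))
checksum[8] <- charToRaw(" ")
header[149:156] <- checksum

verify_tar_checksum(header)
```

A check like this could be useful when debugging archives that other tar implementations reject, since a wrong checksum is a common reason for a header to be treated as invalid.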
2 changes: 1 addition & 1 deletion inst/pkgdown.yml
@@ -7,7 +7,7 @@ articles:
   mount-host-dir: mount-host-dir.html
   rwasm: rwasm.html
   tar-metadata: tar-metadata.html
-last_built: 2024-09-09T10:04Z
+last_built: 2024-09-10T15:29Z
 urls:
   reference: https://r-wasm.github.io/rwasm/reference
   article: https://r-wasm.github.io/rwasm/articles
31 changes: 23 additions & 8 deletions vignettes/tar-metadata.Rmd
@@ -12,16 +12,31 @@ The resulting output can be directly mounted by webR to the virtual filesystem,

 See the [Mounting filesystem images](mount-fs-image.html) article for more information about mounting filesystem images.

+## Filesystem metadata
+
+Virtual filesystem metadata is a JavaScript object, encoded as a JSON string. The format is defined and output by Emscripten's `file_packager` tool and understood by [webR's mounting API](mount-fs-image.html). The metadata object gives the location of each file in the archive to be mounted, and takes the following format:
+
+```javascript
+{
+  files: {
+    filename: string;
+    start: number;
+    end: number;
+  }[],
+};
+```
+
 ## Archive data layout

-A `.tar` archive that includes Emscripten filesystem metadata has the data layout given below. The resulting `.tar` file may be gzip compressed, with file extension `.tar.gz` or `.tgz`.
+A `.tar` archive that can be directly mounted by webR includes filesystem metadata as a file named `.vfs-index.json` at the top level of the archive. The `.tar` archive may also include a "metadata hint" at the very end of the file, after the end-of-archive marker. Appending additional hint data is optional, but allows for more efficient mounting of archive contents to the virtual filesystem.
+
+The resulting `.tar` file may be gzip compressed, with file extension `.tar.gz` or `.tgz`.

 | Field | Size | Description |
 |-|---|-------------|
-| 0 | Variable | Standard `.tar` data, including end-of-archive marker. |
-| 1 | Variable | JSON metadata, UTF8 encoded, padded with `0x00` to 4 byte boundary. |
-| 2 | 4 bytes | Magic bytes: The string `"webR"`, UTF8 encoded (`0x77656252`). |
-| 3 | 4 bytes | Reserved, currently `0x00000000`. |
-| 4 | 4 bytes | Offset of JSON metadata, in units of 512-byte blocks. Signed integer, big endian. |
-| 5 | 4 bytes | Length of JSON metadata, not including trailing null characters, in bytes. Signed integer, big endian. |
-Table: Data layout for a `.tar` archive with filesystem metadata.
+| 0 | Variable | Standard `.tar` data, including the end-of-archive marker. |
+| 1 | 4 bytes | Magic bytes: The string `"webR"`, UTF8 encoded (`0x77656252`). |
+| 2 | 4 bytes | Reserved, currently `0x00000000`. |
+| 3 | 4 bytes | Offset of `.vfs-index.json`, in units of 512-byte blocks. Signed integer, big endian. |
+| 4 | 4 bytes | Length of `.vfs-index.json`, in bytes. Signed integer, big endian. |
+Table: Data layout for a `.tar` archive containing filesystem metadata.
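
The 16-byte hint in the new layout can be read back from the end of an archive with base R alone. The sketch below is illustrative only. `read_metadata_hint` is a hypothetical helper, not part of rwasm or webR, and the stand-in archive is built by hand rather than by `add_tar_index()`. Judging by `json_block` in the R/tar.R change, the stored offset appears to point at the JSON contents rather than at the entry's header block, and the sketch assumes so.

```r
# Hypothetical helper: parse the optional 16-byte hint at the end of raw
# archive data. Returns NULL when the "webR" magic bytes are absent.
read_metadata_hint <- function(archive) {
  n <- length(archive)
  if (n < 16) return(NULL)
  hint <- archive[(n - 15):n]
  if (!identical(hint[1:4], charToRaw("webR"))) return(NULL)
  list(
    # Bytes 5-8 are reserved; the two fields below are big-endian integers
    block = readBin(hint[9:12], "integer", size = 4, endian = "big"),
    length = readBin(hint[13:16], "integer", size = 4, endian = "big")
  )
}

# Stand-in archive: a zeroed block in place of the entry header, the JSON
# padded to 512 bytes, a 1024-byte end-of-archive marker, then the hint.
json <- charToRaw('{"files":[]}')
padded <- c(json, raw(512 - length(json)))
hint <- c(
  charToRaw("webR"),
  raw(4),
  writeBin(1L, raw(), size = 4, endian = "big"),
  writeBin(length(json), raw(), size = 4, endian = "big")
)
archive <- c(raw(512), padded, raw(1024), hint)

# Locate and decode the metadata using only the hint
info <- read_metadata_hint(archive)
offset <- info$block * 512
metadata_json <- rawToChar(archive[offset + seq_len(info$length)])
```

Reading the hint first avoids walking every tar header to find `.vfs-index.json`, which is presumably the efficiency gain the vignette refers to. A reader that finds no hint would need to fall back to scanning the archive entries.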
