Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Deno.writeFile(readableStream) can result in "chimera" files (partially overwritten files) #19697

Closed
jkscx opened this issue Jul 3, 2023 · 2 comments · Fixed by #23330

Comments

@jkscx
Copy link

jkscx commented Jul 3, 2023

TLDR:

Deno.writeFile doesn't truncate the file when the data source is a ReadableStream. That produces a “chimera” (partially overwritten) file when overwriting a larger file with a smaller file. In my case it “only” resulted in an un-parsable file, but could also be a security issue similar to aCropalypse (CVE 2023-21036), where an overwritten file’s content is partially recoverable, if the resulting file was uploaded somewhere on the internet, but I’d say that the likelihood of that happening is rather low.

Merge request which introduced the issue: PR 17329
I could provide a short PR with the fix and a test case for this if you are willing to accept it.

Full story:

I wrote a script which processes a list of URLs. For each of these URLs, I get its HTTP body, save it to a temp file and process the file. Because I don’t need the file I “discard” it by overwriting it with the HTTP body of the next URL, i.e. I reuse the same temp file for each URL.

While running my script, I noticed that sometimes my file processing code failed with a file-not-parsable error. That was strange to me as whenever I manually downloaded the URL for which processing failed and run my file processing code it got processed successfully.

After playing around with it a bit, I noticed that when my automatic processing failed, the file’s size didn’t match the HTTP body’s size. On closer inspection, I realized that it matched the HTTP body of the previous URL, i.e. the one which was downloaded before this one. Weird…

It seemed that instead of the new file overwriting the old file, it resulted in a file which had the same size as the old file, but which failed to parse. This lead me to believe that the overwriting somehow silently failed and resulted in a corrupt file. But why was it only failing sometimes in overwriting it and not always?

At first I thought that maybe writing the file to disk was not instantaneous (or that maybe Deno.writeFile returned a tad too prematurely) and then my processing code read a broken state while it was still being flushed to disk? Adding some delays didn’t help so it had to be something else…

After trying out some other ideas in a test environment and reducing the problem further and further in scope, I managed to find that the condition for a faulty overwrite was that the new file has to be smaller than the old file. Eureka!

This immediately made me believe that it’s a file truncation issue and upon checking the Deno source code I confirmed that it was so. Someone forgot to set the truncate option and when it was not set it defaulted to not truncating…

Code to reproduce the issue:

//// SETUP
const encoder = new TextEncoder();

const streamShort = new ReadableStream({
    pull(controller) {
        controller.enqueue(encoder.encode("YYYYY"));
        controller.close();
    },
});

const streamShort2 = new ReadableStream({
    pull(controller) {
        controller.enqueue(encoder.encode("ZZ"));
        controller.close();
    },
});

const tmpDir = Deno.makeTempDirSync({prefix: "my_deno_test"});
const filePath = tmpDir + "/test.txt";

//// ISSUE
// create a file with some content
await Deno.writeFile(filePath, encoder.encode("XXXXXXXXXXX"));
console.debug(`File content at beginning:                 '${await Deno.readTextFile(filePath)}'`);
console.debug(`File size: ${(await Deno.stat(filePath)).size}`)

// now write to the same file a ReadableStream which is shorter than the content already in the file
await Deno.writeFile(filePath, streamShort);

// reading the file we see that the stream has been written to it, but because the stream was shorter
// than the content already in the file, it resulted in a chimera file
console.debug(`File content after writing stream:         '${await Deno.readTextFile(filePath)}'`);
console.debug(`File size: ${(await Deno.stat(filePath)).size}`)

//// FIX
const file = await Deno.open(filePath, {
    append: false,
    create: true,
    createNew: false,
    write: true,
    truncate: true, // the fix
});
await streamShort2.pipeTo(file.writable);

console.debug(`File content after writing stream (fixed): '${await Deno.readTextFile(filePath)}'`);
console.debug(`File size: ${(await Deno.stat(filePath)).size}`)

Code output:

File content at beginning:                 'XXXXXXXXXXX'
File size: 11
File content after writing stream:         'YYYYYXXXXXX'
File size: 11
File content after writing stream (fixed): 'ZZ'
File size: 2
@wwoast
Copy link

wwoast commented Feb 23, 2024

I confirmed this behavior as well. I was refactoring some code dealing with Tar and Untar from the stdlib for copying files across folders. I was trying to get rid of usage of await copy() and replacing it with await Deno.writeFile(path, toReadableStream(reader)).

A subtle but frustrating issue, and I'm surprised it's persisted for so long. I'm at least one person that would appreciate your PR!

EDIT: after writing this I discovered that copy is no longer deprecated now that it's in io/copy and not streams/copy: denoland/std#4128

@kLiHz
Copy link

kLiHz commented May 3, 2024

Hi @jkscx, looks like a working PR #23330 is dealing with this issue. I think it's making append: false truncating too for ReadableStream. If my understandings are correct, the result would be an append or truncate situation.

I'm not sure if a non-truncating option (like Python's open() with a 'r+' mode) is still necessary. If it is, I think there might be an API change?


Edit: Sorry I just found out I could still use Deno.open() if I'd like a non-truncating file write.

const f = await Deno.open('o.txt', {write: true, create: true});
await f.write(new Uint8Array([65, 66, 67]));
f.close();

const q = await Deno.open('o.txt', {write: true});
await q.write(new Uint8Array([68]));
q.close()

And a new issue should be opened if I'd want to ask about truncate option in Deno.write[Text]File[Sync] functions.

bartlomieju pushed a commit that referenced this issue May 27, 2024
…ile (#23330)

Closes #19697. This fixes a bug where the writeFile API can create
partially-overwritten files which may lead to invalid / corrupt files or
data leakage. It also aligns the behavior of writing a ReadableStream
and writing a Uint8Array to the disk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants