Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vscode.workspace.fs.createWriteStream(). #84515

Open
ericsnowcurrently opened this issue Nov 11, 2019 · 26 comments
Open

Add vscode.workspace.fs.createWriteStream(). #84515

ericsnowcurrently opened this issue Nov 11, 2019 · 26 comments
Assignees
Labels
api api-proposal feature-request Request for new features or functionality file-io File I/O
Milestone

Comments

@ericsnowcurrently
Copy link
Member

(see #84175)

For the Python extension, we currently use node's fs.createWriteStream() for a variety of purposes:

  1. downloading multi-MB files (and tracking progress)
    • calling into a third-party library that expects an fs.WriteStream
  2. writing generated data out to a file a chunk at a time
  3. writing log data out to a file, one entry at a time

For example:

@jrieken

@ericsnowcurrently
Copy link
Member Author

From #84175 (comment):

we have some proposed api but nothing is final yet. @ericsnowcurrently What is the used for? Real big files or just because?

See above.

@jrieken
Copy link
Member

jrieken commented Nov 27, 2019

Let's use this issue to track the open, close, read, write API. That quite low-level but the building block for stream.

export interface FileSystemProvider {
open?(resource: Uri, options: { create: boolean }): number | Thenable<number>;
close?(fd: number): void | Thenable<void>;
read?(fd: number, pos: number, data: Uint8Array, offset: number, length: number): number | Thenable<number>;
write?(fd: number, pos: number, data: Uint8Array, offset: number, length: number): number | Thenable<number>;
}

@bpasero
Copy link
Member

bpasero commented Dec 5, 2019

From my recent experiences from implementing faster read/write IO for remote connections, I believe that the primitives are not a good choice and would rather swap them by a stream solution that has similar support (e.g. start offset + length).

@jrieken
Copy link
Member

jrieken commented Jul 9, 2020

Adding July for discussion

@jrieken jrieken modified the milestones: July 2020, Backlog Jul 17, 2020
@jrieken
Copy link
Member

jrieken commented Jul 17, 2020

This won't happen so soon. Discussion topics are mostly around streams vs chunks. The latter is the building block of the former and I believe that providers will have an easier job with chucks and consumer will have an easier times with streams. Streams might be become more interesting when we decide to investigate into supporting a fetch-compatible API which also works around streams...

@jrieken
Copy link
Member

jrieken commented Jul 17, 2020

Good read on stream APIs targeted for browsers: https://developer.mozilla.org/en-US/docs/Web/API/Streams_API

@bpasero bpasero removed their assignment Jun 7, 2021
@jrieken
Copy link
Member

jrieken commented Jun 25, 2021

JS supports async generator functions and async iterators and they will offer a neat way to implement this. Much more simple than stream, yet equally powerful. Tho, the primitives that we have today are usually at the bottom of all stream/async-iterator solutions.

export interface FileSystemProvider { 
    readFile(uri: Uri, token: CancellationToken): AsyncIterable<Uint8Array>
 } 

@connor4312
Copy link
Member

We'd like to take advantage of the consumer side of this API in the hex editor, and we've already added code, that uses the Node.js native filesystem API when possible, since the extension host API doesn't provide this yet. Getting a read(2)/write(2) analog would be quite useful 🙂

For the hex editor, a stream-only interface would not be sufficient, or at least the one proposed in #84515 (comment) which lacks a starting offset. We want to load data in incrementally, and if the user scrolls from byte 0 to byte 2GB, we don't want to have to read and discard everything in between.

Likewise for writing -- although we do need to rewrite if the file length changes, if a single byte in a 2GB file is edited there's no need to rewrite the whole thing.

Going with a more primitive approach, I would not have a simple fd: number and instead have an API something like how we do for bulk operations and edits:

export interface FileSystemProvider { 
    open(uri: Uri, options: { create: boolean; writable: boolean }, (handle: FileHandle) => Thenable<void> | void): Thenable<void>;
}

export interface FileHandle {
  read(pos: number, // ...)
  // ...

The native open(2) as well as fs.promises.open() lock the file on a filesystem level while changes are being made to provide consistency. It would make sense for this API to have the same guarantees. Wrapping the file handle in a bulk-style callback like this ensures disposal and discourages long-lived use of the FileHandle in a way that would cause locking problems for users. We could also provide a warning if a lock is held too long. Its nicely encapsulated lifecycle means that appropriate progress indicators can be shown on the file while writes happen if options.writable is set.

@bpasero
Copy link
Member

bpasero commented Dec 16, 2021

The native open(2) as well as fs.promises.open() lock the file on a filesystem level while changes are being made to provide consistency.

Are you sure? On all platforms?

It is unsafe to use fs.write() multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream() is recommended.

https://nodejs.org/docs/v14.16.0/api/fs.html#fs_fs_write_fd_buffer_offset_length_position_callback

I just recently added write locking for the node.js based file system provider via Barrier: 9df559f#diff-43c296d8b387331e4b0756c4f496d4f6102b1808554f598c748cbc4d8b33096d

@connor4312
Copy link
Member

connor4312 commented Dec 16, 2021

Are you sure? On all platforms?

At least when I tried it on Windows in O_RDWR | O_CREAT mode. It was for that reason I added code to release the file descriptor when not in use.

It is unsafe to use fs.write() multiple times on the same file without waiting for the callback

I don't think is contradictory, it's saying that if your application calls open() and gets the FileHandle back, multiple write operations should not happen on that file at the same time. The file is still locked for other processes.

@bpasero
Copy link
Member

bpasero commented Dec 16, 2021

👍 didn't know that.

I like the idea of providing an API that would reduce the chances of an extension forgetting to close the file handle because that would only ask for trouble. Now that the disk provider locks from the open call, we do not allow any more writes until close is called. There is no warning printed either.

@jrieken
Copy link
Member

jrieken commented Dec 16, 2021

For the hex editor, a stream-only interface would not be sufficient, or at least the one proposed in #84515 (comment) which lacks a starting offset.

Yeah, that's the shortened variant of the existing read-function which does take an offset, like write. I do like enforcing the "transaction" character by the callback, but I also like async-iterables...

@connor4312
Copy link
Member

Why not both?

@dbaeumer
Copy link
Member

dbaeumer commented Jun 3, 2022

Stupid question: how would I as an extension access the chunk API ?

The current proposed API allows that a file system provider implementor can provide these but I didn't find any API how I would call that API. I can't get to an individual FileSystemProvider and the FileSystem interface can not offer these methods unless we add a default implementation or throw if they are not available on the underlying provider.

@jrieken
Copy link
Member

jrieken commented Jun 7, 2022

Yeah, the proposal doesn't expose them on the "consumer" side yet

@AlbertoPdRF
Copy link

Hey!

I think this is the right place, so I'm going to drop the question: as of now, is it possible to read a large file in chunks using vscode.workspace.fs? I believe not, but I may have missed something from the conversation above. If that's the case, will it/when will it be possible?

Thanks!

@AlbertoPdRF
Copy link

By the way, I also found the related #41985 in the backlog, which can maybe be closed to avoid duplication as per #84515 (comment).

@bpasero
Copy link
Member

bpasero commented Aug 10, 2022

I think they are different proposals. One for POSIX primitives and one for a method to write via a stream.

@AlbertoPdRF
Copy link

Is there any chance that the proposed fsChuncks API finalization gets included to September's iteration plan?

@AlbertoPdRF
Copy link

Hey! Is there any information on this? Could the proposed fsChuncks API finalization get included in any of the coming monthly iteration plans? If any help is needed, I would be happy to help!

@AlbertoPdRF
Copy link

Hi @bpasero @jrieken, is there any chance that the finalization of the fsChunks API is included in any of the coming monthly iteration plans? Is there anything I can do to help move this forward? Or any chance to use the API even if it's not finalized?

I'd like to at least know the status on this, as I've had a PR (AlbertoPdRF/root-file-viewer#20) solving two issues blocked by this for almost a year now.

@AlbertoPdRF
Copy link

Hi @bpasero @jrieken, is there any chance that the finalization of the fsChunks API is included in any of the coming monthly iteration plans? Is there anything I can do to help move this forward? Or any chance to use the API even if it's not finalized?

I'd like to at least know the status on this, as I've had a PR (AlbertoPdRF/root-file-viewer#20) solving two issues blocked by this for almost a year now.

@mjbvz, maybe you can reply?

@zm-cttae
Copy link

zm-cttae commented Jun 13, 2023

@rbuckton
Copy link
Member

rbuckton commented Feb 2, 2024

@jrieken: I'm running into an issue with DeoptExplorer (https://github.com/microsoft/deoptexplorer-vscode) when attempting to parse very large log files due to workspace.fs.readFile having a 2GiB memory limit. I'd originally used NodeJS's fs to open a readable stream to read a single line at a time, but I lost that capability when I switch to using workspace.fs. Unfortunately, V8 isolate logs can get fairly large and there's no easy way to trim them down.

A streaming API would be a useful building block, but a workspace.fs.readLines async generator method that can yield a line at a time without needing to load the whole file into memory would also serve my needs.

@Lramseyer
Copy link

Since this seems to be the go-to discussion thread for the fsChunks API proposal, I'd also like to +1 this. I'm building a digital waveform viewer extension. Those files can be on the order of gigabytes, so being able to read individual chunks of a file would greatly improve performance and make it useful for a lot more projects.

Here's my extension:
https://github.com/Lramseyer/vaporview

@adrianstephens
Copy link

I don't know how far along this is, but can I suggest an alternative API:

export interface FileSystemProvider { 
    open?(resource: Uri, options: { create: boolean }): File| Thenable<File>; 
}

export interface File {
    close()	: Thenable<void>;
    read(pos: number, length: number): Thenable<Uint8Array>;
    write(pos: number, data: Uint8Array): Thenable<number>;
}

Because:

  • file descriptors are an extra burden to be managed by the FileSystemProvider
  • every use of a file descriptor needs to know its FSP as well
  • keeps the FSPs interface cleaner
  • file descriptors are so 1970s
  • for read, returning the data is simpler to use and is more consistent with the rest of the API
  • for write, Uint8Array already holds an offset and length into the underlying buffer; passing additional offsets and lengths seems redundant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api api-proposal feature-request Request for new features or functionality file-io File I/O
Projects
None yet
Development

No branches or pull requests

10 participants