Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reader plugin redesign #106

Open
nclack opened this issue Feb 16, 2022 · 5 comments
Open

Reader plugin redesign #106

nclack opened this issue Feb 16, 2022 · 5 comments

Comments

@nclack
Copy link
Collaborator

nclack commented Feb 16, 2022

The current spec around reader plugins has some problems that would be nice to address with a schema change. This would likely require a change of the schema version.

Opening this issue to start gathering ideas/requirements.

Problems:

  • The two-call pattern is unnecessary now since file type detection is handled by the manifest.
  • Clarify behavior when readers are passed a list of paths. It would be nice to signal when a list of layers is expected vs a desire to aggregate into a single Layer (see discussion in Clarify Typing. #105).

Opportunities:

  • add support for streaming (e.g. consuming a video bytestream)
  • add support for partial reads (e.g. frames 100-200 of a tiff stack)
@Carreau
Copy link
Collaborator

Carreau commented Feb 17, 2022

Some questions that were asked at yesterday meeting:

  • Can we support passing buffers in instead of filenames
    I think that's a bigger redesign, but if buffer, then how to we handle types if there might be no file extensions ? Mimetypes ?

but in the meantime , if the two call is indeed unnecessary as we do the pattern matching on file extension can we pass in seekable buffers ? Is that a better pattern ?

Another concern that Talley had was how doe that work with npe1 shim, and the viewer.open(stack=...) parameters.

@DragaDoncila
Copy link
Contributor

DragaDoncila commented Apr 15, 2022

The two-call pattern is unnecessary now since file type detection is handled by the manifest.

I think this may warrant more discussion... While its canonical usage has been to distinguish among different file formats with simple if path.endswith() checks or similar, it has considerable more power than that. I have frequently used it to check for contents following prescribed formats e.g. tracking challenge data (check for specific directory structure and filename conventions). I'm also thinking of ome-zarr which requires the presence of a .zarray and .zgroup file, and use cases described in this discussion.

I do think this makes up a relatively small percentage of use cases, but they're use cases we should consider. Indeed my feeling for a long time is that we need to encourage better usage of the get_reader function. Take .txt files for example - a reader for such a file could do just about anything with the data inside. A get_reader would ideally peek inside and check its format before simply returning the reader function. More careful construction of get_reader functions by plugin developers could actually improve the stability (perceived and actual) of our plugin ecosystem - many plugins fail to read because they actually didn't mean to claim that file in the first place - see our built in zarr reader and ome-zarr. Curious what @jni thinks about this.

@jni
Copy link
Member

jni commented Apr 15, 2022

Curious what @jni thinks about this.

@jni agrees. 😂 We need to give plugins a chance to peek, and we need to document this usage.

@nclack
Copy link
Collaborator Author

nclack commented May 8, 2022

My favorite would be to replace the existing two-stage API with a stateful reader API. Something like:

Open(path)->R
Close(R)
GetContentType(R)->ContentType
Get/Request(R,request_params)->Data

The idea is to do something that works like HTTP range requests and works well with e.g. ffmpeg. But this might also take care of the "determine the file type without committing too deeply to this file" problem.

@jni
Copy link
Member

jni commented May 8, 2022

@nclack for those of us not familiar with http requests, let alone http range requests 😂, could you flesh out that pattern a bit more? For example, my internal linter is complaining that ContentType is never used. It also thinks that R is too short a variable name. 😜 Finally, I wonder if the open/close should be one command?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants