Support plain directory (not super-dataset) with datasets under it #35

Open
yarikoptic opened this issue Dec 14, 2021 · 3 comments

@yarikoptic
Member

The current implementation takes a dataset argument, so it relies on navigating through subdatasets.

In the case of RIA stores or other collections (e.g. the datalad-registry cache), there would just be a hierarchy of directories with datasets somewhere under it.

@jwodder
Member

jwodder commented Dec 14, 2021

@yarikoptic I assume that you want all of the commands to support plain directories of datasets. What should the syntax be for passing a non-dataset directory?

@yarikoptic
Member Author

That is a good question.

  • we have --dataset, which currently (as in the rest of datalad) is assumed to be the path to a dataset. I think we should not "abuse" it to allow an arbitrary path with datasets under it.
  • we could add a --datasets-path option I guess, which would be the "top directory from which to provide access to files within DataLad datasets. Regular paths, not under any dataset, will be exposed as-is." Then we just need to make the two options mutually exclusive (i.e. they must not both be specified at the same time). At least that would make it explicit/clear and stay consistent (in having --dataset) with the rest of the datalad world.
  • indeed, we would then need to provide it for at least the fusefs and fsspec-cache-clear commands
    • the tricky point is fsspec-cache-clear, which has --recursive, so we would need to traverse the tree to identify all immediate datasets. That might be expensive, but in a prototypical case (e.g. a RIA store) they would be right near the top of the hierarchy. Obviously .git and .hg directories should not be traversed, but I think in general we should not exclude .dotdirs: who knows what people might put there, they could be datalad datasets
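
A rough sketch of the two ideas above, not the project's actual code: a pair of mutually exclusive options, and a traversal that stops at the first dataset on each branch while skipping only .git and .hg. `--datasets-path` is just the name proposed in this comment; `build_parser`, `is_dataset` and `find_top_datasets` are hypothetical helpers. It also assumes a dataset can be recognized by a `.git` entry in its directory, which would not hold for bare repositories such as those in a RIA store.

```python
import argparse
import os
from typing import Iterator

# Version-control internals that are never descended into.
# Other dot-directories are deliberately NOT excluded.
SKIP_DIRS = {".git", ".hg"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: exactly one of the two options must be given."""
    parser = argparse.ArgumentParser(prog="example")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-d", "--dataset", help="path to a single dataset")
    group.add_argument(
        "--datasets-path",
        help="top directory with datasets somewhere under it (proposed name)",
    )
    return parser


def is_dataset(path: str) -> bool:
    # Assumption: a dataset has a `.git` directory or gitlink file.
    # Bare repositories would need a different check.
    return os.path.exists(os.path.join(path, ".git"))


def find_top_datasets(top: str) -> Iterator[str]:
    """Yield the topmost dataset on each branch of the tree under `top`."""
    for dirpath, dirnames, _filenames in os.walk(top):
        if is_dataset(dirpath):
            yield dirpath
            # Do not descend further; subdatasets are left to the
            # dataset's own recursion.
            dirnames[:] = []
            continue
        # Prune in place so os.walk skips .git/.hg but still visits
        # any other dot-directory.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
```

With something like this, a command could call `find_top_datasets(args.datasets_path)` when `--datasets-path` is given and fall back to the existing single-dataset behaviour otherwise. Pruning at the first dataset found keeps the walk cheap in the common case where datasets sit near the top of the hierarchy.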

@yarikoptic
Member Author

Another use case that would benefit from this issue being addressed: our /mnt/backup/dandi/dandizarrs, which is not a superdataset. I wanted to run a mass check across those zarrs, so I needed to FUSE-mount them. Ping @jwodder on this issue.
