organize: add optional --disappeared=error|remove or --mode=complete|incremental #47

Open
yarikoptic opened this issue Mar 13, 2020 · 5 comments


@yarikoptic
Member

If organize is (re)run on a full collection of files, some of which were modified, then by "injecting" new files and leaving previously organized ones in place we might end up breeding stale files. With something like --disappeared we could detect which paths were no longer considered and offer to remove them.

Alternatively, we could make "complete" the default mode, which would imply removing files that were not "re-organized" (we should probably ask first, since this could lead to loss of data). Only with an explicit --mode=incremental would new files be added while all previous ones are allowed to stick around.
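A minimal sketch of how "disappeared" paths could be detected, not the actual implementation. The function names and the `policy` argument are hypothetical and only mirror the proposed --disappeared=error|remove values. It assumes the latest run reports the paths it produced relative to the target directory, and it only looks at .nwb files so that dandiset.yaml and similar files are left alone.

```python
from pathlib import Path


def find_disappeared(target_dir, organized_paths, suffix=".nwb"):
    """Return files under target_dir that the latest run did not produce.

    organized_paths are paths relative to target_dir (an assumption).
    """
    target_dir = Path(target_dir)
    produced = {Path(p) for p in organized_paths}
    existing = {
        p.relative_to(target_dir)
        for p in target_dir.rglob("*" + suffix)
        if p.is_file()
    }
    return sorted(existing - produced)


def handle_disappeared(target_dir, organized_paths, policy="error"):
    """Apply a hypothetical --disappeared=error|remove policy."""
    stale = find_disappeared(target_dir, organized_paths)
    if not stale:
        return []
    if policy == "error":
        raise RuntimeError(
            f"{len(stale)} previously organized file(s) were not re-organized, "
            f"e.g. {stale[0]}"
        )
    if policy == "remove":
        for rel in stale:
            (Path(target_dir) / rel).unlink()
        return stale
    raise ValueError(f"unknown policy: {policy!r}")
```

Since removal can lose data, a real command would probably list the stale paths and ask for confirmation before calling `unlink`.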

@satra
Member

satra commented Mar 13, 2020

to start with i would keep it simple. the source contains all data you want to re-organize. the target is the organized output. (almost like rsync with delete option). i think this is conceptually easier for people to comprehend. i.e. their source is the master. (think analogous to heudiconv).

so target is simply a view of the source.

i do like the consideration of complete/incremental but i think incremental has many possibilities and it would require other types of assessments to make work properly.

@yarikoptic
Member Author

NB: --mode was a bad choice, since ATM I already use --mode to select how files are handled, i.e. move, symlink, etc. I will rename that one to -f|--files-mode.

> to start with i would keep it simple. the source contains all data you want to re-organize. the target is the organized output. (almost like rsync with delete option). i think this is conceptually easier for people to comprehend. i.e. their source is the master. (think analogous to heudiconv).
>
> so target is simply a view of the source.

So that would be the default (complete) mode of --mode=complete|incremental. ATM it behaves as incremental (no care is taken about wiping things out). To provide an "efficient" complete mode in combination with move, copy, or hardlink, we would need to introduce some way to check that files are the same, to avoid unnecessary transfers. In simulate mode, since it is cheap, I require that the target directory does not exist (so no data gets overwritten, etc.).
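One way such a "same file" check could look, as a sketch only: compare identity first (which covers hardlinks and symlinks), then size, and fall back to a checksum only when the sizes match. The helper names are hypothetical, and SHA-256 is an arbitrary choice for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(src, dst):
    """True if dst already holds the content of src, so no transfer is needed."""
    src, dst = Path(src), Path(dst)
    if not dst.exists():
        return False
    if src.samefile(dst):  # same inode: already a hardlink or symlink to src
        return True
    if src.stat().st_size != dst.stat().st_size:
        return False  # cheap rejection before hashing
    return file_digest(src) == file_digest(dst)
```

The checksum is the expensive part for large data files, so caching digests between runs would likely be needed to keep a complete mode cheap.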

yarikoptic added a commit that referenced this issue Mar 13, 2020
@yarikoptic
Member Author

I guess, in addition to a proper "incremental" mode (where I point to e.g. a single additional file to be organized), the most user-friendly way would be similar to what we should aim for in upload/download: a sync mode. Given a path to the full collection of data files, organize would announce which actions are to be performed, e.g. if any existing file needs to be removed or renamed, and possibly even ask for confirmation.
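A rough sketch of how such a sync plan could be computed before anything is touched. Everything here is hypothetical: it assumes both the current target and the desired layout are available as mappings from relative path to a content digest, and the action names are made up for illustration.

```python
def plan_sync(existing, desired):
    """Return a list of (action, old_path, new_path) tuples.

    existing / desired: dict mapping relative path -> content digest.
    """
    # Files already in place with the right content need no action.
    kept = {p for p, d in desired.items() if existing.get(p) == d}

    # Existing files with no place in the desired layout, indexed by digest:
    # these are candidates for a rename instead of a fresh transfer.
    leftovers = {}
    for path in sorted(existing):
        if path not in desired:
            leftovers.setdefault(existing[path], []).append(path)

    actions = []
    for path in sorted(desired):
        if path in kept:
            continue
        candidates = leftovers.get(desired[path])
        if path in existing:
            actions.append(("update", path, path))  # same name, new content
        elif candidates:
            actions.append(("rename", candidates.pop(0), path))
        else:
            actions.append(("add", None, path))

    # Whatever was not reused by a rename has disappeared from the source.
    for digest in sorted(leftovers):
        for path in leftovers[digest]:
            actions.append(("remove", path, None))
    return actions
```

The returned list can be printed as the announcement, and only executed once the user confirms.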

Then we could have a consistent user interface for all 3 possible data locations:

  • disorganized local
  • organized local
  • remote on dandiarchive

and by consistent I mean

@yarikoptic
Member Author

As for "incremental": if we allow for some options (e.g. either to minimize filenames or even to use some alternative set of keys), we would need to store them somewhere in the target dandiset so that subsequent invocations can reuse them. At some point I suggested storing them directly within dandiset.yaml, but we might need to just come up with .dandi/config or the like.
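To illustrate the idea of persisting options, here is a small sketch. The file location (`.dandi/config.json`), the JSON format, the `organize` section, and the option names in the usage note are all assumptions made for this example; nothing in the thread settles them.

```python
import json
from pathlib import Path

# Hypothetical location and format; the thread only says ".dandi/config or alike".
CONFIG_RELPATH = Path(".dandi") / "config.json"


def save_organize_options(dandiset_dir, options):
    """Record the options used by this organize run inside the dandiset."""
    path = Path(dandiset_dir) / CONFIG_RELPATH
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"organize": options}, indent=2, sort_keys=True))


def load_organize_options(dandiset_dir, defaults=None):
    """Return stored options layered over defaults; empty if nothing stored."""
    path = Path(dandiset_dir) / CONFIG_RELPATH
    options = dict(defaults or {})
    if path.exists():
        options.update(json.loads(path.read_text()).get("organize", {}))
    return options
```

A later incremental run could then call `load_organize_options(target)` and refuse to proceed, or warn, if the options given on the command line disagree with the stored ones.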

@yarikoptic
Member Author

Another behavior that I think would lead to a relatively smooth user experience for incremental additions could be implemented in organize: consider all files already existing in the target location as well. With metadata loading already cached, it should not affect performance too badly, but it would ensure consistent metadata in dandiset.yaml (if all other subjects are present). It would also let disambiguation be informed by the files already in the dandiset.
