organize: add optional `--disappeared=error|remove` or `--mode=complete|incremental` #47

yarikoptic · 2020-03-13T02:06:35Z

If organize is (re)run on a full collection of files, some of which were modified, then by "injecting" new files and leaving previously organized ones in place we might end up accumulating stale files. With something like --disappeared we could detect which paths are no longer produced and offer to remove them.

Alternatively, we could make "complete" the default mode, which would imply removing files that were not "re-organized" (we should probably ask first, since this could lead to loss of data). Only with an explicit --mode=incremental would new files be added in while all previously organized files are allowed to stick around.
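To make the proposal concrete, here is a minimal sketch of how a `--disappeared=error|remove` policy might behave. The function names, the `planned` argument, and the error message are all hypothetical; nothing here is existing dandi code.

```python
from pathlib import Path


def find_disappeared(target_dir: Path, planned: set[Path]) -> list[Path]:
    """Return files under target_dir that the current run did not (re)produce.

    `planned` holds the paths, relative to target_dir, that this run of
    organize intends to create or keep (hypothetical input).
    """
    existing = {
        p.relative_to(target_dir)
        for p in target_dir.rglob("*")
        if p.is_file()
    }
    return sorted(existing - planned)


def handle_disappeared(
    target_dir: Path, planned: set[Path], policy: str = "error"
) -> list[Path]:
    """Apply a hypothetical --disappeared=error|remove policy."""
    stale = find_disappeared(target_dir, planned)
    if not stale:
        return []
    if policy == "error":
        raise RuntimeError(
            f"{len(stale)} path(s) no longer produced, e.g. {stale[0]}"
        )
    if policy == "remove":
        for rel in stale:
            (target_dir / rel).unlink()
        return stale
    raise ValueError(f"unknown policy: {policy!r}")
```

In this sketch any file not listed in `planned` counts as disappeared, so a real implementation would presumably need to exempt files such as dandiset.yaml, and should probably ask before removing anything.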

satra · 2020-03-13T14:20:02Z

to start with i would keep it simple. the source contains all data you want to re-organize. the target is the organized output. (almost like rsync with delete option). i think this is conceptually easier for people to comprehend. i.e. their source is the master. (think analogous to heudiconv).

so target is simply a view of the source.

i do like the consideration of complete/incremental but i think incremental has many possibilities and it would require other types of assessments to make work properly.

yarikoptic · 2020-03-13T14:35:09Z

NB --mode was a bad choice ATM since I used --mode to instruct the mode of files handling -- i.e. move, symlink, etc. Will rename into -f|--files-mode.

> to start with i would keep it simple. the source contains all data you want to re-organize. the target is the organized output. (almost like rsync with delete option). i think this is conceptually easier for people to comprehend. i.e. their source is the master. (think analogous to heudiconv).
>
> so target is simply a view of the source.

So that would be the default (complete) mode of --mode=complete|incremental. ATM it behaves as incremental (no care is taken about wiping things out). To provide an "efficient" complete mode of operation with move, copy, or hardlink, we would need to introduce some way to check that files are the same, to avoid unnecessary transfers. In simulate mode, since it is cheap, I demand that the target directory not exist (so no data gets overwritten etc).
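One possible way to check that files are the same is sketched below. It is an assumption rather than a decided design: compare identity first (which covers hardlinks and symlinks), then size, and only then a SHA-256 digest of the content. The names `file_digest` and `needs_transfer` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src: Path, dst: Path) -> bool:
    """Decide whether src must be moved/copied/linked to dst."""
    if not dst.exists():
        return True
    if dst.samefile(src):
        # Same inode: dst is already a hardlink or symlink to src.
        return False
    if src.stat().st_size != dst.stat().st_size:
        # Cheap check: different sizes mean different content.
        return True
    # Sizes match, so fall back to comparing content digests.
    return file_digest(src) != file_digest(dst)
```

Hashing large data files is expensive, so a real implementation might prefer to cache digests, much like the metadata caching that already exists.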

yarikoptic · 2020-04-06T22:41:04Z

I guess, in addition to proper "incremental" (where I point to e.g. a single additional file to be 'organized'), the most user-friendly way would be similar to what we should aim for in upload/download: a sync mode. Given a path to the full collection of data files, organize would announce what actions are to be done if any existing file needs to be removed or renamed, and possibly even ask for confirmation.

Then we could have consistent user interface for all 3 possible data locations:

- disorganized local
- organized local
- remote on dandiarchive

and by consistent I mean

- similar feedback to the user about ongoing operations (upload+download: RF to reuse logic/options/UI #48)
- similar prompts/questions if operation is to replace, rename, or remove existing files at the target location
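The sync mode described in this comment could be sketched roughly as follows. Everything here is hypothetical: the `Action` type, the idea of keying files by a content digest to detect renames, and the prompt wording are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str           # "add", "replace", "rename", or "remove"
    path: str           # target-relative path the action applies to
    new_path: str = ""  # destination, only set for "rename"


def plan_sync(existing: dict[str, str], desired: dict[str, str]) -> list[Action]:
    """Plan actions; both arguments map target-relative path -> content digest."""
    actions: list[Action] = []
    # Existing files whose path is no longer wanted: rename or removal candidates.
    leftover = {p for p in existing if p not in desired}
    by_digest: dict[str, list[str]] = {}
    for p in sorted(leftover):
        by_digest.setdefault(existing[p], []).append(p)

    for path, digest in sorted(desired.items()):
        if existing.get(path) == digest:
            continue  # already in place, nothing to do
        if path in existing:
            actions.append(Action("replace", path))
        elif by_digest.get(digest):
            old = by_digest[digest].pop(0)  # same content under an old name
            leftover.discard(old)
            actions.append(Action("rename", old, path))
        else:
            actions.append(Action("add", path))
    actions.extend(Action("remove", p) for p in sorted(leftover))
    return actions


def confirm(actions: list[Action], ask=input) -> bool:
    """Announce actions that touch existing files and ask before proceeding."""
    risky = [a for a in actions if a.kind != "add"]
    if not risky:
        return True
    for a in risky:
        print(f"{a.kind}: {a.path}" + (f" -> {a.new_path}" if a.new_path else ""))
    return ask("Proceed? [y/N] ").strip().lower() == "y"
```

Keeping planning separate from confirmation and execution would let the same plan and prompt logic be shared with upload and download, in the spirit of #48.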

yarikoptic · 2020-04-06T22:43:11Z

As for "incremental": if we allow for some options (e.g. either to minimize filenames or even to use some alternative set of keys), we would need to store them somewhere in the target dandiset so that subsequent invocations could reuse them. At some point I suggested storing them directly within dandiset.yaml, but we might need to just come up with .dandi/config or alike.
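A minimal sketch of persisting such options, assuming a JSON file at `.dandi/config` inside the target dandiset. The file format, the `organize` section name, and both function names are hypothetical; no such config exists as of this comment.

```python
import json
from pathlib import Path

# Hypothetical location, following the ".dandi/config or alike" idea above.
CONFIG_RELPATH = Path(".dandi") / "config"


def save_organize_options(dandiset_dir: Path, options: dict) -> Path:
    """Store the options used by organize so that later runs can reuse them."""
    path = dandiset_dir / CONFIG_RELPATH
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"organize": options}, indent=2, sort_keys=True))
    return path


def load_organize_options(dandiset_dir: Path) -> dict:
    """Return previously stored options, or an empty dict if there are none."""
    path = dandiset_dir / CONFIG_RELPATH
    if not path.exists():
        return {}
    return json.loads(path.read_text()).get("organize", {})
```

A subsequent incremental run could then call `load_organize_options` first and use the stored values whenever the user does not override them on the command line.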

yarikoptic · 2020-06-01T22:40:22Z

Another behavior that I think would lead to a relatively smooth user experience for incremental additions could be implemented in organize: consider all files already existing in the target location as well. With the already implemented caching of metadata loading, it should not affect performance that gravely, but it would ensure consistent metadata in dandiset.yaml (if all other subjects are present). It would also make disambiguation informed by the files already existing in the dandiset.
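To illustrate why looking at existing files matters for disambiguation, here is a small sketch. It assumes, purely for illustration, that an optional key is included in filenames only when it takes more than one value across the files considered; `keys_needed` and the metadata records are hypothetical.

```python
def keys_needed(records: list[dict], optional_keys: tuple[str, ...]) -> list[str]:
    """Return the optional keys that take more than one value across records."""
    return [k for k in optional_keys if len({r.get(k) for r in records}) > 1]


# Metadata of files already in the target dandiset (hypothetical values).
existing = [{"subject_id": "mouse1", "session_id": "20200101"}]
# Metadata of the single file being added incrementally.
new = [{"subject_id": "mouse1", "session_id": "20200202"}]

# Looking only at the new file, session_id seems unnecessary ...
only_new = keys_needed(new, ("session_id",))
# ... but together with existing files it is required to tell the files apart.
with_existing = keys_needed(existing + new, ("session_id",))
```

Under this assumption, organizing the new file alone would produce a name without the session, which could collide or be inconsistent with files already in the dandiset.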

yarikoptic added a commit that referenced this issue Mar 13, 2020

RF: organize: --mode -> --files-mode

e4af585

see #47 for more discussion

This was referenced Apr 6, 2020

organize: option to specify keys to be used instead of current "hard coded" #69

Closed

upload+download: RF to reuse logic/options/UI #48

Open

jwodder added the cmd-organize label Apr 15, 2021

satra mentioned this issue Nov 3, 2022

[Feature] Allow dandi organize to force usage of session_id (NWB Assets) #1000

Closed

yarikoptic mentioned this issue Dec 7, 2022

Add option to use (metadata of) assets already in the archive for the dandiset #1169

Open
