-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datalad sensitive marking fixes #739
Conversation
…an init to fix reruns issue
bd91bbb
to
6ba6cba
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #739 +/- ##
==========================================
+ Coverage 81.97% 82.01% +0.04%
==========================================
Files 41 41
Lines 4149 4159 +10
==========================================
+ Hits 3401 3411 +10
Misses 748 748 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, thank you!
But I think we can avoid referring to last_commit since we are not even 100% certain it has happened just now, and we can explicitly have the list of just committed files ;-)
Above in a call to ds.save
- get the results and go through them -- they should include all files which were add'/committ'ed . Eg.
❯ rm existed && echo "replaced" >| existed; touch newfile; python -c "import datalad.api as dl; print(dl.save('.'))"
<frozen importlib._bootstrap>:1047: ImportWarning: _SixMetaPathImporter.find_spec() not found; falling back to find_module()
add(ok): newfile (file)
add(ok): existed (file)
save(ok): . (dataset)
action summary:
add (ok: 2)
save (ok: 1)
[{'action': 'add', 'path': '/tmp/testds/newfile', 'type': 'file', 'refds': '/tmp/testds', 'status': 'ok', 'message': '', 'key': 'MD5E-s0--d41d8cd98f00b204e9800998ecf8427e'}, {'action': 'add', 'path': '/tmp/testds/existed', 'type': 'file', 'refds': '/tmp/testds', 'status': 'ok', 'message': '', 'key': 'MD5E-s9--d908d26cac8092d475f40a5179ca6347'}, {'action': 'save', 'type': 'dataset', 'path': '/tmp/testds', 'refds': '/tmp/testds', 'status': 'ok'}]
so go through those with action: 'add'
and it should give you all the paths which were just saved if any
f9c37d1
to
2844d54
Compare
I don't think it should be limited to the one added, eg. if heudiconv is run with overwrite (and overwritten annexed files are unlocked) we want to (re) mark the updated annexed files as sensitive in case the metadata was removed on the overwritten files. |
if not paths: | ||
return | ||
lgr.debug("Marking %d files with distribution-restrictions field", len(paths)) | ||
# set_metadata can be a bloody generator | ||
res = ds.repo.set_metadata( | ||
paths, init=dict([("distribution-restrictions", "sensitive")]), recursive=True | ||
paths, add=dict([("distribution-restrictions", "sensitive")]), recursive=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I worried in case of --overwrite
we might keep adding multiple distribution-restrictions=sensitive
... but it seems that same value is not getting duplicated, only a new one
❯ git annex metadata --set distribution-restrictions+=sensitive sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
metadata sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
distribution-restrictions=sensitive
distribution-restrictions-lastchanged=2024-02-24@00-12-47
lastchanged=2024-02-24@00-12-47
ok
(recording state in git...)
❯ git annex metadata --set distribution-restrictions+=sensitive sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
metadata sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
distribution-restrictions=sensitive
distribution-restrictions-lastchanged=2024-02-24@00-12-49
lastchanged=2024-02-24@00-12-49
ok
(recording state in git...)
❯ git annex metadata sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
metadata sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb
distribution-restrictions=sensitive
distribution-restrictions-lastchanged=2024-02-24@00-12-49
lastchanged=2024-02-24@00-12-49
only the time stamp would be changed ... since we are filtering on only saved files -- I think that should be good
here "add" is more like
well, let's hope that with filtering on globs nothing unexpected would happen -- let's try it out! |
🚀 PR was released in |
Fix assignment of sensitive git-annex metadata data via glob patterns (regression introduced by #739)
Mark sensitive files added in the commit not all globbed.
Currently all globbed files in the current tree get marked as sensitive which could overwrite previously removed metas (eg after defacing), it now filter file in the commit where converted files are added.
Add metadata rather than init to fix reruns issue.
If one runs heudiconv on a branch, remove the metadata (eg after defacing) then remove that branch and then rerun the same conversion on the same dataset, it seems like the new files with same names inherit the absence of metadata from the previous files (from git-annex branch). This is unexpected as the git-annex hash should differ so
.met
file should not be picked-up, but there is some git-annex black magic going on here. The issue is that sensitive files do not get marked as non-sensitive.