-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX: anonymize annotations #7016
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7016 +/- ##
==========================================
- Coverage 89.71% 89.49% -0.23%
==========================================
Files 438 438
Lines 77473 77599 +126
Branches 12576 12601 +25
==========================================
- Hits 69505 69447 -58
- Misses 5158 5314 +156
- Partials 2810 2838 +28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but it would be good to hear from @alexrockhill as well
Hmmm I'm still getting the following error:
|
I am now realizing that the original problem was just on save with raw.anonymize() which is fixed, at least as far as I can tell with my checks. The other error I just posted also has to do with meas_date though so if you think it can also be in the PR then we can fix it here otherwise we can open another PR. |
@alexrockhill |
info['meas_date'] is None in the testing datasets for non-FIF data formats. Specifically, when reading in a ctf, bti, kit, bdf or set sometimes that field gets set to None. |
|
ok how do we want to anonymize a dataset that has
|
Is this the only date-related thing you'd need to anonymize if The thing to do might be to do everything relative to Jan 1 1970 (the epoch). But I haven't followed the anonymization discussion too closely so I'm not sure. |
The problem is that any default I guess if |
Yes, probably also |
Having daysback be a required argument could solve the issue of having a known relation to birthday. Then the meas_date could be set at daysback and the age could be related to that while still being anonymized. I think that works.... |
See https://github.com/mne-tools/mne-bids/pull/280/files for what I was thinking
|
@alexrockhill @bloyl ready to merge? if so LGTM |
No, there are still a few things I need to tweak. Hopefully on my slate for this afternoon/tomorrow. |
@larsoner @agramfort @jasmainak @alexrockhill this means that the earliest meas_date we can store is
and the latest is
I've modified This has immediate ramifications for mne-tools/mne-bids#280 which wants anonymized dates before 1900. But also gives an idea of when the fif file standard will need to be updated to something more robust. |
humm interesting, 2038 is not so far off ... we could modify the BIDS specification recommendations based on this. 1900 was a rather arbitrary choice but now these issues can help guide a better number |
Grr, 32-bit signed integers. Rather than change the spec it might be worth talking over on fiff-constants about how to handle this. For example if they allow us to add a |
Or LONGLONG, rather |
... actually there is already a 64-bit signed int primitive: https://github.com/mne-tools/fiff-constants/blob/master/DictionaryTypes.txt#L118 So we just need to ask if it's okay to change |
Pursuing some FIF format change in mne-tools/fiff-constants#24 |
I don't know enough about the fiff internals to really weigh in what the best course of action might be. So I leave it up to you folks. I do think its a big undertaking to just support being able to anonymize dates to pre1900 though. |
I would not push a change in the fif standard for this. For me the BIDS
standard
should be adjusted. FYI mne_anonymize was updated by matti 2 years ago
to anonymize the wakeman data and it was fine with current fif standard
… |
Maybe not for this, but there will be a problem in 20 years anyway, so why not fix it now? :) |
@larsoner is this going to break compatibility with MNE-C and FieldTrip? Many folks here partially analyze their data in |
FT we can update by updating MNE-MATLAB, and MNE-C can be updated, too. But yes it would cause these problems |
mne/io/meas_info.py
Outdated
% (key, key_2, | ||
repr(np.iinfo('>i4').min), | ||
repr(np.iinfo('>i4').max), | ||
repr(value[key_2]),)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think you need the repr() here
mne/io/meas_info.py
Outdated
'and "%r", got "%r"' | ||
% (repr((np.iinfo('>i4').min, 0)), | ||
repr((np.iinfo('>i4').max, 0)), | ||
repr(self['meas_date']),)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as elsewhere. You probably don't need the repr if you use %r
mne/io/meas_info.py
Outdated
|
||
try: | ||
info._check_consistency() | ||
except RuntimeError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we avoid try-except clause somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a change that avoids this but seems abit kludgy.
Either is fine with me...
@bloyl does the whats_new page need to be updated? |
@jasmainak How do I check that my changes to whats_new page render correctly? can i do that check locally? |
mne/io/meas_info.py
Outdated
meas_date : tuple | None | ||
The Info object you want to use for overwriting values | ||
in target Info objects. | ||
dt : datetime.timedelta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename dt to td. dt suggests it is datetime.datatime and not datetime.timedelta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or better delta_t to match naming below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bloyl can you just take care of this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be done now
mne/io/meas_info.py
Outdated
delta_t = meas_date_datetime - default_anon_dos | ||
# compute timeshift delta | ||
if daysback is None and info['meas_date'] is None: | ||
delta_t = datetime.timedelta(days=np.random.randint(365, 45 * 365)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would not do this. It means the file you generate will be different if you call the function twice. this is error prone. I see already a version control system commit a huge file for nothing. I would enforce daysback to be not None or set the date to 1970-01-01 as if date we removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with making daysback
a required parameter is that it will encourage users to just pick a number (and likely the same number for all data). When in most cases people don't have longitudinal data and shouldn't need/want daysback
at all. To properly use daysback
the site needs to use a different number for each subject but the same number for datasets from the same subject.
The problem with picking a date for meas_date is None
cases is that then we would always be using that same daysback
which is just a constant timeshift of potentially identifiable information (birthday, device info, proc history etc)
In the latest round of edits. I just remove all the date/time info i know about if 'meas_date is None' I think this is robust in that you'll get the same info
out each time and that it will be anonymous. I also don't think it loses any important time interval data.
value['machid'][:] = 0 | ||
|
||
# exp 2 tests the keep_his option | ||
exp_info_2 = exp_info.copy() | ||
exp_info_2['subject_info']['his_id'] = 'foobar' | ||
|
||
# exp 3 tests is a supplied daysback | ||
dt = timedelta(days=43) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use dt for a timedelta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@bloyl do you want me to push the suggestions I made? |
mne/io/meas_info.py
Outdated
meas_date : tuple | None | ||
The Info object you want to use for overwriting values | ||
in target Info objects. | ||
dt : datetime.timedelta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bloyl can you just take care of this one?
remove str additions Co-Authored-By: Alexandre Gramfort <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx @bloyl
@jasmainak or @larsoner feel free to merge if happy.
One of the travis builds had failed due to time limit being exceeded. I restarted it. |
Looks fine, thanks @bloyl ! |
Correctly set
annotations.orig_time
frommeas_date
.Thanks @alexrockhill for noticing it. should help with mne-tools/mne-bids#280
@alexrockhill can you see if it solves your issue?