FIX: anonymize annotations #7016

Merged
merged 11 commits into from
Nov 13, 2019

Conversation

bloyl
Contributor

@bloyl bloyl commented Nov 4, 2019

Correctly set annotations.orig_time from meas_date.

Thanks @alexrockhill for noticing it. This should help with mne-tools/mne-bids#280

@alexrockhill can you see if it solves your issue?

@bloyl bloyl changed the title BUG: anonymize annotations FIX: anonymize annotations Nov 4, 2019
@codecov

codecov bot commented Nov 4, 2019

Codecov Report

Merging #7016 into master will decrease coverage by 0.22%.
The diff coverage is 77.27%.

@@            Coverage Diff             @@
##           master    #7016      +/-   ##
==========================================
- Coverage   89.71%   89.49%   -0.23%     
==========================================
  Files         438      438              
  Lines       77473    77599     +126     
  Branches    12576    12601      +25     
==========================================
- Hits        69505    69447      -58     
- Misses       5158     5314     +156     
- Partials     2810     2838      +28

Member

@larsoner larsoner left a comment

LGTM but it would be good to hear from @alexrockhill as well

@alexrockhill
Contributor

LGTM but it would be good to hear from @alexrockhill as well

Hmmm I'm still getting the following error:

mne_bids/tests/test_write.py:630: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
mne_bids/write.py:902: in write_raw_bids
    keep_his=keep_his)
../mne-python/mne/io/meas_info.py:1921: in anonymize_info
    meas_date_datetime = _stamp_to_dt(info['meas_date'])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

utc_stamp = None

    def _stamp_to_dt(utc_stamp):
        """Convert timestamp to datetime object in Windows-friendly way."""
        # The min on windows is 86400
>       stamp = [int(s) for s in utc_stamp]
E       TypeError: 'NoneType' object is not iterable

@alexrockhill
Contributor

I am now realizing that the original problem was just on save with raw.anonymize(), which is fixed, at least as far as I can tell with my checks. The other error I just posted also has to do with meas_date, though, so if you think it belongs in this PR we can fix it here; otherwise we can open another PR.

@bloyl
Contributor Author

bloyl commented Nov 4, 2019

@alexrockhill
In your second error, how did info['meas_date'] get set to None? I don't think that should happen with anonymize_info.

@alexrockhill
Contributor

@alexrockhill
In your second error, how did info['meas_date'] get set to None? I don't think that should happen with anonymize_info.

info['meas_date'] is None in the testing datasets for non-FIF data formats. Specifically, when reading in a ctf, bti, kit, bdf or set file, that field sometimes gets set to None.

@larsoner
Member

larsoner commented Nov 4, 2019

info['meas_date'] can indeed be None, it's meant to be a supported value. It basically means "don't have a meas date".
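
A minimal sketch of a conversion that tolerates a missing measurement date, which is the case the traceback above trips over. The helper name stamp_to_datetime is hypothetical and this is not the MNE implementation; it only assumes the (seconds, microseconds) tuple layout discussed in this thread.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def stamp_to_datetime(utc_stamp):
    """Convert a (seconds, microseconds) tuple to an aware UTC datetime.

    Hypothetical helper: returns None when there is no measurement date
    instead of trying to iterate over None.
    """
    if utc_stamp is None:
        return None
    secs, usecs = (int(s) for s in utc_stamp)
    # Adding a timedelta to the epoch sidesteps platform limits of
    # datetime.fromtimestamp for very small or negative stamps.
    return EPOCH + timedelta(seconds=secs, microseconds=usecs)


print(stamp_to_datetime(None))
print(stamp_to_datetime((86400, 0)))
```

Callers then have to decide explicitly what anonymization means when the result is None, which is the question the rest of this conversation works through.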

@bloyl
Contributor Author

bloyl commented Nov 4, 2019

OK, how do we want to anonymize a dataset that has info['meas_date'] == None?
I can't pick a default meas_date like (0, 0) because then the subject's birthday won't be anonymized. I guess I could pick a random timestamp.
Thoughts?

also @alexrockhill can you tell me which testing file will throw this error so I can write a test?
Sorry all your last message told me...

@larsoner
Member

larsoner commented Nov 4, 2019

I can't pick a default meas_date like (0, 0) because then the subject's birthday won't be anonymized. I guess I could pick a random timestamp.

Is this the only date-related thing you'd need to anonymize if info['meas_date'] is already None?

The thing to do might be to do everything relative to Jan 1 1970 (the epoch). info['meas_date'] = None actually gets stored as a variant of this, (0, 2 ** 32 - 1), since you would never store an actual date this way (the 2 ** 32 - 1 microseconds would be represented as part of the seconds field), so it's even somewhat consistent with that.

But I haven't followed the anonymization discussion too closely so I'm not sure.

@bloyl
Contributor Author

bloyl commented Nov 4, 2019

The problem is that any default meas_date will give a default timeshift to subject birthday. Which would be reversible.

I guess if meas_date is None then birthday is useless (since I can't compute age) so maybe I should set the birthday to something nonsensical if meas_date is None

@larsoner
Member

larsoner commented Nov 4, 2019

Yes, probably also None would/should work. Or just del the key. It seems like consistent behavior at least.

@alexrockhill
Contributor

The problem is that any default meas_date will give a default timeshift to subject birthday. Which would be reversible.

I guess if meas_date is None then birthday is useless (since I can't compute age) so maybe I should set the birthday to something nonsensical if meas_date is None

Having daysback be a required argument could solve the issue of having a known relation to birthday. Then the meas_date could be set at daysback and the age could be related to that while still being anonymized. I think that works....

@alexrockhill
Contributor

The problem is that any default meas_date will give a default timeshift to subject birthday. Which would be reversible.
I guess if meas_date is None then birthday is useless (since I can't compute age) so maybe I should set the birthday to something nonsensical if meas_date is None

Having daysback be a required argument could solve the issue of having a known relation to birthday. Then the meas_date could be set at daysback and the age could be related to that while still being anonymized. I think that works....

See https://github.com/mne-tools/mne-bids/pull/280/files for what I was thinking

if 'daysback' not in anonymize:
    raise ValueError('`daysback` argument required to anonymize.')
daysback = anonymize['daysback']
if (datetime.now().date() - timedelta(days=daysback)).year > 1900:
    min_secondsback = (datetime.now().date() -
                       date(year=1900, month=1, day=1)).total_seconds()
    min_daysback = int(np.ceil(min_secondsback / (60 * 60 * 24)))
    raise ValueError('According to BIDS specifications, the ' +
                     'anonymization time has to be before 1900, ' +
                     'daysback given %i, ' % daysback +
                     'minimum acceptable daysback %i' % min_daysback)
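
The fragment above depends on its surrounding function, so here is a self-contained sketch of the same check. The names minimum_daysback and check_daysback and the today parameter are hypothetical, added so the result does not depend on the current date. Note one assumption: this sketch reads "before 1900" strictly (the shifted date must fall on or before 31 Dec 1899), which is slightly stricter than the `.year > 1900` test in the fragment.

```python
from datetime import date, timedelta

CUTOFF = date(1900, 1, 1)  # shifted dates must fall strictly before this


def minimum_daysback(today):
    """Smallest shift (in days) that moves `today` before the cutoff."""
    return (today - CUTOFF).days + 1


def check_daysback(daysback, today=None):
    """Return the shifted date, or raise if it is not before 1900."""
    today = date.today() if today is None else today
    shifted = today - timedelta(days=daysback)
    if shifted >= CUTOFF:
        raise ValueError(
            'anonymization time has to be before 1900, daysback given %i, '
            'minimum acceptable daysback %i'
            % (daysback, minimum_daysback(today)))
    return shifted


reference_day = date(2019, 11, 4)
print(check_daysback(minimum_daysback(reference_day), today=reference_day))
```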

@agramfort
Member

@alexrockhill @bloyl ready to merge? if so LGTM

@bloyl
Contributor Author

bloyl commented Nov 5, 2019

No, there are still a few things I need to tweak. Hopefully on my slate for this afternoon/tomorrow.

@bloyl
Contributor Author

bloyl commented Nov 6, 2019

@larsoner @agramfort @jasmainak @alexrockhill
In fif files info['meas_date'], info['file_id']['secs'] and info['meas_id']['secs'] all represent seconds since the unix epoch and are all stored as np.dtype('>i4') which is int32.

this means that the earliest meas_date we can store is

In [3]: _stamp_to_dt((np.iinfo('>i4').min, 0))
Out[3]: datetime.datetime(1901, 12, 13, 20, 45, 52, tzinfo=datetime.timezone.utc)

and the latest is

In [4]: _stamp_to_dt((np.iinfo('>i4').max, 0))
Out[4]: datetime.datetime(2038, 1, 19, 3, 14, 7, tzinfo=datetime.timezone.utc)

I've modified info._check_consistency to check and error for some of these issues.
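
For readers without MNE at hand, the two limits can be reproduced with the standard library alone, together with a sketch of the kind of range check described. The function name check_meas_date and the error wording are hypothetical; this is not the code added to info._check_consistency.

```python
from datetime import datetime, timedelta, timezone

INT32_MIN = -2 ** 31
INT32_MAX = 2 ** 31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def check_meas_date(meas_date):
    """Raise if the seconds field does not fit a signed 32-bit integer."""
    if meas_date is None:  # None is a supported "no date" value
        return
    secs = int(meas_date[0])
    if not INT32_MIN <= secs <= INT32_MAX:
        raise RuntimeError(
            'meas_date seconds must be between %r and %r, got %r'
            % (INT32_MIN, INT32_MAX, secs))


print(EPOCH + timedelta(seconds=INT32_MIN))  # 1901-12-13 20:45:52+00:00
print(EPOCH + timedelta(seconds=INT32_MAX))  # 2038-01-19 03:14:07+00:00
```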

This has immediate ramifications for mne-tools/mne-bids#280 which wants anonymized dates before 1900. But also gives an idea of when the fif file standard will need to be updated to something more robust.

@jasmainak
Member

humm interesting, 2038 is not so far off ...

we could modify the BIDS specification recommendations based on this. 1900 was a rather arbitrary choice but now these issues can help guide a better number

@larsoner
Member

larsoner commented Nov 6, 2019

Grr, 32-bit signed integers.

Rather than change the spec it might be worth talking over on fiff-constants about how to handle this. For example if they allow us to add a LONG LONGLONG (64-bit int) datatype, and allow us to alternatively store the date using this datatype, then it solves the problem for us.

@larsoner
Member

larsoner commented Nov 6, 2019

Or LONGLONG, rather

@larsoner
Member

larsoner commented Nov 6, 2019

... actually there is already a 64-bit signed int primitive:

https://github.com/mne-tools/fiff-constants/blob/master/DictionaryTypes.txt#L118

So we just need to ask if it's okay to change meas_date to allow storing in this format.

@larsoner
Member

larsoner commented Nov 6, 2019

Pursuing some FIF format change in mne-tools/fiff-constants#24

@bloyl
Contributor Author

bloyl commented Nov 6, 2019

I don't know enough about the fiff internals to really weigh in what the best course of action might be. So I leave it up to you folks.

I do think it's a big undertaking just to support anonymizing dates to pre-1900, though.

@agramfort
Member

agramfort commented Nov 6, 2019 via email

@larsoner
Member

larsoner commented Nov 6, 2019

I would not push a change in the fif standard for this.

Maybe not for this, but there will be a problem in 20 years anyway, so why not fix it now? :)

@jasmainak
Member

@larsoner is this going to break compatibility with MNE-C and FieldTrip? Many folks here partially analyze their data in mne-python and partially in mne-c

@larsoner
Member

larsoner commented Nov 6, 2019

FT we can update by updating MNE-MATLAB, and MNE-C can be updated, too. But yes it would cause these problems

% (key, key_2,
repr(np.iinfo('>i4').min),
repr(np.iinfo('>i4').max),
repr(value[key_2]),))
Member

don't think you need the repr() here

'and "%r", got "%r"'
% (repr((np.iinfo('>i4').min, 0)),
repr((np.iinfo('>i4').max, 0)),
repr(self['meas_date']),))
Member

same comment as elsewhere. You probably don't need the repr if you use %r
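
A short illustration of the point, using a made-up value: `%r` already applies repr() to its argument, so wrapping the argument in repr() applies it twice and leaves an extra pair of quotes in the message.

```python
value = (-2147483648, 0)

# repr() applied twice: once explicitly, once by %r
with_repr = 'got "%r"' % (repr(value),)
# repr() applied once, by %r alone
without_repr = 'got "%r"' % (value,)

print(with_repr)     # got "'(-2147483648, 0)'"
print(without_repr)  # got "(-2147483648, 0)"
```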


try:
info._check_consistency()
except RuntimeError as e:
Member

can we avoid try-except clause somehow?

Contributor Author

Pushed a change that avoids this but seems a bit kludgy.

Either is fine with me...

@jasmainak
Member

@bloyl does the whats_new page need to be updated?

@bloyl
Contributor Author

bloyl commented Nov 11, 2019

@jasmainak How do I check that my changes to whats_new page render correctly? can i do that check locally?

meas_date : tuple | None
The Info object you want to use for overwriting values
in target Info objects.
dt : datetime.timedelta
Member

Can you rename dt to td? dt suggests it is a datetime.datetime and not a datetime.timedelta.

Member

or better delta_t to match naming below

Member

@bloyl can you just take care of this one?

Contributor Author

should be done now

delta_t = meas_date_datetime - default_anon_dos
# compute timeshift delta
if daysback is None and info['meas_date'] is None:
delta_t = datetime.timedelta(days=np.random.randint(365, 45 * 365))
Member

I would not do this. It means the file you generate will be different if you call the function twice, which is error prone. I can already see a version control system committing a huge file for nothing. I would enforce that daysback is not None, or set the date to 1970-01-01 as if the date were removed.

Contributor Author

The problem with making daysback a required parameter is that it will encourage users to just pick a number (and likely the same number for all data). When in most cases people don't have longitudinal data and shouldn't need/want daysback at all. To properly use daysback the site needs to use a different number for each subject but the same number for datasets from the same subject.
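
One way a site could satisfy "different per subject, same across a subject's datasets" without picking numbers by hand is to derive the shift from a keyed hash. This is only a sketch under stated assumptions: the function subject_daysback, the site_secret key and the string subject identifier are all hypothetical and not part of MNE; the bounds simply mirror the randint(365, 45 * 365) call in the diff under review.

```python
import hashlib
import hmac


def subject_daysback(subject_id, site_secret, low=365, high=45 * 365):
    """Derive a repeatable per-subject shift in days, low <= result < high.

    Hypothetical helper: the same subject_id and site_secret always give
    the same value, so repeated runs produce identical files.
    """
    digest = hmac.new(site_secret, subject_id.encode('utf-8'),
                      hashlib.sha256).digest()
    # Map the first 8 bytes of the digest onto the allowed range.
    return low + int.from_bytes(digest[:8], 'big') % (high - low)


print(subject_daysback('sub-01', b'keep-this-key-private'))
```

The shift is only as private as the key: anyone holding site_secret and the subject identifier can recompute it.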

The problem with picking a date for the meas_date is None case is that we would then always be using the same daysback, which is just a constant time shift of potentially identifiable information (birthday, device info, proc history, etc.).

In the latest round of edits, I just remove all the date/time info I know about if meas_date is None. I think this is robust in that you'll get the same info out each time and that it will be anonymous. I also don't think it loses any important time interval data.
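
The behaviour described here can be sketched on a plain dict standing in for Info. Everything in this block is an illustrative assumption rather than the merged code: the function name anonymize_dates, the use of datetime objects for both dates, and the choice to show only meas_date and the birthday out of all the date/time fields mentioned.

```python
from datetime import datetime, timedelta, timezone


def anonymize_dates(info, delta_t):
    """Return a copy of a dict-based stand-in for Info with dates anonymized.

    With a known meas_date, every date is shifted by the same delta_t, so
    intervals such as age at measurement survive. With meas_date None there
    is nothing to shift against, so the birthday is dropped instead.
    """
    out = dict(info)
    subject = dict(out.get('subject_info') or {})
    if out.get('meas_date') is None:
        subject.pop('birthday', None)
    else:
        out['meas_date'] = out['meas_date'] - delta_t
        if subject.get('birthday') is not None:
            subject['birthday'] = subject['birthday'] - delta_t
    out['subject_info'] = subject
    return out


utc = timezone.utc
info = {'meas_date': datetime(2019, 11, 4, tzinfo=utc),
        'subject_info': {'birthday': datetime(1980, 1, 1, tzinfo=utc)}}
print(anonymize_dates(info, timedelta(days=43)))
print(anonymize_dates(dict(info, meas_date=None), timedelta(days=43)))
```

Both branches are deterministic, which addresses the concern about files differing between calls.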

value['machid'][:] = 0

# exp 2 tests the keep_his option
exp_info_2 = exp_info.copy()
exp_info_2['subject_info']['his_id'] = 'foobar'

# exp 3 tests is a supplied daysback
dt = timedelta(days=43)
Member

don't use dt for a timedelta

Contributor Author

done

@jasmainak
Member

How do I check that my changes to whats_new page render correctly? can i do that check locally?

you can open the documentation built by circleci by clicking this:

alternatively, locally you can run:

$ make html-noplot

in the doc/ folder

@agramfort
Member

@bloyl do you want me to push the suggestions I made?

@mne-tools mne-tools deleted a comment from jasmainak Nov 12, 2019
@mne-tools mne-tools deleted a comment from jasmainak Nov 12, 2019
@mne-tools mne-tools deleted a comment from bloyl Nov 12, 2019

remove str additions

Co-Authored-By: Alexandre Gramfort <[email protected]>
Member

@agramfort agramfort left a comment

thx @bloyl

@jasmainak or @larsoner feel free to merge if happy.

@jasmainak
Member

One of the travis builds had failed due to time limit being exceeded. I restarted it.

@jasmainak jasmainak merged commit bb8cd69 into mne-tools:master Nov 13, 2019
@jasmainak
Member

Looks fine, thanks @bloyl !
