acquisition date anonymization #538
Comments
As Guio mentioned, my suggestion is to remove the specific date recommendation and simply add a flag like "AnonymizedAcqTime = true" in dataset_description.json. |
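A minimal sketch of what such a flag could look like in practice. The key `AnonymizedAcqTime` is only the name floated in this comment and is not part of the BIDS specification; the dataset name, the version string, and the helper `dates_are_anonymized` are placeholders made up for illustration.

```python
import json

# "AnonymizedAcqTime" is the hypothetical key proposed in this thread,
# not an official BIDS field. "Name" and "BIDSVersion" hold placeholder values.
description = {
    "Name": "Example dataset",
    "BIDSVersion": "1.4.0",
    "AnonymizedAcqTime": True,
}


def dates_are_anonymized(desc: dict) -> bool:
    """Return True only if the (hypothetical) flag is present and set to true."""
    return desc.get("AnonymizedAcqTime", False) is True


# Serialize the way it would be stored in dataset_description.json.
text = json.dumps(description, indent=2)
print(text)
print(dates_are_anonymized(json.loads(text)))
```

With a flag like this, a reader no longer has to infer anonymization from the year of the date; a missing key is treated here as "not asserted".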
yes, I think that's a nice solution. I think it's also worth mentioning that fif files do form a big market in the MEG community (not sure what percentage), not just a format that needs to be fixed. I believe there are hurdles for that to happen any time soon. |
Can somebody explain in detail how that would help, please? |
The problem is fif files cannot have dates before 1901. If you remove the restriction of 1925, it makes it much easier to anonymize. You only need to anonymize up to a year rather than figure out a crazy big number that is not too big. |
You no longer have to look at the year to know if it's a real date or not. And it removes the unnecessary restriction. You can now have a real or anonymized date of 2000-01-01 for example. |
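To illustrate where a 1901 limit can come from, here is a small sketch. It assumes the measurement date is stored as a signed 32-bit count of seconds since the Unix epoch; that storage detail is an assumption and is not stated in this thread. The helper `fits_int32_seconds` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

# Earliest instant representable as signed 32-bit seconds since the epoch.
earliest = EPOCH + timedelta(seconds=INT32_MIN)
print(earliest.isoformat())


def fits_int32_seconds(dt: datetime) -> bool:
    """Check whether a timezone-aware datetime fits in signed 32-bit seconds."""
    seconds = (dt - EPOCH).total_seconds()
    return INT32_MIN <= seconds <= INT32_MAX


print(fits_int32_seconds(datetime(1900, 1, 1, tzinfo=timezone.utc)))
print(fits_int32_seconds(datetime(2000, 1, 1, tzinfo=timezone.utc)))
```

Under that assumption, any anonymized date has to fall after mid-December 1901, while a date such as 2000-01-01 is unproblematic.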
Ahhh okay thanks! Let me rephrase:
Sounds good to me in principle. Let's collect more opinions and if we converge, we can do a small PR! |
tl;dr This discussion seems to have focused on a technical solution to a communication problem. Of the three options presented by @guiomar, only # 2 (Add a warning on the specs to indicate that earlier dates may crash FIF files when working with M/EEG) seems to address the problem.
So, if I'm reading the original thread correctly, the problem is that a dataset might anonymize to a pre-1901 date. If the data needs to be converted to FIF (or FIF-encoded M/EEG data added in a later session), then it may be impossible to correctly encode dates across the dataset without changing all of the dates.
If this is correct, the solution does not seem to match the problem. Asserting anonymization occurred doesn't modify the need to shift the anonymization dates post-1901 if there's any risk a dataset or its derivatives might include FIF data. And for datasets yet to be collected or curated, the thing that needs to happen is that curators need to encode dates post-1901, regardless of any anonymization assertion flag.
It has always been possible to shift dates into some other range, and anonymization is simply something the dataset user has to trust. The idea with the pre-1925 dates is that it is impossible to have collected the data on those dates, and therefore it is unambiguous and directly validatable. Putting |
Thanks for your thoughts @effigies ! I disagree that # 2 is necessarily the best solution. This is because it assumes that everyone who works with BIDS data reads the specification. More often than not, people use some software for anonymization, so keeping this weird 1925 date means every MEG software that does anonymization for the purposes of sharing with the BIDS standard will have to add warnings and checks to prevent users from using pre-1901 dates.
Note that HIPAA does not require the year to be anonymized, and for this reason many packages implement the anonymization in terms of a number of days to shift rather than via a specific date. This in turn means that we have to calculate the number of days until 1925, or alternatively change how anonymization is done. All this seems too much of a hassle when we should in fact be making it easier for folks to share data.
My final argument is that group studies can have a range of dates and it may be necessary to keep the relative times. In that case, one has to figure out the number of days that takes you back to before 1925 but not further than 1901 for each recording and then take their minimum/maximum. Note that dates are not the only thing one has to anonymize, so we still have to go the "trust me" route even with the 1925 criterion. |
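The bookkeeping described in this comment can be sketched as follows: given all recording dates of a study, find the range of day shifts that moves every recording before 1925 without any of them falling before the FIF limit, while keeping relative timing intact. The function name `shift_window` and the conservative bounds are choices made for this illustration only.

```python
from datetime import date, timedelta

# Conservative bounds: safely after the 1901 limit, and strictly before 1925.
EARLIEST = date(1902, 1, 1)
LATEST = date(1924, 12, 31)


def shift_window(acq_dates):
    """Return (min_days, max_days): the day shifts (subtracted from every
    date) that place all recordings inside [EARLIEST, LATEST]."""
    first, last = min(acq_dates), max(acq_dates)
    min_days = (last - LATEST).days      # just enough to push the latest scan before 1925
    max_days = (first - EARLIEST).days   # at most this much, or the earliest scan drops too far
    if min_days > max_days:
        raise ValueError("recordings span more time than the allowed window")
    return min_days, max_days


dates = [date(2019, 3, 1), date(2019, 9, 20), date(2020, 6, 15)]
min_days, max_days = shift_window(dates)
shifted = [d - timedelta(days=min_days) for d in dates]
print(min_days, max_days)
print(shifted)
```

A single shift chosen anywhere inside that range preserves the intervals between recordings, which is the extra constraint that a fixed target date does not capture.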
Anonymization does not imply shifting dates, it is not required by all ethics boards, and it is only recommended in BIDS. So it has value to me to specify this. By your logic, having the year shifted very early would also be of no value. Of course, you are correct that to directly address the issue, a warning could be added wherever we talk about shifting dates. |
I pushed a commit in #546 that should take care of a proper "warning". Please review |
While the warning is taken care of, any objection to going ahead with an explicit flag instead of an early date recommendation? |
Both the name of the "key" in the JSON but also the potential values need to be considered. I see multiple options:
There are multiple ways that the date and/or time can be modified:
I don't think that these cases allow themselves to be described properly with a single key-value pair like |
Oh, and I don't think that There can be a separate |
Sure, but I wouldn't go into too much detail. I think for most cases a simple note that the dates are shifted will alert the users not to conclude too much from them. I see little use in knowing date vs time details for example. Of course, within subject relative timing can be important as is already noted in the spec. We could have a few values: Regarding where to put the key, can it be in scans.json? I thought those files can only contain a description of the columns. It could be mentioned in the acq_time Description field, but that wouldn't be standardized. Is it possible to have the following?
If so, I like that solution. |
I very much like this idea. This would also enable us to lift the "pre-1925" limitation, as by looking at
I'm a little confused. What would "Global" imply, and why do you think it would only be relevant in hyperscanning?
The current spec says:
Therefore,
this should very well be possible. I like this solution very much (albeit I still need to understand the meaning of the value @Moo-Marc proposed!), and adopting the spec accordingly would still be backward-compatible with existing anonymized datasets (i.e. if What do you all think about this? If we can reach a consensus, I would start drafting a proposal for a BIDS standard amendment. |
You can include additional columns in the
where you should interpret "fields" as "columns". Column headings must be in snake_case (whereas in the json files BIDS uses CamelCase) and the columns can be explained in more detail in a json file that accompanies the tsv. So the
and the corresponding
Although you could also drop the
Using the inheritance principle, you could choose to put a single |
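As a rough sketch of the extra-column idea from this comment: a `_scans.tsv` with one additional snake_case column, plus an accompanying JSON file that describes it. The column name `acq_time_anonymized`, the file names, and all values below are hypothetical and chosen only for illustration; they are not taken from the specification or from the examples originally posted here.

```python
import csv
import io
import json

# Hypothetical column name; TSV headers use snake_case.
FLAG_COLUMN = "acq_time_anonymized"

rows = [
    {"filename": "meg/sub-01_task-rest_meg.fif",
     "acq_time": "1920-03-01T13:45:30", FLAG_COLUMN: "true"},
    {"filename": "anat/sub-01_T1w.nii.gz",
     "acq_time": "2019-06-15T09:12:00", FLAG_COLUMN: "false"},
]


def write_scans_tsv(rows):
    """Render the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["filename", "acq_time", FLAG_COLUMN],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sidecar JSON explaining the extra column in more detail.
sidecar = {
    FLAG_COLUMN: {
        "Description": "Whether acq_time was shifted for anonymization",
        "Levels": {"true": "date was shifted", "false": "date is real"},
    }
}

scans_tsv = write_scans_tsv(rows)
scans_json = json.dumps(sidecar, indent=2)
print(scans_tsv)
print(scans_json)
```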
I like this. 👌 |
solution via _scans.tsv is ok for me. 2 remarks:
|
that will be possible starting with the next release, see latest spec:
The validator has already been adjusted |
The current specs say:
So it's already only a SHOULD, not a MUST. I believe it's only an implementation detail in MNE-BIDS that we enforce pre-1925 dates for anonymization. Therefore, I believe we already have to trust users to do the right thing 👍 and the changes proposed here would actually improve the situation. I think we could just keep this sentence, capitalize "SHOULD", and your issue would be addressed, right?
|
all good then on my side
… |
see: 9b1ae95 |
Great, thanks! |
@hoechenberger These were meant to refer to the shift value itself, not simply whether they are shifted or not. Global means the shift is the same throughout the dataset, per subject means the same shift is applied within a subject, per scan means the shift is different for each file. So for one global constant shift, I see this as being useful only in hyperscanning where you may want to be able to confirm which participants were scanned together. In any other case, I don't see the use of knowing the relative dates between participants. While the extra column idea of @robertoostenveld works, I think most cases won't have a mix of anonymized and non-anonymized scans. I therefore prefer a way to specify it globally, and something standardized, not just written in the acq_time field description. |
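The three meanings described in this comment can be sketched as below. The scope labels `"global"`, `"per_subject"`, and `"per_scan"` and the function `make_shifts` are hypothetical spellings used for illustration, not agreed values.

```python
import random
from datetime import timedelta


def make_shifts(scans, scope, rng, max_days=3650):
    """Map each filename to a date shift.

    scans: list of (subject, filename) pairs.
    scope: "global" (one shift for the dataset), "per_subject" (one shift
           per subject), or "per_scan" (an independent shift for every file).
    """
    def draw():
        return timedelta(days=rng.randint(1, max_days))

    if scope == "global":
        shift = draw()
        return {name: shift for _, name in scans}
    if scope == "per_subject":
        by_subject = {}
        for subject, _ in scans:
            if subject not in by_subject:
                by_subject[subject] = draw()
        return {name: by_subject[subject] for subject, name in scans}
    if scope == "per_scan":
        return {name: draw() for _, name in scans}
    raise ValueError(f"unknown scope: {scope}")


scans = [
    ("sub-01", "sub-01_ses-01_meg.fif"),
    ("sub-01", "sub-01_ses-02_meg.fif"),
    ("sub-02", "sub-02_ses-01_meg.fif"),
]
rng = random.Random(0)
for scope in ("global", "per_subject", "per_scan"):
    print(scope, make_shifts(scans, scope, rng))
```

Only the global scope keeps relative dates between participants, which matches the hyperscanning use case mentioned above; the per-subject scope keeps within-subject timing.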
Adding to the chorus here that a solution along these lines is desirable. Just FYI, there is an issue with the 1925 date in EDF as well (not just FIF). As raised in #698, the EDF file format itself prevents anyone from shifting dates to before 1985. Here is a link to a related chat with Teuniz (maintainer of EDFBrowser): https://gitlab.com/Teuniz/EDFbrowser/-/issues/26 |
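A small sketch of why EDF cannot hold a pre-1985 date, assuming the header start date uses a two-digit year with 1985 as the clipping date, which is my reading of the EDF specification. The function names are hypothetical, and the separate full-date field of EDF+ is not covered here.

```python
def edf_full_year(yy: int) -> int:
    """Expand a two-digit EDF header year using 1985 as the clipping date."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    # 85..99 map to 1985..1999, 00..84 map to 2000..2084.
    return 1900 + yy if yy >= 85 else 2000 + yy


def edf_can_store(year: int) -> bool:
    """True if the year is representable under the two-digit scheme."""
    return 1985 <= year <= 2084


print(edf_full_year(85), edf_full_year(24))
print(edf_can_store(1925))
```

Under this scheme a year written as `24` reads back as 2024, so a pre-1925 anonymized date cannot survive a round trip through the header.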
Hi everyone! Shall we create a PR to deal with this, following the changes suggested by @robertoostenveld? I also really like the values suggested by @Moo-Marc, and they could easily be integrated into the suggestion
|
I second that though unfortunately I wouldn't have the time to do it soon. |
Hi!
After discussion at:
mne-tools/mne-bids-pipeline#160
From the BIDS specs:
Summarizing:
Since the spec currently remains a bit open, and people could still use dates earlier than 1925, we should at least warn in the spec that this may produce errors when working with FIF files (e.g. for M/EEG).
Alternatively, when I talked to Marc, he suggested using an additional field that indicates whether a date was randomized. This could potentially end these discussions around the acquisition date issue. I think this is an elegant solution. I wonder if there is a reason why it has not already been implemented?
The possible solutions I see are:
What do you think?
@hoechenberger @jasmainak @agramfort @sappelhoff @alexrockhill @robertoostenveld @Moo-Marc