Use masked arrays in computations #44

tyarkoni · 2020-06-30T13:47:21Z

Related to #9, we should add support for masked arrays wherever possible. This will allow vectorized estimation even when the set of studies differs across parallel datasets (i.e., when users pass in NaN values for different studies in different datasets).
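As a minimal sketch of what masked arrays would buy us, assume a studies × voxels layout where NaN marks a study that is missing for a given voxel (the arrays and weights below are made up for illustration). NumPy's np.ma.masked_invalid plus np.ma.average then gives a per-voxel weighted mean in one vectorized call, with each column using only its own studies:

```python
import numpy as np

# Hypothetical toy data: 4 studies (rows) x 3 voxels/datasets (columns).
# NaN marks a study that is missing for that voxel.
y = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 3.0],
    [3.0, 4.0, 5.0],
    [np.nan, 6.0, 7.0],
])
# Hypothetical inverse-variance weights with the same shape.
w = np.array([
    [1.0, 1.0, 1.0],
    [2.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
])

# Mask invalid (NaN/inf) entries; the mask travels with the data.
y_ma = np.ma.masked_invalid(y)

# np.ma.average gives masked entries zero weight, so each column is
# averaged over only the studies present for that voxel.
est = np.ma.average(y_ma, axis=0, weights=w)
```

A weighted mean is only a stand-in here for a real meta-analytic estimator; the point is that the missingness pattern can differ per column without breaking vectorization.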

tsalo · 2020-07-10T16:09:15Z

I just want to keep track of the things I've noticed that need to be changed or accounted for here:

- In pymare.core.Dataset, _get_predictors() converts X to a DataFrame. DataFrames don't support masked arrays; they automatically fill the masked values with NaNs.
- In pymare.stats.ensure_2d() (called by Dataset during initialization), data are converted to plain arrays, which removes the masks from masked arrays.
- In pymare.stats.weighted_least_squares(), np.einsum seems to drop masking, although I can't be sure since einsum seems to be magic.
- When calculating the dot product of two arrays with np.dot(arr1, arr2), the result is a masked array with no mask.
- When calculating the dot product of two arrays with arr1.dot(arr2), masking only seems to be preserved if the first array is the masked one. It's weird.
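A small plain-NumPy sketch of why the conversion step matters (this does not call pymare; whether ensure_2d uses np.asarray internally is an assumption). Converting a masked array to a plain ndarray drops the mask and re-exposes the hidden NaN, which then poisons any product. np.ma.dot is NumPy's mask-aware product; with strict=False (the default) masked entries are treated as 0 in the sums:

```python
import numpy as np

a = np.ma.masked_invalid(np.array([[1.0, np.nan],
                                   [3.0, 4.0]]))
b = np.eye(2)

# np.asarray returns a plain ndarray: the mask is gone and the NaN is back.
plain = np.asarray(a)

# A plain dot product propagates the NaN through all of row 0
# (NaN * 0 is still NaN).
naive = np.dot(plain, b)

# Mask-aware alternative: masked entries contribute 0 to each sum.
prod = np.ma.dot(a, b, strict=False)
```

Passing strict=True instead propagates the mask to the whole affected row or column, which may be the safer choice when a missing value should invalidate the result rather than be treated as 0.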

tyarkoni · 2020-07-16T16:44:05Z

Thanks, this list is helpful. For most if not all of the above, working with masked operations shouldn't be too hard. E.g., while einsum won't natively do any masking, I think we can just pass a masking array as one of the operands, and multiplying by the mask in the summation will then produce the desired result.
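A minimal sketch of the einsum idea, assuming a WLS estimator of the form beta = (X'WX)^-1 X'Wy solved separately for each voxel with voxel-specific weights. This is not pymare's weighted_least_squares; the shapes and variable names are hypothetical. The 0/1 observation mask enters as an extra einsum operand, and the NaNs are zero-filled first because 0 * NaN is still NaN:

```python
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_voxels = 6, 3

# Hypothetical design: intercept + one moderator, shared by all voxels.
X = np.column_stack([np.ones(n_studies), rng.normal(size=n_studies)])
y = rng.normal(size=(n_studies, n_voxels))              # estimates per study/voxel
w = rng.uniform(0.5, 2.0, size=(n_studies, n_voxels))   # voxel-specific weights
y[1, 0] = np.nan  # hypothetical missing studies
y[4, 2] = np.nan

obs = (~np.isnan(y)).astype(float)  # 1.0 where observed, 0.0 where missing
y0 = np.nan_to_num(y, nan=0.0)      # zero-fill so NaNs can't poison the sums

# The mask is just another operand: missing studies get zero weight.
XtWX = np.einsum("sp,sv,sv,sq->vpq", X, obs, w, X)   # (n_voxels, p, p)
XtWy = np.einsum("sp,sv,sv,sv->vp", X, obs, w, y0)   # (n_voxels, p)

# Batched solve over voxels: (v, p, p) @ beta = (v, p, 1).
beta = np.linalg.solve(XtWX, XtWy[..., None])[..., 0]  # (n_voxels, p)
```

Each row of beta should match a WLS fit on only that voxel's observed studies. Voxels with fewer observed studies than parameters would make XtWX singular, so a real implementation would need to detect and skip them.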

That said, if it does look like working with masked arrays is going to require major changes, we might have to bite the bullet and just return NaN for any voxels that have missing values. But hopefully it won't come to that.
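For comparison, the fallback is simple to express. In this minimal sketch, with made-up data and a weighted mean standing in for the real estimator, the estimate is computed only for voxels with no missing studies, and every other voxel is left as NaN:

```python
import numpy as np

y = np.array([[1.0, 2.0, 3.0],
              [2.0, np.nan, 4.0],
              [3.0, 5.0, 5.0]])
w = np.ones_like(y)

# Voxels (columns) with no missing studies.
complete = ~np.isnan(y).any(axis=0)

est = np.full(y.shape[1], np.nan)
# Vectorized estimate over complete voxels only; the rest stay NaN.
est[complete] = (w[:, complete] * y[:, complete]).sum(axis=0) / w[:, complete].sum(axis=0)
```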

tyarkoni · 2021-05-13T23:30:10Z

I was wrong, it's not straightforward. Will leave open, but doubt I'll be able to work on it.

HippocampusGirl · 2022-06-17T10:03:14Z

An alternative to using masked arrays would be to call the statistics code separately for each voxel, filtering the input matrices to remove missing data. I have a working example at https://github.com/HALFpipe/HALFpipe/blob/main/halfpipe/stats/fit.py.
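A minimal sketch of the per-voxel approach, for illustration only. It is not the HALFpipe code linked above, and wls and fit_per_voxel are hypothetical helpers. Each voxel drops its own missing studies and is fit independently with weighted least squares:

```python
import numpy as np

def wls(X, y, w):
    """Hypothetical helper: weighted least squares for a single voxel."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def fit_per_voxel(X, Y, W):
    """Fit each voxel separately after dropping that voxel's missing studies."""
    n_voxels, p = Y.shape[1], X.shape[1]
    betas = np.full((n_voxels, p), np.nan)
    for v in range(n_voxels):
        keep = ~np.isnan(Y[:, v])
        if keep.sum() >= p:  # skip voxels with fewer studies than parameters
            betas[v] = wls(X[keep], Y[keep, v], W[keep, v])
    return betas

# Usage with made-up data: voxel 1 is missing study 1.
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
Y = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [7.0, 8.0]])
W = np.ones_like(Y)
betas = fit_per_voxel(X, Y, W)
```

This handles any missingness pattern, but the Python-level loop runs once per voxel, which is the cost raised in the reply below.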

tsalo · 2022-06-20T16:35:59Z

The problem with looping across voxels is that the estimation is vectorized, so it can currently work across many voxels at the same time. I think switching to looping would slow things down, unless we divided the voxels into groups based on their patterns of missing data and looped across those groups instead.
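A minimal sketch of that grouping idea, under the same hypothetical WLS setup (fit_by_pattern is an illustrative name, not an existing pymare function). np.unique over the columns of the observation mask finds the distinct missingness patterns; each group then gets one vectorized einsum fit restricted to its observed studies, so the loop runs once per pattern rather than once per voxel:

```python
import numpy as np

def fit_by_pattern(X, Y, W):
    """Group voxels sharing a missing-data pattern; fit each group vectorized."""
    obs = ~np.isnan(Y)                                   # (n_studies, n_voxels)
    patterns, labels = np.unique(obs.T, axis=0, return_inverse=True)
    labels = labels.reshape(-1)  # defensive: keep the inverse indices 1-D
    p = X.shape[1]
    betas = np.full((Y.shape[1], p), np.nan)
    for g, keep in enumerate(patterns):
        cols = np.flatnonzero(labels == g)
        if keep.sum() < p:  # too few studies in this pattern; leave as NaN
            continue
        Xg = X[keep]
        Yg = Y[np.ix_(keep, cols)]
        Wg = W[np.ix_(keep, cols)]
        XtWX = np.einsum("sp,sv,sq->vpq", Xg, Wg, Xg)
        XtWy = np.einsum("sp,sv,sv->vp", Xg, Wg, Yg)
        betas[cols] = np.linalg.solve(XtWX, XtWy[..., None])[..., 0]
    return betas

# Usage with made-up data: three voxels, three distinct missingness patterns.
X = np.column_stack([np.ones(5), [0.0, 1.0, 2.0, 3.0, 4.0]])
Y = np.array([[1.0, 2.0, 0.0],
              [3.0, np.nan, 1.0],
              [5.0, 6.0, 2.0],
              [7.0, 8.0, np.nan],
              [9.0, 10.0, 4.0]])
W = np.ones_like(Y)
betas = fit_by_pattern(X, Y, W)
```

How much this helps depends on the data: with a few dominant patterns (e.g., most voxels fully observed) it stays close to fully vectorized, while with many unique patterns it degrades toward the per-voxel loop.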

tsalo mentioned this issue on Jul 17, 2020 in neurostuff/NiMARE#274, "Fill missing data in images with NaNs" (closed)

JulioAPeraza mentioned this issue on Dec 1, 2023 in neurostuff/NiMARE#848, "Support liberal mask in IBMA estimators" (merged)
