-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MRG: Introduce ICA.get_explained_variance_ratio() to easily retrieve relative explained variances after a fit #11141
MRG: Introduce ICA.get_explained_variance_ratio() to easily retrieve relative explained variances after a fit #11141
Conversation
1d8d840
to
f11ccf5
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait a minute this is PCA components explained variance not ICA explained variance !
also ICA components are not orthogonal so the notion of explained variance for a group of components is fishy. At best you can see how much variance you keep by including only component but since all components are not orthogonal to each other the sum will not be 1.
Danger zone...
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
I implemented the algorithm used in EEGLAB in this MWE: # %%
from numpy.typing import ArrayLike
import mne
def get_explained_var(
*,
ica: mne.preprocessing.ICA,
inst: mne.io.BaseRaw | mne.Epochs | mne.Evoked,
components: ArrayLike,
) -> float:
# Algorithm follows:
# https://sccn.ucsd.edu/pipermail/eeglablist/2014/009134.html
inst_recon = ica.apply(
inst=raw.copy(),
include=[components],
exclude=[],
n_pca_components=0, # XXX check this
verbose=False
)
diff = inst_recon.get_data() - raw.get_data()
mean_var_diff = diff.var(axis=1, ddof=0).mean()
mean_var_orig = inst.get_data().var(axis=1, ddof=0).mean()
var_explained_ratio = 1 - mean_var_diff / mean_var_orig
return var_explained_ratio
sample_dir = mne.datasets.sample.data_path()
sample_fname = sample_dir / 'MEG' / 'sample' / 'sample_audvis_raw.fif'
raw = mne.io.read_raw_fif(sample_fname, preload=True)
raw.crop(tmax=60)
raw.pick_types(eeg=True)
ica = mne.preprocessing.ICA(n_components=30, method='picard')
ica.fit(raw)
for components in (0, [0, 1], range(0, 10)):
explained_var = get_explained_var(ica=ica, inst=raw, components=components)
explained_var_percent = round(100 * explained_var, 1)
print(
f'ICA component(s) {components} explain(s): '
f'{explained_var_percent}% of variance in original data'
)
|
Ok this looks correct. Basically it does more than the current private function as it also works with more than one component. If you pass only one component it matches our private function ?
|
Nice!
Should be |
Thanks for the feedback, everybody! New draft, which I will turn into a method of the ICA class soon: from numpy.typing import ArrayLike
import mne
def get_explained_var_ratio(
*,
ica: mne.preprocessing.ICA,
inst: mne.io.BaseRaw | mne.BaseEpochs | mne.Evoked,
components: ArrayLike | int | None,
) -> float:
"""Retrieve the proportion of variance explained by independent components.
A value similar (but not equivalent) to EEGLAB's ``pvaf`` (percent variance
accounted for) will be calculated for the specified component(s).
Parameters
----------
ica : mne.preprocessing.ICA
A fitted ICA instance.
inst : mne.io.BaseRaw | mne.BaseEpochs | mne.Evoked
The uncleaned data.
components : ArrayLike | int | None
The component(s) for which to do the calculation. If more than one
component is specified, explained variance will be calculated jointly
across all supplied components. If ``None``, uses all available
components.
Returns
-------
float
The fraction of variance in ``inst`` that can be explained by the
ICA components.
Notes
-----
Since ICA components cannot be assumed to be aligned orthogonally, the sum
of the proportion of variance explained by all components may not be equal
to 1. In certain edge cases, the proportin of variance explained by a
component may even be negative.
.. versionadded:: 1.1
"""
# The algorithm implemented here should be equivalent to
# https://sccn.ucsd.edu/pipermail/eeglablist/2014/009134.html
# Reconstruct ("back-project") the data using only the specified ICA
# components. Don't make use of potential "spare" PCA components in this
# process – we're only interested in the contribution of the ICA
# components!
if components is None:
components = range(ica.n_components_)
inst_recon = ica.apply(
inst=inst.copy(),
include=[components],
exclude=[],
n_pca_components=0,
verbose=False,
)
data_recon = inst_recon.get_data(picks=ica.ch_names)
data_orig = inst.get_data(picks=ica.ch_names)
data_diff = data_orig - data_recon
# To estimate the data variance, we first compute the variance across
# channels at each time point, and then we average these variances.
mean_var_diff = data_diff.var(axis=0).mean()
mean_var_orig = data_orig.var(axis=0).mean()
var_explained_ratio = 1 - mean_var_diff / mean_var_orig
return var_explained_ratio MWE: import mne
sample_dir = mne.datasets.sample.data_path()
sample_fname = sample_dir / 'MEG' / 'sample' / 'sample_audvis_raw.fif'
raw = mne.io.read_raw_fif(sample_fname, preload=True)
raw.crop(tmax=60)
raw.pick_types(eeg=True, stim=True)
raw.filter(l_freq=0.1, h_freq=None)
events = mne.find_events(raw)
epochs = mne.Epochs(raw=raw, events=events, baseline=None, preload=True)
evoked = epochs.average()
ica = mne.preprocessing.ICA(n_components=15, method='picard')
ica.fit(raw)
for inst in (raw, epochs, evoked):
print(f'\nProcessing {type(inst)} data…')
for components in (0, [0, 1], None):
explained_var = get_explained_var_ratio(
ica=ica, inst=inst, components=components
)
explained_var_percent = round(100 * explained_var, 1)
print(
f'ICA component(s) {components} explain(s): '
f'{explained_var_percent}% of variance in original data'
) Output:
|
Nice! If the implementation ends up having a logging line, then I suggest adding print(f'\nProcessing {type(inst)} data…') |
Nice idea! |
@hoechenberger let me know when you push this so I can comment on the lines. Will save me time. 🙏 |
19e88c7
to
e22750a
Compare
@agramfort This is ready for review. |
mne/preprocessing/ica.py
Outdated
The fraction of variance in ``inst`` that can be explained by the | ||
ICA components. If only a single ``ch_type`` was given, a float | ||
will be returned. Otherwise, a dictionary with channel types as | ||
keys and explained variance ratios as values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly I would always return a dict as otherwise we'll have for example to branch in the pipeline code etc. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit hesitant to do that because if you have for example only EEG data, you'd need to do:
var_explained = ica.get_explained_variance_ratio(inst)['eeg']
which I felt was kind of odd … but you're of course right, always returning a dict would make things more consistent, so that's probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agramfort I've implemented your suggestion, and we're always returning a dict
now
051f83c
to
8a20f31
Compare
8a20f31
to
bd2ccc2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a last question. Otherwise +1 for MRG
thx @hoechenberger
# this process – we're only interested in the contribution of the ICA | ||
# components! | ||
kwargs = dict( | ||
inst=inst.copy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the copy really needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, yes, because we use ica.apply()
below and this works in place. We could do without the copy if instead of using ica.apply()
, we'd "manually" do the matrix multiplication. But ica.apply()
does quite a few additional things and I'm worried I'd forget something important 😅
I trust that if we discover the copy()
here causes issues, we can optimize things later on.
+1 for MRG when CIs complete. thx @hoechenberger |
GH Action status is not reported somehow, but all CI runs for this PR passed, according to https://github.com/mne-tools/mne-python/actions/workflows @larsoner @agramfort Could you please manually merge? |
It's quite common that users want to know the variance explained by all (or individual) ICA components. Such a question just recently showed up on the forum.
The PR implements
ICA.get_explained_variance_ratio()
to allow retrieval of the proportion of variance of the original data explained by ICA components.MWE:
Produces: