Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: Introduce ICA.get_explained_variance_ratio() to easily retrieve relative explained variances after a fit #11141

Merged
merged 17 commits into from
Sep 14, 2022

Conversation

hoechenberger
Copy link
Member

@hoechenberger hoechenberger commented Sep 7, 2022

It's quite common that users want to know the variance explained by all (or individual) ICA components. Such a question just recently showed up on the forum.

The PR implements ICA.get_explained_variance_ratio() to allow retrieval of the proportion of variance of the original data explained by ICA components.

MWE:

import mne

sample_dir = mne.datasets.sample.data_path()
sample_fname = sample_dir / 'MEG' / 'sample' / 'sample_audvis_raw.fif'

raw = mne.io.read_raw_fif(sample_fname, preload=True)
raw.crop(tmax=60)
raw.pick_types(eeg=True, stim=True)
raw.filter(l_freq=0.1, h_freq=None)

events = mne.find_events(raw)
epochs = mne.Epochs(raw=raw, events=events, baseline=None, preload=True)
evoked = epochs.average()

ica = mne.preprocessing.ICA(n_components=15, method='picard')
ica.fit(raw)

for inst in (raw, epochs, evoked):
    print(f'\nProcessing {type(inst)} data…')
    for components in (0, [0, 1], None):
        explained_var = get_explained_var_ratio(
            ica=ica, inst=inst, components=components
        )
        explained_var_percent = round(100 * explained_var, 1)
        print(
            f'ICA component(s) {components} explain(s): '
            f'{explained_var_percent}% of variance in original data'
        )

Produces:

Processing <class 'mne.io.fiff.raw.Raw'> data…
ICA component(s) 0 explain(s): 26.2% of variance in original data
ICA component(s) [0, 1] explain(s): 29.6% of variance in original data
ICA component(s) None explain(s): 86.1% of variance in original data

Processing <class 'mne.epochs.Epochs'> data…
ICA component(s) 0 explain(s): 46.2% of variance in original data
ICA component(s) [0, 1] explain(s): 61.0% of variance in original data
ICA component(s) None explain(s): 95.2% of variance in original data

Processing <class 'mne.evoked.EvokedArray'> data…
ICA component(s) 0 explain(s): 31.3% of variance in original data
ICA component(s) [0, 1] explain(s): 35.7% of variance in original data
ICA component(s) None explain(s): 93.9% of variance in original data

mne/preprocessing/ica.py Outdated Show resolved Hide resolved
@hoechenberger hoechenberger changed the title Introduce ICA.explained_variance_ to easily retrieve relative explained variances after a fit Introduce ICA.explained_variance_ratio_ to easily retrieve relative explained variances after a fit Sep 7, 2022
mne/preprocessing/ica.py Outdated Show resolved Hide resolved
mne/preprocessing/ica.py Outdated Show resolved Hide resolved
mne/preprocessing/ica.py Outdated Show resolved Hide resolved
@hoechenberger hoechenberger marked this pull request as ready for review September 7, 2022 15:13
mne/preprocessing/ica.py Outdated Show resolved Hide resolved
@hoechenberger

This comment was marked as outdated.

@hoechenberger

This comment was marked as outdated.

mne/preprocessing/ica.py Outdated Show resolved Hide resolved
@hoechenberger

This comment was marked as outdated.

@hoechenberger hoechenberger changed the title Introduce ICA.explained_variance_ratio_ to easily retrieve relative explained variances after a fit MRG: Introduce ICA.explained_variance_ratio_ to easily retrieve relative explained variances after a fit Sep 7, 2022
Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait a minute this is PCA components explained variance not ICA explained variance !

also ICA components are not orthogonal so the notion of explained variance for a group of components is fishy. At best you can see how much variance you keep by including only component but since all components are not orthogonal to each other the sum will not be 1.

Danger zone...

@hoechenberger

This comment was marked as outdated.

@agramfort

This comment was marked as outdated.

@hoechenberger

This comment was marked as outdated.

@hoechenberger hoechenberger marked this pull request as draft September 7, 2022 19:54
@hoechenberger hoechenberger changed the title MRG: Introduce ICA.explained_variance_ratio_ to easily retrieve relative explained variances after a fit Introduce ICA.explained_variance_ratio_ to easily retrieve relative explained variances after a fit Sep 7, 2022
@hoechenberger

This comment was marked as outdated.

@agramfort

This comment was marked as outdated.

@hoechenberger

This comment was marked as outdated.

@hoechenberger

This comment was marked as outdated.

@hoechenberger
Copy link
Member Author

I implemented the algorithm used in EEGLAB in this MWE:

# %%
from numpy.typing import ArrayLike
import mne


def get_explained_var(
    *,
    ica: mne.preprocessing.ICA,
    inst: mne.io.BaseRaw | mne.Epochs | mne.Evoked,
    components: ArrayLike,
) -> float:
    # Algorithm follows:
    # https://sccn.ucsd.edu/pipermail/eeglablist/2014/009134.html
    inst_recon = ica.apply(
        inst=raw.copy(),
        include=[components],
        exclude=[],
        n_pca_components=0,  # XXX check this
        verbose=False
    )
    diff = inst_recon.get_data() - raw.get_data()
    mean_var_diff = diff.var(axis=1, ddof=0).mean()
    mean_var_orig = inst.get_data().var(axis=1, ddof=0).mean()

    var_explained_ratio = 1 - mean_var_diff / mean_var_orig
    return var_explained_ratio


sample_dir = mne.datasets.sample.data_path()
sample_fname = sample_dir / 'MEG' / 'sample' / 'sample_audvis_raw.fif'

raw = mne.io.read_raw_fif(sample_fname, preload=True)
raw.crop(tmax=60)
raw.pick_types(eeg=True)

ica = mne.preprocessing.ICA(n_components=30, method='picard')
ica.fit(raw)

for components in (0, [0, 1], range(0, 10)):
    explained_var = get_explained_var(ica=ica, inst=raw, components=components)
    explained_var_percent = round(100 * explained_var, 1)
    print(
        f'ICA component(s) {components} explain(s): '
        f'{explained_var_percent}% of variance in original data'
    )
ICA component(s) 0 explain(s): 45.0% of variance in original data
ICA component(s) [0, 1] explain(s): 50.3% of variance in original data
ICA component(s) range(0, 10) explain(s): 77.1% of variance in original data

@agramfort
Copy link
Member

agramfort commented Sep 10, 2022 via email

@drammock
Copy link
Member

Nice!

inst=raw.copy(),

Should be inst=inst.copy() right? Also should use BaseEpochs in the type hint.

@hoechenberger
Copy link
Member Author

hoechenberger commented Sep 11, 2022

Thanks for the feedback, everybody!

New draft, which I will turn into a method of the ICA class soon:

from numpy.typing import ArrayLike
import mne


def get_explained_var_ratio(
    *,
    ica: mne.preprocessing.ICA,
    inst: mne.io.BaseRaw | mne.BaseEpochs | mne.Evoked,
    components: ArrayLike | int | None,
) -> float:
    """Retrieve the proportion of variance explained by independent components.

    A value similar (but not equivalent) to EEGLAB's ``pvaf`` (percent variance
    accounted for) will be calculated for the specified component(s).

    Parameters
    ----------
    ica : mne.preprocessing.ICA
        A fitted ICA instance.
    inst : mne.io.BaseRaw | mne.BaseEpochs | mne.Evoked
        The uncleaned data.
    components : ArrayLike | int | None
        The component(s) for which to do the calculation. If more than one
        component is specified, explained variance will be calculated jointly
        across all supplied components. If ``None``, uses all available
        components.

    Returns
    -------
    float
        The fraction of variance in ``inst`` that can be explained by the
        ICA components.

    Notes
    -----
    Since ICA components cannot be assumed to be aligned orthogonally, the sum
    of the proportion of variance explained by all components may not be equal
    to 1. In certain edge cases, the proportin of variance explained by a
    component may even be negative.

    .. versionadded:: 1.1
    """
    # The algorithm implemented here should be equivalent to
    # https://sccn.ucsd.edu/pipermail/eeglablist/2014/009134.html

    # Reconstruct ("back-project") the data using only the specified ICA
    # components. Don't make use of potential "spare" PCA components in this
    # process – we're only interested in the contribution of the ICA
    # components!
    if components is None:
        components = range(ica.n_components_)

    inst_recon = ica.apply(
        inst=inst.copy(),
        include=[components],
        exclude=[],
        n_pca_components=0,
        verbose=False,
    )
    data_recon = inst_recon.get_data(picks=ica.ch_names)
    data_orig = inst.get_data(picks=ica.ch_names)
    data_diff = data_orig - data_recon

    # To estimate the data variance, we first compute the variance across
    # channels at each time point, and then we average these variances.
    mean_var_diff = data_diff.var(axis=0).mean()
    mean_var_orig = data_orig.var(axis=0).mean()

    var_explained_ratio = 1 - mean_var_diff / mean_var_orig
    return var_explained_ratio

MWE:

import mne

sample_dir = mne.datasets.sample.data_path()
sample_fname = sample_dir / 'MEG' / 'sample' / 'sample_audvis_raw.fif'

raw = mne.io.read_raw_fif(sample_fname, preload=True)
raw.crop(tmax=60)
raw.pick_types(eeg=True, stim=True)
raw.filter(l_freq=0.1, h_freq=None)

events = mne.find_events(raw)
epochs = mne.Epochs(raw=raw, events=events, baseline=None, preload=True)
evoked = epochs.average()

ica = mne.preprocessing.ICA(n_components=15, method='picard')
ica.fit(raw)

for inst in (raw, epochs, evoked):
    print(f'\nProcessing {type(inst)} data…')
    for components in (0, [0, 1], None):
        explained_var = get_explained_var_ratio(
            ica=ica, inst=inst, components=components
        )
        explained_var_percent = round(100 * explained_var, 1)
        print(
            f'ICA component(s) {components} explain(s): '
            f'{explained_var_percent}% of variance in original data'
        )

Output:

Processing <class 'mne.io.fiff.raw.Raw'> data…
ICA component(s) 0 explain(s): 26.2% of variance in original data
ICA component(s) [0, 1] explain(s): 29.6% of variance in original data
ICA component(s) None explain(s): 86.1% of variance in original data

Processing <class 'mne.epochs.Epochs'> data…
ICA component(s) 0 explain(s): 46.2% of variance in original data
ICA component(s) [0, 1] explain(s): 61.0% of variance in original data
ICA component(s) None explain(s): 95.2% of variance in original data

Processing <class 'mne.evoked.EvokedArray'> data…
ICA component(s) 0 explain(s): 31.3% of variance in original data
ICA component(s) [0, 1] explain(s): 35.7% of variance in original data
ICA component(s) None explain(s): 93.9% of variance in original data

@drammock
Copy link
Member

Nice! If the implementation ends up having a logging line, then I suggest adding .__name__ to this:

 print(f'\nProcessing {type(inst)} data…')

@hoechenberger
Copy link
Member Author

Nice! If the implementation ends up having a logging line, then I suggest adding .__name__ to this:

 print(f'\nProcessing {type(inst)} data…')

Nice idea!

@agramfort
Copy link
Member

@hoechenberger let me know when you push this so I can comment on the lines. Will save me time. 🙏

@hoechenberger
Copy link
Member Author

@hoechenberger
Copy link
Member Author

@agramfort This is ready for review.

@hoechenberger hoechenberger added this to the 1.2 milestone Sep 13, 2022
@hoechenberger hoechenberger changed the title Introduce ICA.get_explained_variance_ratio() to easily retrieve relative explained variances after a fit MRG: Introduce ICA.get_explained_variance_ratio() to easily retrieve relative explained variances after a fit Sep 13, 2022
The fraction of variance in ``inst`` that can be explained by the
ICA components. If only a single ``ch_type`` was given, a float
will be returned. Otherwise, a dictionary with channel types as
keys and explained variance ratios as values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly I would always return a dict as otherwise we'll have for example to branch in the pipeline code etc. wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit hesitant to do that because if you have for example only EEG data, you'd need to do:

var_explained = ica.get_explained_variance_ratio(inst)['eeg']

which I felt was kind of odd … but you're of course right, always returning a dict would make things more consistent, so that's probably better.

@drammock @cbrnr WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agramfort I've implemented your suggestion, and we're always returning a dict now

@hoechenberger
Copy link
Member Author

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a last question. Otherwise +1 for MRG

thx @hoechenberger

# this process – we're only interested in the contribution of the ICA
# components!
kwargs = dict(
inst=inst.copy(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the copy really needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes, because we use ica.apply() below and this works in place. We could do without the copy if instead of using ica.apply(), we'd "manually" do the matrix multiplication. But ica.apply() does quite a few additional things and I'm worried I'd forget something important 😅

I trust that if we discover the copy() here causes issues, we can optimize things later on.

@agramfort agramfort enabled auto-merge (squash) September 14, 2022 09:25
@agramfort
Copy link
Member

+1 for MRG when CIs complete. thx @hoechenberger

auto-merge was automatically disabled September 14, 2022 10:29

Pull request was closed

@hoechenberger hoechenberger reopened this Sep 14, 2022
@hoechenberger
Copy link
Member Author

hoechenberger commented Sep 14, 2022

GH Action status is not reported somehow, but all CI runs for this PR passed, according to https://github.com/mne-tools/mne-python/actions/workflows

@larsoner @agramfort Could you please manually merge?

@hoechenberger hoechenberger enabled auto-merge (squash) September 14, 2022 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants