
Find states only at GFP peaks #163

Closed
Mattlab-hub opened this issue Feb 17, 2024 · 5 comments


@Mattlab-hub

Hi, this is somewhat related to #160 because I am working with MEG data. I am trying to compute the transition matrices using only the GFP peaks, in an effort to exclude the likely artifactual quick transitions in the GFP troughs. Before I reinvent the wheel, I wanted to check whether there is a simple way to do this besides saving the seg time indices of interest as a separate variable and then rewriting the transition matrix code from scratch to work on the new variable.
So far I have identified the peak indices in seg in each epoch using this:

std = np.std(epoched._data, axis=1)
peaks = []
for e in std:
    peaks.append(scipy.signal.find_peaks(e)[0])

This essentially gives me the index in seg.labels where ModK.labels_ match up when ModK is fit with the GFP. I now realize ModK.labels_ has the info I need to feed into the transition matrix function.

As a side note related to #160, this is obviously done with STD and not RMS. I tried RMS, but the timepoints end up not matching because ModK.labels_ and peaks have a different number of peaks when peaks is computed with np.sqrt(np.mean(epoched._data**2, axis=1)) instead of np.std(epoched._data, axis=1). When I flatten and plot both over seg.plot(), both match up pretty well, though STD matches up better.

Best,
Matt

@vferat
Owner

vferat commented Feb 17, 2024

Hey Matt,

I think your approach is correct.
However, things are a bit trickier because you are working with Epochs: you want to avoid counting transitions that happen between two epochs (i.e. from the last GFP peak of one epoch to the first GFP peak of the next), which is not handled by the current pycrostates.segmentation.compute_transition_matrix [doc].

You could try introducing unlabeled points (-1) in the segmentation between each epoch, since transitions from/to unlabeled timepoints are ignored:

import numpy as np
import scipy.signal
from pycrostates.cluster import ModKMeans
from pycrostates.segmentation.transitions import compute_transition_matrix

...

n_clusters = 5
ModK = ModKMeans(n_clusters=n_clusters, random_state=42)
ModK.fit(epochs, n_jobs=5)

labels = []
epochs_data = epochs.get_data()
n_sample_per_epoch = epochs_data.shape[2]

for e, epoch in enumerate(epochs_data):
    epoch_gfp = np.std(epoch, axis=0) # compute gfp
    epoch_peaks = scipy.signal.find_peaks(epoch_gfp, distance=3)[0] # find peak location (in the current epoch)
    epoch_peaks += e * n_sample_per_epoch # change the peak location to the global index
    epoch_peaks_labels = ModK.labels_[epoch_peaks].astype(int) # get the label of the peaks
    labels.extend(epoch_peaks_labels)
    labels.append(-1) # add -1 at the end of each epoch so compute_transition_matrix ignores transitions between epochs

labels = np.array(labels)
compute_transition_matrix(labels, ModK.n_clusters)

It becomes way more complicated if you want to fit the KMeans on GFP peaks only. In that case, I would recommend extracting the GFP peaks yourself and creating a ChData structure to use for fitting, so you can keep track of which peak belongs to which epoch.

Another point to take into account is that this approach works well because ignore_repetitions is set to True by default in compute_transition_matrix. Because you are fitting on peaks only, there are no state durations per se.
If you want to take state durations into account, you can assign the label of each GFP peak to its neighboring timepoints: given the timepoints between two consecutive GFP peaks, assign the first half the label of the first peak and the second half the label of the second peak. This is what other packages usually do when you ask for the segmentation to be computed on GFP peaks only.
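As a sketch of that midpoint assignment (an illustrative helper, not part of the pycrostates API; the exact boundary convention is an assumption here):

```python
import numpy as np

def expand_peak_labels(peak_indices, peak_labels, n_samples):
    """Assign each timepoint the label of its nearest GFP peak.

    Timepoints between two consecutive peaks are split at the midpoint:
    the first half inherits the label of the earlier peak, the second
    half the label of the later peak. Timepoints outside any peak's
    reach would stay unlabeled (-1) -- hypothetical helper, not a
    pycrostates function.
    """
    full_labels = np.full(n_samples, -1, dtype=int)
    for i, (idx, lab) in enumerate(zip(peak_indices, peak_labels)):
        # left boundary: midpoint with the previous peak (or epoch start)
        left = 0 if i == 0 else (peak_indices[i - 1] + idx + 1) // 2
        # right boundary: midpoint with the next peak (or epoch end)
        if i == len(peak_indices) - 1:
            right = n_samples
        else:
            right = (idx + peak_indices[i + 1] + 1) // 2
        full_labels[left:right] = lab
    return full_labels

expand_peak_labels([2, 6, 9], [0, 1, 0], 12)
# array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
```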

Thank you for the feedback on GFP computation, we will investigate this next week and let you know if we plan any change that could resolve some of your MEG issues.

With this approach, you can easily choose between 'std' or 'rms' for the GFP computation.
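For reference, the two definitions side by side (an illustrative helper, not a pycrostates function; for zero-mean data, like average-referenced EEG or reference-free MEG, the two coincide):

```python
import numpy as np

def gfp(data, mode="std"):
    """Compute GFP across channels for data of shape (n_channels, n_samples).

    'std' removes the spatial mean at each timepoint before averaging;
    'rms' does not. When the spatial mean is ~0 at every timepoint,
    the two definitions give (nearly) identical traces.
    """
    if mode == "std":
        return np.std(data, axis=0)
    if mode == "rms":
        return np.sqrt(np.mean(data ** 2, axis=0))
    raise ValueError(f"unknown mode: {mode!r}")
```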

@Mattlab-hub
Author

Hi @vferat, thanks for the code, it works great. I was previously fitting on the GFP only because it ran a lot quicker, but that is not necessary as long as I can extract the peaks this way.
As a shortcut, if one was not interested in knowing which peaks belonged to which epochs (such as in a resting state study where this info is unimportant), would it be viable to just do:
T_observed = compute_transition_matrix(ModK.labels_, ModK.n_clusters)
when ModK is fitted on the GFP instead of epochs? Or is there going to be a big issue with epoch starts and stops?

Also for reference, I ran the HCP MEG data with both STD and RMS, and compared it to the regular seg.plot() and here are the outputs for one epoch, but the pattern is consistent across epochs:
[Figures: seg.plot() output compared with the STD- and RMS-based GFP traces for one epoch]
The HCP data is just magnetometers, but it does not seem to make a large difference whether STD or RMS is used to extract the GFP, since reference-free MEG and average-referenced EEG both essentially produce zero-mean data. However, I have noticed that you cannot mix them: the slight differences between RMS and STD create a slightly different number of peaks, which throws off the indexing between them. So currently only np.std(epoched.data, axis=1) matches up with ModK.labels_ when ModK is fit on the GFP; np.sqrt(np.mean(epoched.data**2, axis=1)) does not match up with ModK.labels_.

@vferat
Owner

vferat commented Feb 19, 2024

> Hi @vferat, Thanks that code, It works great. I was previously only fitting with the GFP because it ran a lot quicker, but it is not necessary so long as I can extract the peaks this way. As a shortcut, if one was not interested in knowing which peaks belonged to which epochs (such as in a resting state study where this info is unimportant), would it be viable to just do: T_observed = compute_transition_matrix(ModK.labels_, ModK.n_clusters) When ModK is fitted with the GFP instead of epochs? or is there going to be a big issue with epoch starts and stops?

If some Epochs are dropped, and/or Epochs overlap, and/or there is some delay between two Epochs, then you cannot be 100% sure that there is time continuity between two Epochs. This is an issue when working with transitions, because you do not want to count transitions that happen between two epochs affected by these continuity problems. That is why I added the -1 value in the code: -1 is the label for unlabeled datapoints, and transitions from/to this state are ignored in the matrix computation, so transitions between Epochs are excluded.
So the general answer is yes, there is an issue with that approach.

However, if you are sure that there is time continuity (I can only think of mne.make_fixed_length_epochs with overlap=0 and no or very few epochs dropped), then your approach could be used.
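For the record, continuity can be checked directly from the epoch onsets (e.g. the first column of epochs.events in MNE; a minimal sketch under that assumption, with plain sample indices used here):

```python
import numpy as np

def epochs_are_contiguous(onsets, n_samples_per_epoch):
    """Return True if consecutive epochs tile the recording with no gap
    and no overlap: each onset must follow the previous one by exactly
    one epoch length. `onsets` are epoch start samples, e.g. from
    epochs.events[:, 0] in MNE (assumed usage, not verified here)."""
    onsets = np.asarray(onsets)
    return bool(np.all(np.diff(onsets) == n_samples_per_epoch))
```

If this check fails (a dropped epoch leaves a gap of two epoch lengths, for instance), the -1 separator approach above is the safe option.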

@Mattlab-hub
Author

Thank you @vferat, good point! For the HCP data I cannot be sure of time continuity since I opted to use their minimally preprocessed data, so I do not know what they threw out. However, I do have a raw EEG dataset of good quality where I do not have to throw out any epochs, and I defined epochs with a fixed length and zero overlap. Thank you again for the code and the insight, it was extremely helpful.

@Mattlab-hub
Author

Hi @vferat, sorry to open this up again, but I wanted to run this for both the individual-level and the group-level analysis. I realized this code only works for individuals, not for groups, since the group shares the same ModK structure. I am guessing that to modify this pipeline, the best thing would be to compute the group-level ModK, then segment everyone individually and, in the line of the loop that has ModK.labels_[epoch_peaks].astype(int), use something like seg.labels[e][epoch_peaks].astype(int) instead. The only way to get this running was to also remove the line epoch_peaks += e * n_sample_per_epoch. This seems to work and mostly passes the checks, but sometimes the -1 borders double up, such that there are two in a row. I am not completely sure why this happens, but I am guessing it has to do with the rejected edges from segmentation.
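Concretely, the modified loop I have in mind looks roughly like this (untested sketch; it assumes seg.labels is indexable per epoch as a (n_epochs, n_samples) array, where seg comes from predicting on each subject with the group-level ModK):

```python
import numpy as np
import scipy.signal

def peak_labels_per_epoch(epochs_data, seg_labels, distance=3):
    """Collect per-epoch GFP-peak labels separated by -1 markers.

    epochs_data: (n_epochs, n_channels, n_samples) array.
    seg_labels:  (n_epochs, n_samples) labels, e.g. seg.labels from a
                 per-subject segmentation (assumed shape, not verified).
    """
    labels = []
    for e, epoch in enumerate(epochs_data):
        epoch_gfp = np.std(epoch, axis=0)  # GFP within this epoch
        # peak indices are local to the epoch, so they index seg_labels[e]
        # directly -- no global offset (epoch_peaks += e * n_samples) needed
        epoch_peaks = scipy.signal.find_peaks(epoch_gfp, distance=distance)[0]
        labels.extend(seg_labels[e][epoch_peaks].astype(int))
        labels.append(-1)  # separate epochs for compute_transition_matrix
    return np.array(labels)
```

The doubled -1 I mentioned would then correspond to an epoch contributing no peaks (or only unlabeled ones), which fits the rejected-edges guess.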
