Skip to content

6. More on template matching

carsen-stringer edited this page Mar 3, 2024 · 2 revisions

NOTE: This Wiki is kept here for reference for previous versions of Kilosort. For updated information about Kilosort4, please refer to kilosort.readthedocs.io instead.


Here we explain the mechanism by which Kilosort 1&2 find overlapping spikes. The method is often referred to as "template matching" or "matching pursuit", and it is similar between Kilosort 1 and 2. It differs from the classical sequential approach of spike sorting which proceeds roughly as:

  1. detect threshold crossings.
  2. extract PCA features from each spike.
  3. run an iterative clustering algorithm

Kilosort integrates these steps into a single optimization procedure, and adds a fourth step:

Step 4:

... use the cluster centroids to go back to the raw data and run template matching with convolutional matching pursuit. Roughly, this starts by sweeping all templates (another name for the mean waveform) over the voltage data in time. Thus the "convolutional" part. Spikes are detected where the templates best match the data. The best match is then "peeled off" from the raw data, by subtracting a scaled version of the template corresponding to that unit. If the subtraction is perfect, there should be no trace left of that spike, and we can sequentially proceed to find the next best matching spike. The reason we had to peel off all the detections is so we can potentially detect other spikes nearby the spot we peeled off. To correctly detect such overlapping, or colliding spikes, we need to have good models of the templates, and correctly estimate each of the spike's amplitude.

Since step 4 has assigned new spikes to the templates, we can modify the templates to better account for these new spikes. This naturally gives rise to an iterative optimization, where the templates are continuously improved to better account for the detected spikes, and the spike times are re-optimized in light of the better templates. This is similar to the standard iterative optimization during clustering, as in the classical pipelines, but it loops in the raw data and the template matching step to resolve overlap ambiguities that would otherwise confuse the clustering steps.

Unlike Kilosort1, Kilosort2 now runs the full template matching algorithm during the optimization phase, making the optimization and final extraction of the spikes identical. This is possible in Kilosort2 due to efficiency upgrades, the same upgrades which make it in principle possible to run it on data from thousands of channels.

Avoiding local minima and setting the number of clusters

Like all clustering algorithms, Kilosort2 has a local minimum problem and it is difficult to set the number of clusters. Here we also adopted a different strategy compared to Kilosort1. At some iteration of the algorithm, we start with an active set of N templates.

  1. We run the full template matching algorithm with these templates.
  2. We subtract off all the templates that we matched to spikes. Any remaining spikes in the residual are likely to be from neurons that were not in our active set of templates.
  3. We use our common spike detector based on threshold crossings to find all spikes unaccounted for, left in the residual.
  4. Of these spikes, we introduce as new templates the M biggest ones in their range of channels, and make sure there are no duplicates from the same neuron.
  5. We now have N + M templates, and we repeat steps 1-5 continuously, moving on to the next batches.
  6. On every 5-th batch, we discard those templates that failed to capture even a few spikes.

In practice, the turnover rate of introducing and dropping templates is very high. On the first iteration, we start with no templates at all, and perform the steps above starting at step 3.