Reorganize user documentation and update parameters
akorgor committed Dec 15, 2023
1 parent 0d1f255 commit 9cc78b9
Showing 6 changed files with 223 additions and 92 deletions.
47 changes: 42 additions & 5 deletions models/eprop_iaf_psc_delta.h
@@ -50,6 +50,9 @@ Description
neuron model with delta-shaped postsynaptic currents used for eligibility
propagation (e-prop) plasticity.
An additional state variable and the corresponding differential
equation represent a piecewise constant external current.
.. note::
Contrary to what the model names suggest, ``eprop_iaf_psc_delta`` is not simply
the ``iaf_psc_delta`` model endowed with e-prop. While both models are
@@ -66,15 +69,15 @@ The membrane voltage time course is given by:
.. math::
    v_j^t &= \alpha v_j^{t-1}+\sum_{i \neq j}W_{ji}^\mathrm{rec}z_i^{t-1}
    + \sum_i W_{ji}^\mathrm{in}x_i^t-z_j^{t-1}v_\mathrm{th} \,, \\
    \alpha &= e^{-\frac{\delta t}{\tau_\mathrm{m}}} \,.
The spike state variable is expressed by a Heaviside function:
.. math::
    z_j^t = H\left(v_j^t-v_\mathrm{th}\right) \,.
If the membrane voltage crosses the threshold voltage :math:`v_\text{th}`, a spike is
emitted and the membrane voltage is reduced by :math:`v_\text{th}` in the next
time step. After the time step of the spike emission, the neuron is not
able to spike for an absolute refractory period :math:`t_\text{ref}`.
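To make the discrete-time dynamics concrete, the following is a minimal NumPy
sketch of the voltage update, Heaviside spike condition, reset by subtraction,
and refractory counter. It follows the equations above rather than the actual
NEST implementation, and all parameter values are illustrative placeholders.

.. code-block:: python

    import numpy as np

    def simulate_membrane(inputs, tau_m=10.0, dt=1.0, v_th=1.0, t_ref=2.0):
        """Simulate v_j^t for one neuron from a precomputed input series."""
        alpha = np.exp(-dt / tau_m)     # membrane propagator e^(-dt/tau_m)
        n_ref = int(round(t_ref / dt))  # refractory period in steps
        v, z_prev, ref_steps = 0.0, 0.0, 0
        v_trace, z_trace = [], []
        for x in inputs:
            # leak, synaptic/external input, and reset by subtraction of v_th
            v = alpha * v + x - z_prev * v_th
            if ref_steps > 0:
                z = 0.0  # no spiking during the refractory period
                ref_steps -= 1
            else:
                z = float(v >= v_th)  # Heaviside spike condition H(v - v_th)
                if z == 1.0:
                    ref_steps = n_ref
            v_trace.append(v)
            z_trace.append(z)
            z_prev = z
        return np.array(v_trace), np.array(z_trace)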
@@ -89,6 +92,40 @@ plasticity is calculated:
See the documentation on the ``iaf_psc_delta`` neuron model for more information
on the integration of the subthreshold dynamics.
The change of the synaptic weight is calculated from the gradient
:math:`g = \frac{\mathrm{d}E}{\mathrm{d}W_{ji}}`, which depends on the
presynaptic spikes :math:`z_i^{t-1}`, the surrogate gradient / pseudo-derivative
of the postsynaptic membrane voltage :math:`\psi_j^t` (which together form the
eligibility trace :math:`e_{ji}`), and the learning signal :math:`L_j^t`
emitted by the readout neurons.
.. math::
    \frac{\mathrm{d}E}{\mathrm{d}W_{ji}} = g &= \sum_t L_j^t \bar{e}_{ji}^t\,, \\
    e_{ji}^t &= \psi_j^t \bar{z}_i^{t-1}\,.
The eligibility trace and the presynaptic spike trains are low-pass filtered
with exponential kernels:
.. math::
    \bar{e}_{ji} &= \mathcal{F}_\kappa(e_{ji})
    \;\text{with}\; \kappa=\exp\left(-\frac{\delta t}{\tau_\text{m,out}}\right)\,, \\
    \bar{z}_i &= \mathcal{F}_\alpha(z_i)
    \;\text{with}\; \alpha=\exp\left(-\frac{\delta t}{\tau_\text{m}}\right)\,.
Furthermore, a firing rate regularization mechanism keeps the average firing
rate :math:`f^\text{av}_j` of the postsynaptic neuron close to a target firing rate
:math:`f^\text{target}`:
.. math::
    \frac{\mathrm{d}E^\text{reg}}{\mathrm{d}W_{ji}} = g^\text{reg} = c_\text{reg}
    \sum_t \frac{1}{Tn_\text{trial}} \left( f^\text{target}-f^\text{av}_j\right)e_{ji}^t\,,
where :math:`c_\text{reg}` scales the overall regularization and the average
is taken over the time that passed since the previous update, that is, the number of
trials :math:`n_\text{trial}` times the duration of an update interval :math:`T`.

The overall gradient is given by the sum of the two gradients.
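As an illustration of how the eligibility trace, the low-pass filters, and the
regularization combine into the overall gradient, here is a hedged NumPy sketch
operating on precollected per-step traces. The array names ``z_pre``,
``psi_post``, and ``L_post`` and all default values are placeholders, not NEST
parameters; the actual implementation computes this event-driven inside the
synapse.

.. code-block:: python

    import numpy as np

    def eprop_gradient(z_pre, psi_post, L_post, f_av, f_target,
                       dt=1.0, tau_m=10.0, tau_m_out=10.0,
                       c_reg=0.0, T=1000.0, n_trial=1):
        """Evaluate g + c_reg * g_reg from per-step traces of one synapse."""
        alpha = np.exp(-dt / tau_m)      # presynaptic filter constant
        kappa = np.exp(-dt / tau_m_out)  # eligibility-trace filter constant
        g, g_reg, z_bar, e_bar = 0.0, 0.0, 0.0, 0.0
        for t in range(1, len(z_pre)):
            z_bar = alpha * z_bar + z_pre[t - 1]  # \bar{z}_i^{t-1}
            e = psi_post[t] * z_bar               # eligibility trace e_{ji}^t
            e_bar = kappa * e_bar + e             # \bar{e}_{ji}^t
            g += L_post[t] * e_bar                # learning-signal contribution
            g_reg += (f_target - f_av) / (T * n_trial) * e  # regularization
        return g + c_reg * g_reg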
For more information on e-prop plasticity, see the documentation on the other e-prop models:
* :doc:`eprop_iaf_psc_delta_adapt<../models/eprop_iaf_psc_delta_adapt/>`
@@ -251,7 +288,7 @@ class eprop_iaf_psc_delta : public EpropArchivingNodeRecurrent
//! Target firing rate of rate regularization (spikes/s).
double f_target_;

//! Scaling of surrogate-gradient / pseudo-derivative of membrane voltage.
double gamma_;

//! Constant external input current (pA).
49 changes: 44 additions & 5 deletions models/eprop_iaf_psc_delta_adapt.h
@@ -50,12 +50,15 @@ Description
neuron model with delta-shaped postsynaptic currents and threshold adaptation
used for eligibility propagation (e-prop) plasticity.
An additional state variable and the corresponding differential
equation represent a piecewise constant external current.
.. note::
Contrary to what the model names suggest, ``eprop_iaf_psc_delta_adapt`` is not
simply the ``iaf_psc_delta`` model with adaptation endowed with e-prop. While
both models are integrate-and-fire neurons with delta-shaped postsynaptic
currents, there are minor differences in the dynamics of the two models, such
as the propagator of the postsynaptic current and the voltage reset upon a
spike.
E-prop plasticity was originally introduced and implemented in TensorFlow in [1]_.
@@ -79,9 +82,9 @@ The spike state variable is expressed by a Heaviside function:
.. math::
    z_j^t = H\left(v_j^t-A_j^t\right) \,.
If the membrane voltage crosses the adaptive threshold voltage :math:`A_j^t`, a spike is
emitted and the membrane voltage is reduced by :math:`v_\text{th}` in the next
time step. After the time step of the spike emission, the neuron is not
able to spike for an absolute refractory period :math:`t_\text{ref}`.
@@ -97,6 +100,42 @@ plasticity is calculated:
See the documentation on the ``iaf_psc_delta`` neuron model for more information
on the integration of the subthreshold dynamics.
The change of the synaptic weight is calculated from the gradient
:math:`g = \frac{\mathrm{d}E}{\mathrm{d}W_{ji}}`, which depends on the
presynaptic spikes :math:`z_i^{t-1}`, the surrogate gradient / pseudo-derivative
of the postsynaptic membrane voltage :math:`\psi_j^t` (which together form the
eligibility trace :math:`e_{ji}`), and the learning signal :math:`L_j^t`
emitted by the readout neurons.
.. math::
    \frac{\mathrm{d}E}{\mathrm{d}W_{ji}} = g &= \sum_t L_j^t \bar{e}_{ji}^t\,, \\
    e_{ji}^t &= \psi_j^t \left(\bar{z}_i^{t-1} - \beta \epsilon_{ji,\text{a}}^{t-1}\right)\,, \\
    \epsilon_{ji,\text{a}}^{t-1} &= \psi_j^{t-1}\bar{z}_i^{t-2}
    + \left( \rho - \psi_j^{t-1} \beta \right) \epsilon_{ji,\text{a}}^{t-2}\,, \\
    \rho &= \exp\left(-\frac{\delta t}{\tau_\text{a}}\right)\,.
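A minimal sketch of this recursion, assuming the pseudo-derivative
:math:`\psi_j^t` and the filtered spike train :math:`\bar{z}_i^t` are already
available as arrays; names and default values are placeholders:

.. code-block:: python

    import numpy as np

    def adaptive_eligibility(psi, z_bar, beta=1.0, tau_a=2000.0, dt=1.0):
        """Compute e_{ji}^t for an adaptive postsynaptic neuron."""
        rho = np.exp(-dt / tau_a)  # adaptation propagator
        eps = 0.0                  # epsilon_{ji,a}, initially zero
        e = np.zeros(len(psi))
        for t in range(2, len(psi)):
            # epsilon^{t-1} from the recursion above
            eps = psi[t - 1] * z_bar[t - 2] + (rho - psi[t - 1] * beta) * eps
            # eligibility trace with the adaptation correction
            e[t] = psi[t] * (z_bar[t - 1] - beta * eps)
        return e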
The eligibility trace and the presynaptic spike trains are low-pass filtered
with exponential kernels:
.. math::
    \bar{e}_{ji} &= \mathcal{F}_\kappa(e_{ji})
    \;\text{with}\; \kappa=\exp\left(-\frac{\delta t}{\tau_\text{m,out}}\right)\,, \\
    \bar{z}_i &= \mathcal{F}_\alpha(z_i)
    \;\text{with}\; \alpha=\exp\left(-\frac{\delta t}{\tau_\text{m}}\right)\,.
Furthermore, a firing rate regularization mechanism keeps the average firing
rate :math:`f^\text{av}_j` of the postsynaptic neuron close to a target firing rate
:math:`f^\text{target}`:
.. math::
    \frac{\mathrm{d}E^\text{reg}}{\mathrm{d}W_{ji}} = g^\text{reg} = c_\text{reg}
    \sum_t \frac{1}{Tn_\text{trial}} \left( f^\text{target}-f^\text{av}_j\right)e_{ji}^t\,,
where :math:`c_\text{reg}` scales the overall regularization and the average
is taken over the time that passed since the previous update, that is, the number of
trials :math:`n_\text{trial}` times the duration of an update interval :math:`T`.

The overall gradient is given by the sum of the two gradients.
For more information on e-prop plasticity, see the documentation on the other e-prop models:
* :doc:`eprop_iaf_psc_delta<../models/eprop_iaf_psc_delta/>`
@@ -272,7 +311,7 @@ class eprop_iaf_psc_delta_adapt : public EpropArchivingNodeRecurrent
//! Target firing rate of rate regularization (spikes/s).
double f_target_;

//! Scaling of surrogate-gradient / pseudo-derivative of membrane voltage.
double gamma_;

//! Constant external input current (pA).
3 changes: 2 additions & 1 deletion models/eprop_learning_signal_connection.h
@@ -42,7 +42,8 @@ Description
``eprop_learning_signal_connection`` is a feedback connector from ``eprop_readout``
neurons to ``eprop_iaf_psc_delta`` or ``eprop_iaf_psc_delta_adapt`` neurons and
transmits the learning signals :math:`L_j^t` for e-prop plasticity [1]_ and has
a static weight :math:`B_{jk}`.
For more information on e-prop plasticity, see the documentation on the other e-prop models:
29 changes: 26 additions & 3 deletions models/eprop_readout.h
@@ -48,17 +48,40 @@ Description
``eprop_readout`` is an implementation of an integrate-and-fire neuron model
with delta-shaped postsynaptic currents used as readout neuron for e-prop plasticity [1]_.
An additional state variable and the corresponding differential
equation represent a piecewise constant external current.
E-prop plasticity was originally introduced and implemented in TensorFlow in [1]_.
The membrane voltage time course is given by:
.. math::
    v_j^t &= \alpha v_j^{t-1}+\sum_{i \neq j}W_{ji}^\mathrm{out}z_i^{t-1}
    -z_j^{t-1}v_\mathrm{th} \,, \\
    \alpha &= e^{-\frac{\delta t}{\tau_\mathrm{m}}} \,.
See the documentation on the ``iaf_psc_delta`` neuron model for more information
on the integration of the subthreshold dynamics.
The change of the synaptic weight is calculated from the gradient
:math:`g = \frac{\mathrm{d}E}{\mathrm{d}W_{ji}}`, which depends on the
presynaptic spikes :math:`z_i^{t-1}` and the learning signal :math:`L_j^t`
emitted by the readout neurons.
.. math::
    \frac{\mathrm{d}E}{\mathrm{d}W_{ji}} = g = \sum_t L_j^t \bar{z}_i^{t-1}\,.
The presynaptic spike trains are low-pass filtered with an exponential kernel:
.. math::
    \bar{z}_i=\mathcal{F}_\kappa(z_i) \;\text{with}\;
    \kappa=\exp\left(-\frac{\delta t}{\tau_\text{m}}\right)\,.
Since readout neurons are leaky integrators without a spiking mechanism, the
formula for computing the gradient lacks the surrogate gradient /
pseudo-derivative and a firing rate regularization term.
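For comparison with the recurrent models, a hedged NumPy sketch of this simpler
gradient; names and default values are placeholders:

.. code-block:: python

    import numpy as np

    def readout_gradient(z_pre, L_post, dt=1.0, tau_m=10.0):
        """Sum L_j^t times filtered presynaptic spikes, no pseudo-derivative."""
        kappa = np.exp(-dt / tau_m)
        g, z_bar = 0.0, 0.0
        for t in range(1, len(z_pre)):
            z_bar = kappa * z_bar + z_pre[t - 1]  # \bar{z}_i^{t-1}
            g += L_post[t] * z_bar
        return g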
For more information on e-prop plasticity, see the documentation on the other e-prop models:
* :doc:`eprop_iaf_psc_delta<../models/eprop_iaf_psc_delta/>`
92 changes: 14 additions & 78 deletions models/eprop_synapse.h
@@ -44,65 +44,13 @@ Description
+++++++++++
``eprop_synapse`` is a connector to create e-prop synapses between postsynaptic
neurons :math:`j` and presynaptic neurons :math:`i` as defined in [1]_.
The e-prop synapse collects the presynaptic spikes needed for calculating the
weight update. When it is time to update, it triggers the calculation of the
gradient, which is specific to the postsynaptic neuron and is thus defined there.
Finally, it optimizes the weight with the specified optimizer.
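As a usage illustration, a minimal PyNEST sketch of wiring such a synapse,
assuming a NEST build that ships the models from this commit; the optimizer
dictionary follows the parameter table below and the ``weight_optimizer``
documentation, and placing ``eta`` inside it is an assumption:

.. code-block:: python

    import nest

    nest.ResetKernel()
    pre = nest.Create("eprop_iaf_psc_delta")
    post = nest.Create("eprop_iaf_psc_delta")

    # Optimizer settings are passed as a dictionary (see the table below);
    # the exact keys are assumptions based on the weight_optimizer docs.
    nest.SetDefaults("eprop_synapse", {"optimizer": {"type": "gradient_descent",
                                                     "eta": 1e-4}})

    nest.Connect(pre, post, syn_spec={"synapse_model": "eprop_synapse",
                                      "weight": 1.0, "delay": 1.0})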
E-prop synapses require archiving of continuous quantities. Therefore e-prop
synapses can only be connected to neuron models that are capable of doing this
@@ -116,7 +64,11 @@ For more information on e-prop plasticity, see the documentation on the other e-prop models:
* :doc:`eprop_synapse<../models/eprop_synapse/>`
* :doc:`eprop_learning_signal_connection<../models/eprop_learning_signal_connection/>`
For more information on the optimizers, see the documentation of the weight optimizer:
* :doc:`weight_optimizer<../models/weight_optimizer/>`
Details on the event-based NEST implementation of e-prop can be found in [2]_.
.. warning::
@@ -135,14 +87,7 @@ The following parameters can be set in the status dictionary.
------------------------------------------------------------------------------------------------------------------------
Parameter Unit Math equivalent Default Description
================ ======== ================ ================ =========================================================
optimizer {'type': 'gradient_descent'} Dictionary of optimizer parameters
average_gradient False If True, average the gradient over the learning window
================ ======== ================ ================ =========================================================
@@ -151,14 +96,9 @@ average_gradient False If True, average
------------------------------------------------------------------------------------------------------------------------
Parameter Unit Math equivalent Default Description
============= ==== ========================= ======= ===============================================================
delay ms :math:`d_{ji}` 1.0 Dendritic delay
tau_m_readout ms :math:`\tau_\text{m,out}` 0.0 Time constant for low-pass filtering of eligibility trace
weight pA :math:`W_{ji}` 1.0 Synaptic weight
============= ==== ========================= ======= ===============================================================
Recordables
@@ -190,11 +130,7 @@ References
networks of spiking neurons. Nature Communications, 11:3625.
https://doi.org/10.1038/s41467-020-17236-y
.. [2] Korcsak-Gorzo A, Stapmanns J, Espinoza Valverde JA, Dahmen D,
van Albada SJ, Bolten M, Diesmann M. Event-based implementation of
eligibility propagation (in preparation)

