
Recurrent connectivity hangs on execution #100

Closed
mathisrichter opened this issue Nov 23, 2021 · 5 comments · Fixed by #112
Assignees
Labels
1-bug Something isn't working

Comments

@mathisrichter
Contributor

When creating a simple recurrent network, for instance a LIF that connects to itself via Dense, execution hangs indefinitely.

import unittest
import numpy as np

from lava.magma.core.run_conditions import RunSteps
from lava.magma.core.run_configs import Loihi1SimCfg
from lava.proc.lif.process import LIF
from lava.proc.dense.process import Dense


class TestRecurrentNetwork(unittest.TestCase):
    def test_running_recurrent_network(self):
        """Tests executing an architecture with a recurrent
        connection."""
        num_steps = 10
        shape = (1,)
    
        bias = np.zeros(shape)
        bias[:] = 5000
        lif = LIF(shape=shape, bias=bias, bias_exp=np.ones(shape))
    
        dense = Dense(weights=np.ones((1, 1)))
    
        lif.s_out.connect(dense.s_in)
        dense.a_out.connect(lif.a_in)
    
        lif.run(condition=RunSteps(num_steps=num_steps),
                run_cfg=Loihi1SimCfg())
        lif.stop()
    
        self.assertEqual(lif.runtime.current_ts, num_steps)
@mathisrichter added the labels 1-bug (Something isn't working), help needed (Extra attention is needed), and 0-needs-review (For all new issues) on Nov 23, 2021
@mathisrichter mathisrichter added this to the Release v0.2.0 milestone Nov 23, 2021
@mathisrichter
Contributor Author

Is this because both run_spk() methods of the PyProcessModels of LIF and Dense do a recv() as first operation, which blocks until a message comes in - thus blocking the entire network?

From PyDenseModel:

    def run_spk(self):
        s_in = self.s_in.recv()
        a_out = self.weights[:, s_in].sum(axis=1)
        self.a_out.send(a_out)
        self.a_out.flush()

From PyLifModelFloat:

    def run_spk(self):
        a_in_data = self.a_in.recv()
        self.u[:] = self.u * (1 - self.du)
        self.u[:] += a_in_data
        bias = self.bias * (2**self.bias_exp)
        self.v[:] = self.v * (1 - self.dv) + self.u + bias
        s_out = self.v >= self.vth
        self.v[s_out] = 0  # Reset voltage to 0
        self.s_out.send(s_out)

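As a toy illustration of the suspected problem (plain Python queues standing in for the Lava messaging channels, not the actual runtime API): because both run_spk() methods above do recv() before send(), and both channels in the recurrent loop start empty, neither model can ever make progress.

```python
import queue
from queue import Queue

# Hypothetical stand-ins for the two channels of the recurrent loop.
lif_to_dense = Queue()   # carries s_out spikes from LIF to Dense
dense_to_lif = Queue()   # carries a_out activations from Dense to LIF

def dense_step():
    s_in = lif_to_dense.get_nowait()  # recv() first: needs LIF to have sent
    dense_to_lif.put(s_in)            # send() second

def lif_step():
    a_in = dense_to_lif.get_nowait()  # recv() first: needs Dense to have sent
    lif_to_dense.put(a_in)            # send() second

# With a blocking recv() both steps would wait forever. Here the
# non-blocking get_nowait() surfaces the deadlock as queue.Empty
# instead of hanging the process.
deadlocked = False
try:
    lif_step()
except queue.Empty:
    deadlocked = True
```

Whichever step runs first finds its input channel empty, so in the blocking implementation the whole network stalls on the very first time step.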
@ashishrao7
Collaborator

Did you look into issue #32, where I had a similar problem?

@mathisrichter
Contributor Author

I wasn't aware of this, thanks Ashish. How did you end up solving it? Could that solution be applied here?

@ashishrao7
Collaborator

ashishrao7 commented Nov 23, 2021

Yes, Andreas' pointer solved the issue. Due to the recurrent nature of the connections, one InPort ended up waiting indefinitely for a signal. I solved it by changing the order of recv() and send() in the run_spk() function of one of the process models involved in the recurrent connection. Depending on the behavior you want, you'll have to swap the order of send() and recv() in one of your process models above.
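A minimal sketch of that reordering (plain Python queues rather than the Lava runtime; all names here are illustrative): letting the LIF side send() its spikes before it recv()s means a message is always in flight on the ring, so every time step can complete.

```python
import numpy as np
from queue import Queue

s_chan = Queue()  # LIF -> Dense spike channel
a_chan = Queue()  # Dense -> LIF activation channel

weights = np.ones((1, 1))
v = np.zeros(1)       # LIF membrane voltage
vth, bias = 1.0, 0.6  # illustrative threshold and constant bias drive

for t in range(10):
    # LIF, phase 1: send() spikes computed from last step's state first ...
    s_out = v >= vth
    v[s_out] = 0.0
    s_chan.put(s_out)

    # Dense: recv() spikes, then send() the weighted activation.
    s_in = s_chan.get()
    a_chan.put(weights[:, s_in].sum(axis=1))

    # LIF, phase 2: ... only then recv() the activation and integrate.
    v += a_chan.get() + bias

# Both channels drain every step, so the loop runs to completion
# instead of hanging.
```

The key point is that recv() on each channel is only reached after the matching send() for that time step has already happened, so no port waits on a message that will never arrive.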

@awintel
Contributor

awintel commented Nov 23, 2021

Both LIF and Dense seem to be wrong here:

  1. Dense can receive a spike and accumulate it, but the a_out it produces in one time step cannot be the same a_out it sends out in that same time step. Dense must replicate the circular a_out buffer that our NeuroCores have, with a history of at least 1.
  2. LIF is wrong because it first tries to recv(), which blocks indefinitely if the port is connected but nothing has been sent upstream yet.

Either Dense should first send the buffered a_out from the previous time step and only then accumulate a_out for future time steps, or LIF should first update its neurons and then grab and buffer the a_in for the next time step.

It probably makes sense to implement the DendAccum activation circular buffer that we have in Loihi right now (even if we don't support explicit synaptic delays at the moment), because we need at least an implicit delay of 1 to avoid deadlock.
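A sketch of that idea (an illustrative class with assumed names, not the actual fix merged in #112): Dense keeps a one-deep activation buffer, sending what it accumulated in the previous time step before accumulating the current one, which provides the implicit delay of 1.

```python
import numpy as np

class BufferedDense:
    """Illustrative Dense model with a DendAccum-style circular activation
    buffer of depth 1: a_out sent at step t reflects spikes from step t-1."""

    def __init__(self, weights):
        self.weights = weights
        self.a_buf = np.zeros(weights.shape[0])  # history of 1

    def run_spk(self, s_in):
        a_out = self.a_buf.copy()                       # send last step's activation first
        self.a_buf = self.weights[:, s_in].sum(axis=1)  # accumulate for the next step
        return a_out

dense = BufferedDense(np.ones((1, 1)))
first = dense.run_spk(np.array([True]))    # buffer still empty -> zeros
second = dense.run_spk(np.array([False]))  # step-1 spike arrives one step late
```

With this ordering Dense can send() before it recv()s within a time step, so the recurrent loop always has a message in flight and neither port blocks.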
