
Performance improvements #87

Merged (3 commits, Nov 22, 2021)
Conversation

harryliu-intel
Contributor

A few performance improvements:

  • replace probe() plus busy waiting with select()
  • use a specialized function to compare messages of enum constants
  • use multiprocessing semaphore to avoid overhead of pipe
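The second bullet can be sketched as follows. The squashed commit message later in the thread says the specialized comparison uses `np.array_equal`; the exact wrapper shown here, including the constant names, is an assumption for illustration.

```python
import numpy as np


def enum_equal(a: np.ndarray, b: np.ndarray) -> bool:
    """Compare two enum-constant messages.

    np.array_equal checks shapes first and compares element-wise,
    which is cheaper for these small fixed-shape messages than a
    generic (a == b).all() that builds a boolean array each call.
    """
    return np.array_equal(a, b)


# Hypothetical enum-style message constants (not the actual Lava ones):
RUN = np.array([1])
STOP = np.array([2])
```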

@harryliu-intel harryliu-intel added 1-feature New feature request 0-needs-review For all new issues labels Nov 21, 2021
@awintel
Contributor

awintel commented Nov 21, 2021

Oh, oh... that means work for me. Not knowing what you would be doing, I have also refactored the LoihiPyProcessModel by pushing much of the generic MGMT command handling into the AbstractPyProcModel, since this is not Loihi-specific and should also be the same for AsyncProcs, e.g. on a CPU.

For that purpose I introduced a message handler based on a dictionary mapping from commands to function objects/message handlers. The AbstractPyProcModel sets up a basic dictionary with RUN, PAUSE, STOP and respective message handlers. Subclasses like LoihiPyProcModel would then just extend the set of handlers. This made the code much cleaner.
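The dictionary-based dispatch described above might look roughly like this; the class names echo the thread, but the Command enum, the handler names, and the return values are illustrative, not the actual Lava API.

```python
from enum import Enum


class Command(Enum):
    RUN = 1
    PAUSE = 2
    STOP = 3


class AbstractPyProcModel:
    def __init__(self):
        # Base dictionary mapping commands to bound handler methods.
        self._handlers = {
            Command.RUN: self._on_run,
            Command.PAUSE: self._on_pause,
            Command.STOP: self._on_stop,
        }

    def handle(self, cmd: Command) -> str:
        # Single dispatch point instead of an if/elif chain per command.
        return self._handlers[cmd]()

    def _on_run(self) -> str:
        return "running"

    def _on_pause(self) -> str:
        return "paused"

    def _on_stop(self) -> str:
        return "stopped"


class LoihiPyProcModel(AbstractPyProcModel):
    def __init__(self):
        super().__init__()
        # Subclasses just extend or override entries in the handler set.
        self._handlers[Command.PAUSE] = self._on_loihi_pause

    def _on_loihi_pause(self) -> str:
        return "loihi-paused"
```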

Will first finish my stuff then revisit your changes.

Which of all your changes was the key to making it more performant, and by how much?

@mgkwill
Contributor

mgkwill commented Nov 21, 2021

I like the enum_equal function, and using a multiprocessing semaphore also seems like a good idea.

This change is also likely to affect the current in-flight PyPort changes, but I will adapt as necessary.

@harryliu-intel
Contributor Author

Sorry, Andreas, I didn't know you'd be working in the same area. The most significant speed-up comes from the select() commit, but I think each of the other two commits also makes a measurable difference.

@awintel
Copy link
Contributor

awintel commented Nov 22, 2021

No worries. Not your fault ;-)


@awintel awintel left a comment


Took me a bit to understand what's happening here...

My interpretation is the following:

  • You introduced a CspSelector which essentially has a threading.Condition object.
  • CspSelector.select is like a message handler of its own. It is given a list of channels and actions to perform when there is a token on any of the given channels.
  • The select method passes the central Condition of the CspSelector to each channel.
  • The CspSelector then simply waits for any channel to trigger the condition. This Condition.wait() is probably implemented more efficiently than busy waiting.
  • Once any channel has a new token, it sends a notification via the threading.Condition to all observers. This unblocks the wait() inside the CspSelector.select() method. We then iterate once over all channels to see which one has a token and call the corresponding action/message handler.
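The mechanism in the bullets above can be sketched roughly as follows. This is a simplified stand-in, not the actual pypychannel code: the `Channel` class, the `observer` attribute, and the method names are assumptions made to keep the example self-contained.

```python
import threading
from queue import Queue


class Channel:
    """Minimal stand-in for a CSP recv channel."""

    def __init__(self):
        self._q = Queue()
        self.observer = None            # callback installed by the selector

    def send(self, token):
        self._q.put(token)
        if self.observer:
            self.observer()             # wake any selector waiting on us

    def probe(self) -> bool:
        return not self._q.empty()

    def recv(self):
        return self._q.get_nowait()


class CspSelector:
    """Wait on several channels with one shared threading.Condition."""

    def __init__(self):
        self._cv = threading.Condition()

    def _changed(self):
        # Called from a channel's send(); unblocks select()'s wait().
        with self._cv:
            self._cv.notify_all()

    def select(self, *channel_actions):
        with self._cv:
            # Hand the shared condition callback to every channel.
            for channel, _ in channel_actions:
                channel.observer = self._changed
            try:
                while True:
                    # Scan once; if any channel holds a token, run its action.
                    for channel, action in channel_actions:
                        if channel.probe():
                            return action()
                    # Otherwise sleep until some channel notifies us --
                    # cheaper than spinning on probe().
                    self._cv.wait()
            finally:
                for channel, _ in channel_actions:
                    channel.observer = None
```

A caller would pass `(channel, action)` pairs, e.g. `selector.select((cmd_channel, cmd_channel.recv))`, and block until one of the channels delivers a token.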

Overall this looks good!
I don't mind if we pull this into the current release if this all works, but going forward I hope we can simplify the overall design of the PyRuntimeService and PyProcessModel, as outlined in the other git issue:
#86

If possible, we should treat all channels equally: have one selector that listens on all available channels and calls an appropriate action when any becomes active. The action could just be a function pointer instead of a string in some cases or a port in other cases.

    while True:
        # Probe if there is a new command from the runtime service
        if self.service_to_process_cmd.probe():
            if action == 'cmd':
                phase = self.service_to_process_cmd.recv()
Contributor

Ok, so one key change is that instead of constantly calling probe(), we immediately block on the command channel until we receive anything.
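The difference between the two styles can be sketched with a plain `queue.Queue` standing in for the command channel; the producer thread and the `"RUN"` token are made up for the demo.

```python
import queue
import threading
import time

cmd = queue.Queue()


def producer():
    # Simulates the runtime service sending a command a little later.
    time.sleep(0.05)
    cmd.put("RUN")


threading.Thread(target=producer).start()

# Old style (sketch): spin on probe() and burn CPU until a token shows up.
#     while not cmd_channel.probe():
#         pass
# New style: block on the channel; the thread sleeps until data arrives.
phase = cmd.get()
```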

    else:
        raise ValueError(f"Wrong Phase Info Received : {phase}")
elif action == 'req':
Contributor

My idea was to treat everything as a command. May it be RUN, PAUSE, STOP, GET, SET, SPK, ... and just handle them all consistently.

if isinstance(csp_port, CspRecvPort):
    channel_actions.append((csp_port, lambda: var_port))
elif enum_equal(phase, PyLoihiProcessModel.Phase.HOST):
    channel_actions.append((self.service_to_process_req,
Contributor

From a type perspective it is a bit inconsistent to give either a string or a port back as an action.

    callable and return the result.
    """
    with self._cv:
        self._set_observer(args, self._changed)
Contributor

Why do we have to set and unset this all the time?

Contributor Author

In an earlier version, all channels notified, and _changed() determined if the source channel is part of the current set being selected on. (This is also the reason for the now unused channel argument, which I didn't bother to remove.) I thought doing it this way was cheaper and cleaner.

@@ -230,7 +235,10 @@ def _req_callback(self):
    try:
        while not self._done:
            self._req.recv_bytes(0)
            not_empty = self.probe()
Contributor

This loop is still busy waiting, isn't it? Is this a performance concern? Could this also be improved by using a condition?

Contributor Author

There is no busy waiting here, because self._req.recv_bytes() would have blocked.
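For context, the semaphore commit mentioned in the PR description can be sketched as a counting handshake. The real code uses a multiprocessing semaphore per the commit message, but the function names and the probe-then-restore pattern here are illustrative assumptions.

```python
import multiprocessing as mp

# A semaphore's internal counter tracks pending tokens without moving
# any bytes, avoiding the read/write syscall pair a pipe would need.
tokens = mp.Semaphore(0)


def notify_token():
    """Sender side: announce one new token."""
    tokens.release()


def probe_token() -> bool:
    """Receiver side: non-blocking check for a pending token."""
    if tokens.acquire(block=False):
        tokens.release()   # restore the count; this was only a probe
        return True
    return False
```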


@harryliu-intel harryliu-intel left a comment


> The action could just be a function pointer instead of a string in some cases or a port in other cases.

I agree but I avoided that because it would have required larger structural changes that might make this commit more difficult to understand and review. I would suggest the original authors come back and refactor it.


@PhilippPlank PhilippPlank left a comment


Overall I think we should merge it, since it improves performance and does not break our unit tests. We still need to refactor our runtime/runtime_service interactions, but one step after the other.

@mgkwill mgkwill merged commit 38e14d5 into lava-nc:main Nov 22, 2021
monkin77 pushed a commit to monkin77/thesis-lava that referenced this pull request Jul 12, 2024
* Use specialized np.array_equal for performance.

* Use select instead of probe with busy waiting.

* Use multiprocessing semaphore instead of pipe.
Successfully merging this pull request may close these issues.

Something is embarrassingly slow (with current message passing backend?)