
Performance improvements #87

Merged (3 commits, Nov 22, 2021)
Conversation

harryliu-intel
Contributor

A few performance improvements:

  • replace probe() plus busy waiting with select()
  • use a specialized function to compare messages of enum constants
  • use multiprocessing semaphore to avoid overhead of pipe
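The second bullet can be sketched as follows. The squashed commit message later in the thread says the specialized comparison uses `np.array_equal`; the exact wrapper shown here, including the constant names, is an assumption for illustration.

```python
import numpy as np


def enum_equal(a: np.ndarray, b: np.ndarray) -> bool:
    """Compare two enum-constant messages.

    np.array_equal checks shapes first and compares element-wise,
    which is cheaper for these small fixed-shape messages than a
    generic (a == b).all() that builds a boolean array each call.
    """
    return np.array_equal(a, b)


# Hypothetical enum-style message constants (not the actual Lava ones):
RUN = np.array([1])
STOP = np.array([2])
```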

@harryliu-intel harryliu-intel added 1-feature New feature request 0-needs-review For all new issues labels Nov 21, 2021
@awintel
Contributor

awintel commented Nov 21, 2021

Oh, oh... that means work for me. Not knowing what you would be doing, I have also refactored the LoihiPyProcessModel by pushing much of the generic MGMT command handling into the AbstractPyProcModel, since this is not Loihi-specific and should also be the same for AsyncProcs, e.g. on a CPU.

For that purpose I introduced a message handler based on a dictionary mapping from commands to function objects/message handlers. The AbstractPyProcModel sets up a basic dictionary with RUN, PAUSE, STOP and respective message handlers. Subclasses like LoihiPyProcModel would then just extend the set of handlers. This made the code much cleaner.
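The dictionary-based dispatch described above might look roughly like this; the class names echo the thread, but the Command enum, the handler names, and the return values are illustrative, not the actual Lava API.

```python
from enum import Enum


class Command(Enum):
    RUN = 1
    PAUSE = 2
    STOP = 3


class AbstractPyProcModel:
    def __init__(self):
        # Base dictionary mapping commands to bound handler methods.
        self._handlers = {
            Command.RUN: self._on_run,
            Command.PAUSE: self._on_pause,
            Command.STOP: self._on_stop,
        }

    def handle(self, cmd: Command) -> str:
        # Single dispatch point instead of an if/elif chain per command.
        return self._handlers[cmd]()

    def _on_run(self) -> str:
        return "running"

    def _on_pause(self) -> str:
        return "paused"

    def _on_stop(self) -> str:
        return "stopped"


class LoihiPyProcModel(AbstractPyProcModel):
    def __init__(self):
        super().__init__()
        # Subclasses just extend or override entries in the handler set.
        self._handlers[Command.PAUSE] = self._on_loihi_pause

    def _on_loihi_pause(self) -> str:
        return "loihi-paused"
```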

Will first finish my stuff then revisit your changes.

Which of all your changes was the key to making it more performant, and by how much?

@mgkwill
Contributor

mgkwill commented Nov 21, 2021

I like the enum_equal function, and using a multiprocessing semaphore also seems like a good idea.

This change is also likely to affect the current in-flight PyPort changes, but I will adapt as necessary.

@harryliu-intel
Contributor Author

Sorry, Andreas, I didn't know you'd be working in the same area. The most significant speed-up comes from the select() commit, but I think each of the other two commits also makes a measurable difference.

@awintel
Copy link
Contributor

awintel commented Nov 22, 2021

No worries. Not your fault ;-)


@awintel awintel left a comment


Took me a bit to understand what's happening here...

My interpretation is the following:

  • You introduced a CspSelector which essentially has a threading.Condition object.
  • CspSelector.select is like a message handler of its own. It is given a list of channels and actions to perform when there is a token on any of the given channels.
  • The select method passes the central Condition of the CspSelector to each channel.
  • The CspSelector then simply waits for any channel to trigger the condition. This Condition.wait() is probably implemented more efficiently than busy waiting.
  • Once any channel has a new token, it sends a notification via the threading.Condition to all observers. This unblocks the wait() inside the CspSelector.select() method. We then iterate once over all channels to see which one has a token and call the corresponding action/message handler.
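The mechanism in the bullets above can be sketched roughly as follows. This is a simplified stand-in, not the actual pypychannel code: the `Channel` class, the `observer` attribute, and the method names are assumptions made to keep the example self-contained.

```python
import threading
from queue import Queue


class Channel:
    """Minimal stand-in for a CSP recv channel."""

    def __init__(self):
        self._q = Queue()
        self.observer = None            # callback installed by the selector

    def send(self, token):
        self._q.put(token)
        if self.observer:
            self.observer()             # wake any selector waiting on us

    def probe(self) -> bool:
        return not self._q.empty()

    def recv(self):
        return self._q.get_nowait()


class CspSelector:
    """Wait on several channels with one shared threading.Condition."""

    def __init__(self):
        self._cv = threading.Condition()

    def _changed(self):
        # Called from a channel's send(); unblocks select()'s wait().
        with self._cv:
            self._cv.notify_all()

    def select(self, *channel_actions):
        with self._cv:
            # Hand the shared condition callback to every channel.
            for channel, _ in channel_actions:
                channel.observer = self._changed
            try:
                while True:
                    # Scan once; if any channel holds a token, run its action.
                    for channel, action in channel_actions:
                        if channel.probe():
                            return action()
                    # Otherwise sleep until some channel notifies us --
                    # cheaper than spinning on probe().
                    self._cv.wait()
            finally:
                for channel, _ in channel_actions:
                    channel.observer = None
```

A caller would pass `(channel, action)` pairs, e.g. `selector.select((cmd_channel, cmd_channel.recv))`, and block until one of the channels delivers a token.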

Overall this looks good!
I don't mind if we pull this into the current release if this all works, but going forward I hope we can simplify the overall design of the PyRuntimeService and PyProcessModel, as outlined in the other git issue:
#86

If possible, we should treat all channels equally: have one selector that listens on all available channels and calls an appropriate action when any becomes active. The action could just be a function pointer instead of a string in some cases or a port in other cases.

    while True:
        # Probe if there is a new command from the runtime service
        if self.service_to_process_cmd.probe():
            if action == 'cmd':
                phase = self.service_to_process_cmd.recv()
Contributor

Ok, so one key change is that instead of constantly calling probe(), we immediately block on the command channel until we receive anything.
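The difference between the two styles can be sketched with a plain `queue.Queue` standing in for the command channel; the producer thread and the `"RUN"` token are made up for the demo.

```python
import queue
import threading
import time

cmd = queue.Queue()


def producer():
    # Simulates the runtime service sending a command a little later.
    time.sleep(0.05)
    cmd.put("RUN")


threading.Thread(target=producer).start()

# Old style (sketch): spin on probe() and burn CPU until a token shows up.
#     while not cmd_channel.probe():
#         pass
# New style: block on the channel; the thread sleeps until data arrives.
phase = cmd.get()
```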

    else:
        raise ValueError(f"Wrong Phase Info Received : {phase}")
elif action == 'req':
Contributor

My idea was to treat everything as a command. May it be RUN, PAUSE, STOP, GET, SET, SPK, ... and just handle them all consistently.

if isinstance(csp_port, CspRecvPort):
    channel_actions.append((csp_port, lambda: var_port))
elif enum_equal(phase, PyLoihiProcessModel.Phase.HOST):
    channel_actions.append((self.service_to_process_req,
Contributor

From a type perspective it is a bit inconsistent to give either a string or a port back as an action.

    callable and return the result.
    """
    with self._cv:
        self._set_observer(args, self._changed)
Contributor

Why do we have to set and unset this all the time?

Contributor Author

In an earlier version, all channels notified, and _changed() determined if the source channel is part of the current set being selected on. (This is also the reason for the now unused channel argument, which I didn't bother to remove.) I thought doing it this way was cheaper and cleaner.

@@ -230,7 +235,10 @@ def _req_callback(self):
    try:
        while not self._done:
            self._req.recv_bytes(0)
            not_empty = self.probe()
Contributor

This loop is still busy waiting, isn't it? Is this a performance concern? Could this also be improved by using a condition?

Contributor Author

There is no busy waiting here, because self._req.recv_bytes() would have blocked.
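For context, the semaphore commit mentioned in the PR description can be sketched as a counting handshake. The real code uses a multiprocessing semaphore per the commit message, but the function names and the probe-then-restore pattern here are illustrative assumptions.

```python
import multiprocessing as mp

# A semaphore's internal counter tracks pending tokens without moving
# any bytes, avoiding the read/write syscall pair a pipe would need.
tokens = mp.Semaphore(0)


def notify_token():
    """Sender side: announce one new token."""
    tokens.release()


def probe_token() -> bool:
    """Receiver side: non-blocking check for a pending token."""
    if tokens.acquire(block=False):
        tokens.release()   # restore the count; this was only a probe
        return True
    return False
```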


@harryliu-intel harryliu-intel left a comment


> The action could just be a function pointer instead of a string in some cases or a port in other cases.

I agree but I avoided that because it would have required larger structural changes that might make this commit more difficult to understand and review. I would suggest the original authors come back and refactor it.


@PhilippPlank PhilippPlank left a comment


Overall I think we should merge it, since it improves performance and does not break our unit tests. We still need to refactor our runtime/runtime_service interactions, but one step after the other.

@mgkwill mgkwill merged commit 38e14d5 into lava-nc:main Nov 22, 2021
monkin77 pushed a commit to monkin77/thesis-lava that referenced this pull request Jul 12, 2024
* Use specialized np.array_equal for performance.

* Use select instead of probe with busy waiting.

* Use multiprocessing semaphore instead of pipe.
Successfully merging this pull request may close these issues.

Something is embarrassingly slow (with current message passing backend?)