Hello,
My scenario: I am using two models to run inference asynchronously on a stereoscopic camera setup. Each model is initialized and runs in its own thread, where synchronized frames are fed to it for inference.
My intention is for the models to run in parallel, so that processing that depends on the results of both threads is delayed as little as possible. Despite the separation, however, the results come back as if the models were being run sequentially rather than in parallel.
I'm still debugging and it's very possible this is my mistake, but I figured it would be best to go to the source and ask whether this is expected behavior before I really dive into it.
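For context, the layout is roughly equivalent to this minimal, framework-agnostic sketch (the model objects and the `model(frame)` call are placeholders for the real inference code, which isn't shown here):

```python
import threading
import queue

# Placeholders for the two differently-weighted models; the real inference
# call would go where model(frame) is invoked below.
model_left = lambda frame: frame
model_right = lambda frame: frame

def inference_worker(model, frame_queue, result_queue):
    """Run one model in its own thread, consuming synchronized frames."""
    while True:
        frame = frame_queue.get()
        if frame is None:               # sentinel: shut the worker down
            break
        result_queue.put(model(frame))  # stand-in for the actual inference call

left_frames, right_frames = queue.Queue(), queue.Queue()
left_results, right_results = queue.Queue(), queue.Queue()

threads = [
    threading.Thread(target=inference_worker,
                     args=(model_left, left_frames, left_results), daemon=True),
    threading.Thread(target=inference_worker,
                     args=(model_right, right_frames, right_results), daemon=True),
]
for t in threads:
    t.start()
```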
Factors:
While running, GPU utilization and memory usage are both under 15%, so it does not appear to be a resource restriction.
The two models have identical architectures and inputs; they simply have different weights.
Expected: when benchmarking parallel synchronized inference, the total time should be roughly N, where N is the cost of running one model on a single frame.
Observed: benchmark results suggest a total time of roughly X·N, where X is the number of models being run. The timings also suggest serialized inference: models A and B share the same start time, A completes after roughly N, and B completes after roughly 2N (a minimal timing sketch follows this list).
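For reference, the per-model timings above could be collected with something like this minimal sketch (hypothetical placeholder models and timing helper, not my actual benchmark code):

```python
import threading
import time

def timed_run(name, model, frame, log, barrier):
    # Wait so both threads launch their inference call at the same moment.
    barrier.wait()
    start = time.perf_counter()
    model(frame)
    log.append((name, start, time.perf_counter()))

# Stand-ins for the two differently-weighted models (placeholders, not real inference).
model_a = model_b = (lambda frame: sum(x * x for x in frame))
frame = list(range(1_000_000))
log, barrier = [], threading.Barrier(2)

threads = [threading.Thread(target=timed_run, args=(n, m, frame, log, barrier))
           for n, m in (("A", model_a), ("B", model_b))]
for t in threads:
    t.start()
for t in threads:
    t.join()

base = min(start for _, start, _ in log)
for name, start, end in sorted(log):
    print(f"model {name}: start=+{start - base:.3f}s duration={end - start:.3f}s")
```

With truly parallel execution both durations should come out close to N; what I'm seeing instead is the shared start time with B finishing roughly 2N after it.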