Hello,
My scenario: I am using two models to run inference asynchronously on a stereoscopic camera setup. Each model is initialized and runs in its own thread, where synchronized frames are fed to it for inference.
My intention is for the models to run in parallel, so that processing that depends on the results of both threads is delayed as little as possible. Despite the separation, however, the results come back as if the models were being run sequentially rather than in parallel.
I'm still debugging and it's very possible this is my mistake, but I figured it would be best to go to the source and ask whether this is expected behavior before I really dive into it.
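For context, the layout is roughly equivalent to this minimal, framework-agnostic sketch (the model objects and the `model(frame)` call are placeholders for the real inference code, which isn't shown here):

```python
import threading
import queue

# Placeholders for the two differently-weighted models; the real inference
# call would go where model(frame) is invoked below.
model_left = lambda frame: frame
model_right = lambda frame: frame

def inference_worker(model, frame_queue, result_queue):
    """Run one model in its own thread, consuming synchronized frames."""
    while True:
        frame = frame_queue.get()
        if frame is None:               # sentinel: shut the worker down
            break
        result_queue.put(model(frame))  # stand-in for the actual inference call

left_frames, right_frames = queue.Queue(), queue.Queue()
left_results, right_results = queue.Queue(), queue.Queue()

threads = [
    threading.Thread(target=inference_worker,
                     args=(model_left, left_frames, left_results), daemon=True),
    threading.Thread(target=inference_worker,
                     args=(model_right, right_frames, right_results), daemon=True),
]
for t in threads:
    t.start()
```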
Factors:
While running, GPU utilization and memory usage are both under 15%, so it does not appear to be a resource restriction.
The two models have identical architectures and inputs; they simply have different weights.
Expected: when benchmarking parallel synchronized inference, the total time should be roughly N, where N is the cost of running one model on a single frame.
Observed: benchmark results suggest a total time of roughly X·N, where X is the number of models being run. The timings also suggest serialized inference: models A and B share the same start time, A completes after roughly N, and B completes after roughly 2N (a minimal timing sketch follows this list).
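For reference, the per-model timings above could be collected with something like this minimal sketch (hypothetical placeholder models and timing helper, not my actual benchmark code):

```python
import threading
import time

def timed_run(name, model, frame, log, barrier):
    # Wait so both threads launch their inference call at the same moment.
    barrier.wait()
    start = time.perf_counter()
    model(frame)
    log.append((name, start, time.perf_counter()))

# Stand-ins for the two differently-weighted models (placeholders, not real inference).
model_a = model_b = (lambda frame: sum(x * x for x in frame))
frame = list(range(1_000_000))
log, barrier = [], threading.Barrier(2)

threads = [threading.Thread(target=timed_run, args=(n, m, frame, log, barrier))
           for n, m in (("A", model_a), ("B", model_b))]
for t in threads:
    t.start()
for t in threads:
    t.join()

base = min(start for _, start, _ in log)
for name, start, end in sorted(log):
    print(f"model {name}: start=+{start - base:.3f}s duration={end - start:.3f}s")
```

With truly parallel execution both durations should come out close to N; what I'm seeing instead is the shared start time with B finishing roughly 2N after it.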