Inference benchmarks for V2: Depth anything V2 #26

sarmientoF opened this issue Sep 18, 2024 · 1 comment

@sarmientoF
I tried to replicate the results from your benchmarks on GPU inside Docker, using the images nvidia/cuda:12.6.1-cudnn-devel-ubuntu22.04 and nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04. After installing all the requirements, I cannot get close to your results; the best times are around 98 ms for the small model.
I tried different GPUs (Tesla T4, A10, A100, and H100) with no significant improvement.

Depth Anything V2 Small, 518×518 (last results of 100 iterations, in seconds):
Inference: 0.0989875050000002
Inference: 0.09922104800000042
Inference: 0.10035349300000007
Inference: 0.09872460100000069
Inference: 0.09940112099999965
Inference: 0.10043860399999716
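
For reference, a minimal wall-clock timing loop along these lines, assuming ONNX Runtime with the CUDA Execution Provider and a placeholder model path, looks like this:

import time

import numpy as np
import onnxruntime as ort

# Sketch only: the model path is a placeholder.
session = ort.InferenceSession(
    "depth_anything_v2_vits.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name
image = np.random.rand(1, 3, 518, 518).astype(np.float32)

# A few warm-up runs so CUDA/cuDNN initialization is not counted.
for _ in range(10):
    session.run(None, {input_name: image})

for _ in range(100):
    start = time.perf_counter()
    session.run(None, {input_name: image})
    print(f"Inference: {time.perf_counter() - start}")

Note that plain session.run includes host-to-device and device-to-host copies in the measured time.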

I don't know whether the benchmarks you have here are from Depth Anything V1.

In any case, could you share benchmarks for Depth Anything V2?

Reference from the README.md:

We observe the following average latencies using the CUDA Execution Provider:

| Device       | Encoder | Input Shape      | Average Latency (ms) |
|--------------|---------|------------------|----------------------|
| RTX4080 12GB | ViT-S   | (1, 3, 518, 518) | 13.3                 |
| RTX4080 12GB | ViT-B   | (1, 3, 518, 518) | 29.3                 |
| RTX4080 12GB | ViT-L   | (1, 3, 518, 518) | 83.2                 |
@fabio-sim (Owner)

Hi @sarmientoF, thank you for your interest in Depth-Anything-ONNX.

Firstly, thanks for trying the models out on various GPUs.

Those measurements in the README are for V2. I measure using ONNX Runtime's profiler:

import onnxruntime as ort

# Enable ONNX Runtime's built-in profiler.
sess_options = ort.SessionOptions()
sess_options.enable_profiling = True

...  # create the InferenceSession with sess_options and bind inputs/outputs

for _ in range(100):
    session.run_with_iobinding(binding)

ORT will output a trace JSON file that you can open in, for example, https://www.ui.perfetto.dev/.
This is the trace file I obtain when running the ViT-S model on my machine under Ubuntu WSL (it's around 70 MB, so I zipped it):
onnxruntime_profile__vits.zip

Discounting the first few runs (warm-up) and averaging over the remaining runs, I get 12983208.33 nanoseconds, or about 12.98 milliseconds:

[Screenshot: Perfetto trace viewer showing the averaged run durations]
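
If it's useful, the same average can also be computed directly from the profile JSON instead of in Perfetto. A rough sketch, assuming the standard ORT trace format where whole-graph executions appear as "model_run" events with "dur" given in microseconds, and using a placeholder file name:

import json

# Sketch only: the path is whatever InferenceSession.end_profiling() returned.
with open("onnxruntime_profile__vits.json") as f:
    events = json.load(f)

# Assumption: whole-graph executions show up as "model_run" events,
# with durations ("dur") in microseconds.
durations_us = [e["dur"] for e in events if e.get("name") == "model_run"]

# Discard the first few runs as warm-up, then average the rest.
steady = durations_us[5:]
print(f"Average latency: {sum(steady) / len(steady) / 1000:.2f} ms")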

Honestly, I'm not sure where the issue is. Did you try running inference directly (without Docker)?
