I tried to replicate the results from your benchmarks using Docker with GPU support, with the images nvidia/cuda:12.6.1-cudnn-devel-ubuntu22.04 and nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04. After installing all the requirements, I cannot get close to your results: the best times are around 98 ms for the small model.
I tried different GPUs (Tesla T4, A10, A100, and H100) with no significant improvement.
Depth Anything V2 Small at 518x518, last results of 100 iterations, in seconds:
Inference: 0.0989875050000002
Inference: 0.09922104800000042
Inference: 0.10035349300000007
Inference: 0.09872460100000069
Inference: 0.09940112099999965
Inference: 0.10043860399999716
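For reference, this is a minimal sketch of the kind of timing loop behind the numbers above (the model path is a placeholder, and the input is random data at the benchmark shape):

```python
import time
import numpy as np
import onnxruntime as ort

# Placeholder model path; input shape matches the README benchmark (1, 3, 518, 518).
sess = ort.InferenceSession(
    "depth_anything_v2_vits.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 518, 518).astype(np.float32)

for _ in range(100):
    start = time.perf_counter()
    sess.run(None, {input_name: x})
    print(f"Inference: {time.perf_counter() - start}")
```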
I don't know whether the benchmarks you have here are from Depth Anything V1. In any case, could you share benchmarks for Depth Anything V2?
Reference from the README.md:
We observe the following average latencies using the CUDA Execution Provider:
| Device | Encoder | Input Shape | Average Latency (ms) |
|---|---|---|---|
| RTX4080 12GB | ViT-S | (1, 3, 518, 518) | 13.3 |
| RTX4080 12GB | ViT-B | (1, 3, 518, 518) | 29.3 |
| RTX4080 12GB | ViT-L | (1, 3, 518, 518) | 83.2 |
ORT will output a trace JSON file that you can open in, for example, https://www.ui.perfetto.dev/
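For anyone who wants to generate the same trace, profiling can be enabled through the session options (a minimal sketch; the model path is a placeholder):

```python
import onnxruntime as ort

# Enable ONNX Runtime's built-in profiler; it writes a Chrome-trace JSON file.
so = ort.SessionOptions()
so.enable_profiling = True

sess = ort.InferenceSession(
    "depth_anything_v2_vits.onnx",  # placeholder path
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# ... run inference a number of times here ...

trace_path = sess.end_profiling()  # returns the trace file name
print(trace_path)
```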
This is the trace file I obtain when running the ViT-S model on my machine under Ubuntu WSL (it's around 70MB so I zipped it): onnxruntime_profile__vits.zip
Discounting the first few runs (warm-up) and averaging over the later runs, I get 12983208.33 nanoseconds, or about 12.98 milliseconds.
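The averaging can be reproduced with a short script over the trace file. This is a sketch; the trace filename and the number of warm-up runs to discard are assumptions, and `dur` in the Chrome trace event format is in microseconds:

```python
import json

# Placeholder trace filename; the trace is a JSON array of Chrome trace events.
with open("onnxruntime_profile__vits.json") as f:
    events = json.load(f)

# Each complete inference appears as a "model_run" event; "dur" is in microseconds.
runs = [e["dur"] for e in events if e.get("name") == "model_run"]

warmup = 5  # assumed number of warm-up runs to discard
steady = runs[warmup:]
print(f"Average latency: {sum(steady) / len(steady) / 1000:.2f} ms over {len(steady)} runs")
```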
Honestly, I'm not sure where the issue is. Did you try running inference directly (without Docker)?
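One more thing worth checking: whether the session actually loads the CUDA Execution Provider, since ORT can fall back to the CPU provider when the CUDA/cuDNN versions in the container don't match what onnxruntime-gpu was built against. A minimal sketch (the model path is a placeholder):

```python
import onnxruntime as ort

# Should include "CUDAExecutionProvider" if the GPU build loaded correctly.
print(ort.get_available_providers())

sess = ort.InferenceSession(
    "depth_anything_v2_vits.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# CUDA should be listed first here; if only CPU appears, the CUDA EP failed to load.
print(sess.get_providers())
```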