Best place to host Whisper #327

finebalancetech · 2023-06-30T00:20:11Z

For a medical app, I'm looking for either a HIPAA-compliant Whisper API endpoint (I believe OpenAI's is not) or a way to self-host, either on our own GPU hardware or on a cloud instance such as AWS or Azure. What is the most cost-effective way to host the large-v2 model?

Does anyone have performance metrics for different instances running whisper, faster-whisper, or whisper.cpp? For the large model, I would like to understand whether running on a GPU is generally faster and more cost-effective (instance cost vs. runtime). Which instance type on a cloud provider would you recommend for this model? We want to be able to scale if needed without too much hassle.
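
One way to compare instance cost against runtime is cost per hour of transcribed audio: the instance's hourly price divided by its real-time factor (hours of audio processed per wall-clock hour). A minimal sketch; the `Setup` type and the prices and speeds in the example are placeholders, not measurements from this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Setup:
    name: str
    price_per_hour: float    # instance price in USD per wall-clock hour (placeholder)
    real_time_factor: float  # hours of audio transcribed per wall-clock hour (benchmark it yourself)


def cost_per_audio_hour(setup: Setup) -> float:
    """USD spent to transcribe one hour of audio, assuming the instance is kept fully busy."""
    if setup.real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return setup.price_per_hour / setup.real_time_factor


def cheapest(setups: list[Setup]) -> Setup:
    """Return the setup with the lowest cost per audio hour."""
    return min(setups, key=cost_per_audio_hour)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; substitute your own prices and benchmarks.
    candidates = [
        Setup("cpu-vm", price_per_hour=0.20, real_time_factor=0.5),
        Setup("gpu-vm", price_per_hour=0.60, real_time_factor=10.0),
    ]
    for s in candidates:
        print(f"{s.name}: ${cost_per_audio_hour(s):.3f} per audio hour")
    print("cheapest:", cheapest(candidates).name)
```

Idle time still costs money, so unless a queue keeps the instance busy, the real cost per audio hour is higher than this figure.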

Thanks so much.
Mark

landemou · 2023-07-13T16:41:21Z

Hello, I have deployed faster-whisper with the large-v2 model in a professional environment. It runs on older hardware: 8 GB of RAM and a GTX 1060 GPU with 6 GB of VRAM, on Ubuntu. The price-performance ratio is excellent.

Visio-Biswaroop Aug 6, 2023

Hi @landemou, can you tell me how many concurrent users it can support? Thanks.

landemou Aug 6, 2023

I process the jobs one at a time, not in parallel. With the large-v2 model, capacity is approximately 70 hours of audio per day. I use 2 computers to run 2 tasks in parallel and reduce processing time, for a total capacity of 140 hours per day.
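
Those figures imply a real-time factor of roughly 70 / 24 ≈ 2.9 per machine (about 2.9 hours of audio per wall-clock hour), and the 140 h/day total assumes the two machines scale linearly. A small sketch of that arithmetic, which can also estimate how many identical machines a daily volume would need; the 70 h/day figure is landemou's, while the 300 h/day target is a made-up example.

```python
import math

HOURS_PER_DAY = 24.0


def real_time_factor(audio_hours_per_day: float) -> float:
    """Audio hours processed per wall-clock hour on one machine running around the clock."""
    return audio_hours_per_day / HOURS_PER_DAY


def fleet_capacity(per_machine_audio_hours: float, machines: int) -> float:
    """Total audio hours per day, assuming independent machines scale linearly."""
    return per_machine_audio_hours * machines


def machines_needed(target_audio_hours: float, per_machine_audio_hours: float) -> int:
    """Smallest number of identical machines that covers the daily target."""
    return math.ceil(target_audio_hours / per_machine_audio_hours)


if __name__ == "__main__":
    per_machine = 70.0  # reported capacity with large-v2 on a GTX 1060, one job at a time
    print(f"real-time factor: {real_time_factor(per_machine):.2f}")
    print(f"2 machines: {fleet_capacity(per_machine, 2):.0f} h/day")
    print(f"machines for 300 h/day: {machines_needed(300.0, per_machine)}")
```

Because each machine handles one job at a time, extra concurrent users on the same machine wait in a queue rather than adding throughput.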

Visio-Biswaroop Aug 6, 2023

OK, thanks.

silvacarl2 · 2023-08-07T01:12:38Z

AWS g4dn.xlarge EC2

brajeshvisio01 Oct 18, 2023

@silvacarl2 I took an instance with 2 GPUs and set device_index=[0,1] and num_workers=2, so it should handle 4 requests at a time, and in the best case it does. The problem is that the overall time taken by the app is the same as when I run it on two single-GPU instances with gunicorn (the API is a Flask app). I have observed that the first request initially takes a long time to return, which is why there is no difference in total time. Could you please clarify? Thanks and regards.
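
A common serving pattern (a sketch, not something confirmed in this thread) is to load the model once per process at startup, run one throwaway transcription so that CUDA initialization and first-call overhead are not charged to the first real request, and then dispatch requests from a thread pool. In faster-whisper, `num_workers` only adds parallelism when `transcribe()` is called from several Python threads, and the returned segments are a lazy generator, so the callable must iterate them for any work to happen. Here `transcribe_fn` is a hypothetical stand-in for a function that calls `WhisperModel.transcribe` and joins the segment texts, and `max_workers` is an assumption you would size to the parallelism the model was configured for.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical signature: takes an audio path, returns the full transcript.
# With faster-whisper, a real one might look like:
#   model = WhisperModel("large-v2", device="cuda", device_index=[0, 1], num_workers=2)
#   def transcribe_fn(path): segments, _ = model.transcribe(path); return "".join(s.text for s in segments)
TranscribeFn = Callable[[str], str]


def warm_up(transcribe_fn: TranscribeFn, sample_audio: str) -> float:
    """Run one throwaway transcription at startup and return how long it took (seconds)."""
    start = time.perf_counter()
    transcribe_fn(sample_audio)
    return time.perf_counter() - start


def transcribe_batch(transcribe_fn: TranscribeFn, audio_paths: List[str], max_workers: int) -> List[str]:
    """Transcribe several files from a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe_fn, audio_paths))


if __name__ == "__main__":
    def fake_transcribe(path: str) -> str:
        # Stand-in for a real model call; sleeps to mimic inference time.
        time.sleep(0.1)
        return f"transcript of {path}"

    print(f"warm-up took {warm_up(fake_transcribe, 'warmup.wav'):.2f}s")
    print(transcribe_batch(fake_transcribe, ["a.wav", "b.wav", "c.wav", "d.wav"], max_workers=4))
```

With gunicorn, each worker process typically holds its own copy of the model, so the warm-up has to run once in every process, not once per server.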

polaroi8d · 2024-04-03T14:10:12Z

Are there any updates on this topic? I'm interested in hosting either faster-whisper or whisper.cpp. As I understand it, whisper.cpp could be more cost-effective because it can run quickly on inexpensive VMs. However, faster-whisper is faster when used with a high-end GPU-based VM.
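
To settle the whisper.cpp vs. faster-whisper question on a particular VM, the useful number is the real-time factor each backend reaches there (audio duration divided by wall-clock processing time), which can then be set against the VM's hourly price. A minimal timing harness, assuming each backend is wrapped in a hypothetical callable that takes an audio path and returns only once transcription has finished; the first run is discarded as warm-up and the median of the remaining runs is used.

```python
import statistics
import time
from typing import Callable


def measure_rtf(
    transcribe_fn: Callable[[str], object],
    audio_path: str,
    audio_seconds: float,
    runs: int = 3,
) -> float:
    """Return audio_seconds / median wall-clock seconds; values above 1 are faster than real time."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    transcribe_fn(audio_path)  # warm-up run, not timed (model load, CUDA init, caches)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe_fn(audio_path)
        timings.append(time.perf_counter() - start)
    return audio_seconds / statistics.median(timings)


if __name__ == "__main__":
    def fake_backend(path: str) -> str:
        # Stand-in that "transcribes" in about 0.05 s; replace with a real whisper.cpp or faster-whisper call.
        time.sleep(0.05)
        return path

    print(f"RTF: {measure_rtf(fake_backend, 'sample.wav', audio_seconds=1.0):.1f}")
```

Running the same audio file through each backend on each candidate VM gives directly comparable figures.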

silvacarl2 · 2024-04-03T14:14:00Z

Use an AWS EC2 g4dn.xlarge or g5.xlarge instance; both work great.

toanhuynhnguyen · 2024-09-14T15:20:13Z

Have you tried an AWS Inf1 instance?
