Easy Transcription is an easy-to-use interface to Deepgram's voice-to-text service, using the Whisper model. It provides simple, high-accuracy transcription of audio files with speaker diarisation enabled. You can also quickly label the different speakers and copy the finished transcript wherever you need it.
-
Clone the project to your local machine:
git clone [email protected]:Techming/ez-transcription.git
-
Make sure that you have Python 3.8 or later and pip installed. To check, run the following commands:
$ python3 --version
Python 3.8.5 # or anything above 3.8.0
$ pip3 --version
pip 20.1.1 from /PATH/TO/YOUR/PYTHON/INTERPRETER/python3.8/site-packages/pip (python 3.8)
-
Go to Deepgram, register an account, then create a new project and an API key.
-
Under the project root, create a .env file and paste in your API key under the name DEEPGRAM_API_KEY:
echo DEEPGRAM_API_KEY=<PUT-YOUR-API-KEY-HERE> > .env
-
For Mac users, simply navigate to the project root and run
./setup.sh
-
For Windows users, follow the steps below:
-
First, create a Python virtual environment under the project root
python3 -m venv venv
-
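Then activate the virtual environment (this command works in a Unix-like shell such as WSL; in Command Prompt or PowerShell, run venv\Scripts\activate instead)
source venv/bin/activate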
-
Finally, install the dependencies with pip
pip install -r requirements.txt
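The .env file created during installation stores the key as a plain KEY=VALUE line. As a rough illustration of how an application could pick it up at startup, here is a minimal standard-library sketch. The helper names `load_env_file` and `get_api_key` are hypothetical and not taken from this project; main.py may well load the key differently (for example, through a dotenv library).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper for illustration; not part of this project.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Return DEEPGRAM_API_KEY from the environment, falling back to .env."""
    key = os.environ.get("DEEPGRAM_API_KEY") or load_env_file(path).get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set; check your .env file.")
    return key
```

Failing early with a clear error when the key is missing tends to be easier to debug than a rejected request to the transcription service later on.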
-
-
Activate the virtual environment under the project root
source venv/bin/activate
-
Run the command below
python main.py
A new browser window should pop up and that's it!
main.py also accepts several command-line arguments:
-
--reload
Reload the server when source files change
-
--host TEXT
Host of the app, default is 127.0.0.1
-
--port INTEGER
Port of the app, default is 8000
-
--help
Show this message and exit.
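To make the options above concrete, here is a minimal sketch of an equivalent command-line interface built with Python's standard argparse module. It is only an illustration: main.py may use a different CLI library, and the `build_parser` name is an assumption rather than something from this project.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Start the transcription app.")
    parser.add_argument("--reload", action="store_true",
                        help="Reload the server when source files change")
    parser.add_argument("--host", default="127.0.0.1",
                        help="Host of the app, default is 127.0.0.1")
    parser.add_argument("--port", type=int, default=8000,
                        help="Port of the app, default is 8000")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--port", "9000"])
print(f"Would serve on http://{args.host}:{args.port} (reload={args.reload})")
```

With the real script, the corresponding invocation would be python main.py --port 9000, which keeps the default host and moves the app to port 9000.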
Processing typically takes between 1/12 and 1/6 of the length of the audio file. For example, a 4-minute recording can take 20 to 40 seconds. Note that the Whisper model generally takes longer to produce a transcript. Other factors, such as Deepgram traffic and audio quality, can also affect processing time.
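Going by the worked example above (20 to 40 seconds for 4 minutes of audio), the estimate amounts to dividing the audio duration by 12 and by 6. The short sketch below encodes that rule of thumb; the function name `estimate_processing_seconds` is made up for illustration, and actual times will vary.

```python
def estimate_processing_seconds(audio_seconds):
    """Return a (fastest, slowest) estimate of processing time in seconds.

    Assumes the rule of thumb from the text: processing takes between
    1/12 and 1/6 of the audio duration. Real times vary with the model,
    Deepgram traffic, and audio quality.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 12, audio_seconds / 6


low, high = estimate_processing_seconds(4 * 60)
print(f"4-minute file: roughly {low:.0f}s to {high:.0f}s")  # roughly 20s to 40s
```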
Speaker labels can be inaccurate when many people are talking or when two speakers overlap. Unfortunately, this is a limitation of Deepgram. You are welcome to fine-tune Deepgram's parameters in main.py.
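As a loose sketch of what tuning those parameters might involve, the snippet below assembles transcription options into a request URL without sending anything. The endpoint, the parameter names (`model`, `diarize`, `punctuate`, `utterances`), and the `whisper` model identifier are assumptions based on Deepgram's pre-recorded audio API and should be checked against its current documentation. How main.py actually passes its options is not shown here, and `build_listen_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint for Deepgram's pre-recorded transcription API.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(**overrides):
    """Return the request URL with transcription options as query parameters."""
    options = {
        "model": "whisper",    # assumed model identifier
        "diarize": "true",     # label speakers in the transcript
        "punctuate": "true",   # add punctuation and capitalization
    }
    # Caller-supplied options replace or extend the defaults above.
    options.update(overrides)
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(options)}"


print(build_listen_url(utterances="true"))
```

Keeping the options in one dictionary makes it easy to experiment with a single setting at a time and compare the resulting transcripts.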
If you have any feedback, please reach out by opening an issue.
Contributions are always welcome!