Feat: Docker environment for remote speech to text evaluation (#110)

* Add Dockerfile
* added dependencies for docker
* dummy docker bug fix
* dummy docker bug fix
* Change dockerfile to do remote evaluation
* finalize dockerfile
* debug segments.py
* test docker locally, works
* download nltk for pytest
* add docker documentation for speech to text model
* add | to formatting
* formatting change
* take out nltk
facebookresearch · Aug 14, 2024 · 6101ba1
1 parent 439c937
Showing 3 changed files with 46 additions and 0 deletions.
diff --git a/docs/tutorials/remote_evaluation.rst b/docs/tutorials/remote_evaluation.rst
@@ -14,6 +14,15 @@ For instance, with the agent in :ref:`first-agent`,
     2022-12-06 19:12:26 | INFO | simuleval.cli | Evaluate system: DummyWaitkTextAgent
     2022-12-06 19:12:26 | INFO | simuleval.agent_server | Simultaneous Translation Server Started (process id 53902). Listening to port 8888
 
+|
+For custom speech-to-text transcription, you can also use the Whisper agent in :ref:`speech-to-text`:
+
+.. code-block:: bash
+
+    > simuleval --standalone --remote-port 8888 --agent whisper_waitk.py --waitk-lagging 3
+    2024-08-11 11:51:56 | INFO | simuleval.utils.agent | System will run on device: cpu. dtype: fp32
+    2024-08-11 11:51:56 | INFO | simuleval.agent_server | Simultaneous Translation Server Started (process id 38768). Listening to port 8888
+
 For detailed RESTful APIs, please see (TODO)
 
 Docker
@@ -31,6 +40,19 @@ Build and run the docker image:
     cd examples/quick_start && docker build -t simuleval_agent .
     docker run -p 8888:8888 simuleval_agent:latest
 
+|
+The :code:`Dockerfile` for speech-to-text evaluation on custom audio files is
+
+.. literalinclude:: ../../examples/speech_to_text/Dockerfile
+    :language: docker
+
+Build and run the docker image:
+
+.. code-block:: bash
+
+    cd examples/speech_to_text && docker build -t simuleval-speech-to-text:1.0 .
+    docker run -p 8888:8888 simuleval-speech-to-text:1.0
+
 Remote Evaluation
 ------------------
 If there is an agent server or docker image available,
@@ -42,3 +64,14 @@ We can start a remote evaluator as follow. For simplicity we assume they are on
     simuleval --remote-eval --remote-port 8888 \
         --source source.txt --target target.txt \
         --source-type text --target-type text
+
+|
+For the Whisper agent's speech-to-text evaluation:
+
+.. code-block:: bash
+
+    simuleval --remote-eval --remote-port 8888 \
+        --source-segment-size 500 \
+        --source source.txt --target reference/transcript.txt \
+        --source-type speech --target-type text \
+        --output output --quality-metrics WER
diff --git a/docs/tutorials/speech_to_text.rst b/docs/tutorials/speech_to_text.rst
@@ -1,3 +1,5 @@
+.. _speech-to-text:
+
 Speech-to-Text
 ==============
 

diff --git a/examples/speech_to_text/Dockerfile b/examples/speech_to_text/Dockerfile
@@ -0,0 +1,11 @@
+FROM python:3.8
+RUN apt-get update \
+    && apt-get upgrade -y \
+    && apt-get -y install apt-utils gcc libpq-dev libsndfile-dev
+RUN pip install -U openai-whisper
+RUN pip install -U editdistance
+RUN git clone https://github.com/facebookresearch/SimulEval
+WORKDIR /SimulEval/
+RUN pip install -e .
+WORKDIR /SimulEval/examples/speech_to_text/
+CMD ["simuleval", "--standalone", "--remote-port", "8888", "--agent", "whisper_waitk.py", "--waitk-lagging", "3"]
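
The documentation added in this commit starts the agent server (directly or via `docker run -p 8888:8888`) and then launches the remote evaluator against port 8888. The server may need some time before it starts listening, so a script that runs both steps back to back could benefit from waiting for the port first. The following is a minimal sketch under stated assumptions, not part of this commit or of SimulEval: `wait_for_port` is a hypothetical helper, it relies on bash's `/dev/tcp` feature, and the 120-second timeout in the usage comment is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SimulEval): poll until HOST:PORT accepts
# TCP connections, or give up after roughly TIMEOUT seconds (default 60).
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-60}"
    local waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # bash's /dev/tcp pseudo-path opens a TCP connection. Running the
        # exec inside a subshell means the descriptor is closed right away.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example use, commented out so that sourcing this file has no side effects.
# The evaluator command is the one from the documentation in this commit:
#
# wait_for_port localhost 8888 120 && \
#     simuleval --remote-eval --remote-port 8888 \
#         --source-segment-size 500 \
#         --source source.txt --target reference/transcript.txt \
#         --source-type speech --target-type text \
#         --output output --quality-metrics WER
```

The helper returns 0 as soon as a connection succeeds and 1 if the timeout elapses, so it can gate the evaluator with `&&`. A successful TCP connection only shows that something is listening on the port; it does not confirm that the agent has finished loading its model.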