A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Updated Nov 10, 2024 - Python
Easy-to-use Speech Toolkit including Self-Supervised Learning models, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
End-to-End Speech Processing Toolkit
Speech To Speech: an effort for an open-sourced and modular GPT-4o
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
A real-time speech transcription and translation application using OpenAI Whisper and a free translation API. Interface made using Tkinter. Code written fully in Python.
The dataset of Speech Recognition
Tracking the progress in end-to-end speech translation
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not limited to end-to-end speech interaction, end-to-end speech translation and speech recognition.
Zero -- A neural machine translation system
Code for the paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)
Code for the NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
Repository containing the open source code of works published at the FBK MT unit.
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
Code for the ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".
Source code for the ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation".