-
Notifications
You must be signed in to change notification settings - Fork 4.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' into philnash-add-model-selection-to-azure-openai…
…-embeddings
- Loading branch information
Showing
17 changed files
with
487 additions
and
31 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,173 @@ | ||
--- | ||
title: AssemblyAI | ||
sidebar_position: 3 | ||
slug: /integrations-assemblyai | ||
--- | ||
|
||
|
||
|
||
# AssemblyAI | ||
|
||
The AssemblyAI components allow you to apply powerful Speech AI models to your app for tasks like: | ||
|
||
- Transcribing audio and video files | ||
- Formatting transcripts | ||
- Generating subtitles | ||
- Applying LLMs to audio files | ||
|
||
More info about AssemblyAI: | ||
|
||
- [Website](https://www.assemblyai.com/) | ||
- [AssemblyAI API Docs](https://www.assemblyai.com/docs) | ||
- [Get a Free API key](https://www.assemblyai.com/dashboard/signup) | ||
|
||
|
||
## Prerequisites | ||
|
||
You need an **AssemblyAI API key**. After creating a free account, you'll find the API key in your dashboard. [Get a Free API key here](https://www.assemblyai.com/dashboard/signup). | ||
|
||
Enter the key in the *AssemblyAI API Key* field in all components that require the key. | ||
|
||
(Optional): To use LeMUR, you need to upgrade your AssemblyAI account, since this is not included in the free account. | ||
|
||
## Components | ||
|
||
![AssemblyAI Components](./assemblyai-components.png) | ||
|
||
### AssemblyAI Start Transcript | ||
|
||
This component allows you to submit an audio or video file for transcription. | ||
|
||
**Tip**: You can freeze the path of this component to only submit the file once. | ||
|
||
- **Input**: | ||
- AssemblyAI API Key: Your API key. | ||
- Audio File: The audio or video file to transcribe. | ||
- Speech Model (Optional): Select the class of models. Default is *Best*. See [speech models](https://www.assemblyai.com/docs/speech-to-text/speech-recognition#select-the-speech-model-with-best-and-nano) for more info. | ||
- Automatic Language Detection (Optional): Enable automatic language detection. | ||
- Language (Optional): The language of the audio file. Can be set manually if automatic language detection is disabled. | ||
See [supported languages](https://www.assemblyai.com/docs/getting-started/supported-languages) for a list of supported language codes. | ||
- Enable Speaker Labels (Optional): Detect speakers in an audio file and what each speaker said. | ||
- Expected Number of Speakers (Optional): Set the expected number of speakers, if Speaker Labels is enabled. | ||
- Audio File URL (Optional): The URL of the audio or video file to transcribe. Can be used instead of *Audio File*. | ||
- Punctuate (Optional): Apply punctuation. Default is true. | ||
- Format Text (Optional): Apply casing and text formatting. Default is true. | ||
|
||
- **Output**: | ||
- Transcript ID: The id of the transcript | ||
|
||
|
||
### AssebmlyAI Poll Transcript | ||
|
||
This components allows you to poll the transcripts. It checks the status of the transcript every few seconds until the transcription is completed. | ||
|
||
- **Input**: | ||
- AssemblyAI API Key: Your API key. | ||
- Polling Interval: The polling interval in seconds. Default is 3. | ||
|
||
- **Output**: | ||
- Transcription Result: The AssemblyAI JSON response of a completed transcript. Contains the text and other info. | ||
|
||
|
||
### AssebmlyAI Parse Transcript | ||
|
||
This component allows you to parse a *Transcription Result* and outputs the formatted text. | ||
|
||
- **Input**: | ||
- Transcription Result: The output of the *Poll Transcript* component. | ||
|
||
- **Output**: | ||
- Parsed transcription: The parsed transcript. If Speaker Labels was enabled in the *Start Transcript* component, it formats utterances with speakers and timestamps. | ||
|
||
### AssebmlyAI Get Subtitles | ||
|
||
This component allows you to generate subtitles in SRT or VTT format. | ||
|
||
- **Input**: | ||
- AssemblyAI API Key: Your API key. | ||
- Transcription Result: The output of the *Poll Transcript* component. | ||
- Subtitle Format: The format of the captions (SRT or VTT). | ||
- Character per Caption (Optional): The maximum number of characters per caption (0 for no limit). | ||
|
||
- **Output**: | ||
- Parsed transcription: The parsed transcript. If Speaker Labels was enabled in the *Start Transcript* component, it formats utterances with speakers and timestamps. | ||
|
||
|
||
### AssebmlyAI LeMUR | ||
|
||
This component allows you to apply Large Language Models to spoken data using the [AssemblyAI LeMUR framework](https://www.assemblyai.com/docs/lemur). | ||
|
||
LeMUR automatically ingests the transcript as additional context, making it easy to apply LLMs to audio data. You can use it for tasks like summarizing audio, extracting insights, or asking questions. | ||
|
||
- **Input**: | ||
- AssemblyAI API Key: Your API key. | ||
- Transcription Result: The output of the *Poll Transcript* component. | ||
- Input Prompt: The text to prompt the model. You can type your prompt in this field or connect it to a *Prompt* component. | ||
- Final Model: The model that is used for the final prompt after compression is performed. Default is Claude 3.5 Sonnet. | ||
- Temperature (Optional): The temperature to use for the model. Default is 0.0. | ||
- Max Output Size (Optional): Max output size in tokens, up to 4000. Default is 2000. | ||
|
||
- **Output**: | ||
- LeMUR Response: The generated LLM response. | ||
|
||
### AssemblyAI List Transcripts | ||
|
||
This component can be used as a standalone component to list all previously generated transcripts. | ||
|
||
- **Input**: | ||
- AssemblyAI API Key: Your API key. | ||
- Limit (Optional): Maximum number of transcripts to retrieve. Default is 20, use 0 for all. | ||
- Filter (Optional): Filter by transcript status. | ||
- Created On (Optional): Only get transcripts created on this date (YYYY-MM-DD). | ||
- Throttled Only (Optional): Only get throttled transcripts, overrides the status filter | ||
|
||
- **Output**: | ||
- Transcript List: A list of all transcripts with info such as the transcript ID, the status, and the data. | ||
|
||
|
||
## Flow Process | ||
|
||
1. The user inputs an audio or video file. | ||
2. The user can also input an LLM prompt. In this example, we want to generate a summary of the transcript. | ||
3. The flow submits the audio file for transcription. | ||
4. The flow checks the status of the transcript every few seconds until transcription is completed. | ||
5. The flow parses the transcript and outputs the formatted text. | ||
6. The flow also generates subtitles. | ||
7. The flow applies the LLM prompt to generate a summary. | ||
8. As a standalone component, all transcripts can be listed. | ||
|
||
## Run the Transcription and Speech AI Flow | ||
|
||
To run the Transcription and Speech AI Flow: | ||
|
||
1. Open Langflow and create a new project. | ||
2. Add the components listed above to your flow canvas, or download the [AssemblyAI Transcription and Speech AI Flow](./AssemblyAI_Flow.json)(Download link) and **Import** the JSON file into Langflow. | ||
3. Connect the components as shown in the flow diagram. **Tip**: Freeze the path of the *Start Transcript* component to only submit the file once. | ||
4. Input the AssemblyAI API key in in all components that require the key (Start Transcript, Poll Transcript, Get Subtitles, LeMUR, List Transcripts). | ||
5. Select an audio or video file in the *Start Transcript* component. | ||
6. Run the flow by clicking **Play** on the *Parse Transcript* component. | ||
7. To generate subtitles, click **Play** on the *Get Subtitles* component. | ||
8. To apply an LLM to your audio file, click **Play** on the *LeMUR* component. Note that you need an upgraded AssemblyAI account to use LeMUR. | ||
9. To list all transcripts, click **Play** on the *List Transcript* component. | ||
|
||
|
||
## Customization | ||
|
||
The flow can be customized by: | ||
|
||
1. Modifying the parameters in the *Start Transcript* component. | ||
2. Modifying the subtitle format in the *Get Subtitles* component. | ||
3. Modifying the LLM prompt for input of the *LeMUR* component. | ||
4. Modifying the LLM parameters (e.g., temperature) in the *LeMUR* component. | ||
|
||
## Troubleshooting | ||
|
||
If you encounter issues: | ||
|
||
1. Ensure the API key is correctly set in all components that require the key. | ||
2. To use LeMUR, you need to upgrade your AssemblyAI account, since this is not included in the free account. | ||
3. Verify that all components are properly connected in the flow. | ||
4. Review the Langflow logs for any error messages. | ||
|
||
For more advanced usage, refer to the [AssemblyAI API documentation](https://www.assemblyai.com/docs/). If you need more help, you can reach out to the [AssemblyAI support](https://www.assemblyai.com/contact/support). | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.