This solution is built on an AWS Lambda serverless architecture and integrates services such as Amazon Transcribe to help customers perform video caption operations in a web interface: automatically generating captions, proofreading and editing them, translating them, and burning them into videos.
The solution includes the following functions:
- Generating captions: supports uploading MP4, MOV and MKV videos
- Choosing an Amazon Transcribe custom vocabulary: before uploading a video, users can choose an existing custom vocabulary, which Amazon Transcribe then uses when transcribing the video into captions.
- Editing captions: supports proofreading, editing, paragraph splitting and merging, mapping, and replacement of captions.
- Translating captions: supports translation from the source language to the target language. Currently, this is only supported by the deployment in AWS Standard Regions.
- Downloading captions: supports the SRT and WebVTT formats.
- Burning captions: supports burning generated captions into videos, and then downloading the videos with captions.
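To illustrate the difference between the two download formats, here is a minimal sketch that renders the same caption paragraphs as SRT and as WebVTT. The `Caption` class and the function names are hypothetical and only serve the example; the solution's own caption data model is not described here.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """One caption paragraph; times are in seconds (hypothetical model)."""
    start: float
    end: float
    text: str


def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


def to_srt(captions: list[Caption]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = [
        f"{i}\n{_timestamp(c.start, ',')} --> {_timestamp(c.end, ',')}\n{c.text}\n"
        for i, c in enumerate(captions, start=1)
    ]
    return "\n".join(blocks)


def to_webvtt(captions: list[Caption]) -> str:
    """WebVTT: 'WEBVTT' header line, period before the milliseconds."""
    blocks = [
        f"{_timestamp(c.start, '.')} --> {_timestamp(c.end, '.')}\n{c.text}\n"
        for c in captions
    ]
    return "WEBVTT\n\n" + "\n".join(blocks)
```

Calling `to_srt([Caption(0.0, 2.5, "Hello")])` yields a cue numbered 1 with the time range `00:00:00,000 --> 00:00:02,500`, while `to_webvtt` produces the same cue without a number and with `.` as the millisecond separator.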
This solution deploys the AWS CloudFormation template in your AWS account and completes the following settings.
- Amazon S3 (Web App) stores the front-end static files.
- The client sends requests through Amazon API Gateway.
- An AWS Lambda function receives each request from Amazon API Gateway and processes the business logic.
- The client gets a pre-signed URL from AWS Lambda, then uploads the video to be processed.
- The AWS Lambda function reads and updates each video's profile in Amazon DynamoDB.
- The AWS Lambda function invokes AWS Elemental MediaConvert to process the video (extracting audio from the video and burning captions into the video) and stores the result in Amazon S3 (Videos).
- The AWS Lambda function invokes Amazon Transcribe to generate captions from the audio and stores the result in Amazon S3 (Captions).
- The AWS Lambda function invokes Amazon Translate to translate the captions into another language and stores the result in Amazon S3 (Captions).
Prebuilt CloudFormation templates and assets have been deployed to AWS Regions where both Amazon Transcribe and AWS Elemental MediaConvert are available. When launching the template, you need to enter a stack name and an API key, and choose the default language that Amazon Transcribe uses to process your video's audio. You can still select a different language before uploading each video.
The API key gives users access to the system. You must provide a strong, random, alphanumeric API key between 20 and 70 characters long; otherwise, the stack fails to launch and you see an "Invalid Key Error".
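As a convenience, a key that meets the stated rule (alphanumeric, 20 to 70 characters) can be generated and checked locally before launching the stack. The sketch below uses only the Python standard library; the function names are hypothetical, and the check reflects the rule as described in the text, not the template's actual validation logic.

```python
import re
import secrets
import string

# Rule taken from the text: alphanumeric, 20 to 70 characters long.
API_KEY_PATTERN = re.compile(r"[A-Za-z0-9]{20,70}")


def is_valid_api_key(key: str) -> bool:
    """Return True if the whole key matches the documented rule."""
    return API_KEY_PATTERN.fullmatch(key) is not None


def generate_api_key(length: int = 32) -> str:
    """Generate a random alphanumeric key with a cryptographically secure RNG."""
    if not 20 <= length <= 70:
        raise ValueError("length must be between 20 and 70")
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_api_key())
```

The `secrets` module is used instead of `random` because the key acts as a credential for the web interface.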
AWS Region Name | AWS Region Id | Deploy Solution |
---|---|---|
US East (N. Virginia) | us-east-1 | |
China (Beijing) | cn-north-1 | |
China (Ningxia) | cn-northwest-1 | |
You are responsible for the cost of the AWS services used while running this solution. As of December 2022, the cost of the solution varies depending on:
- The number of requests to Amazon API Gateway
- The number of AWS Lambda invocations
- The number of Amazon DynamoDB reads and writes
- The number of videos that AWS Elemental MediaConvert processes. The solution uses AWS Elemental MediaConvert to extract audio from videos or burn captions into videos
- The amount of audio that Amazon Transcribe processes. The solution uses Amazon Transcribe to extract text from audio and generate captions
- The number of caption characters that Amazon Translate processes. The solution uses Amazon Translate to translate the captions into another language.
Example: in the US East (N. Virginia) Region (us-east-1), you process a 1-hour video, edit the video captions 500 times, and translate 10000 characters of captions.
The cost of using this solution to process this video is shown below:
Service | Dimensions | Cost |
---|---|---|
AWS Elemental MediaConvert | Extract 1 hour audio | $0.18 |
AWS Elemental MediaConvert | Burn captions into 1 hour video | $0.45 |
Amazon Transcribe | Extract text from 1 hour audio | $1.44 |
Amazon Translate | Translate 10000 characters | $0.15 |
Amazon API Gateway | 500 requests | $0.0017 |
AWS Lambda | 500 requests (avg 300ms, 128MB Memory) | $0.0001 |
Amazon DynamoDB | 1000 read/write | $0.00075 |
Total | | $2.22 |
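The total can be checked by summing the line items in the table. The short sketch below does this with `Decimal` to avoid floating-point noise; the amounts are copied from the table, and the dictionary keys and function name are only illustrative labels.

```python
from decimal import Decimal

# Line items copied from the cost table (USD, us-east-1, December 2022).
LINE_ITEMS = {
    "MediaConvert: extract 1 hour of audio": Decimal("0.18"),
    "MediaConvert: burn captions into 1 hour of video": Decimal("0.45"),
    "Transcribe: extract text from 1 hour of audio": Decimal("1.44"),
    "Translate: 10000 characters": Decimal("0.15"),
    "API Gateway: 500 requests": Decimal("0.0017"),
    "Lambda: 500 requests": Decimal("0.0001"),
    "DynamoDB: 1000 reads/writes": Decimal("0.00075"),
}


def total_cost() -> Decimal:
    """Sum all line items exactly."""
    return sum(LINE_ITEMS.values(), Decimal("0"))


if __name__ == "__main__":
    total = total_cost()
    print(f"Exact sum: ${total}")
    print(f"Rounded:   ${total.quantize(Decimal('0.01'))}")
```

The exact sum is $2.22255, which rounds to the $2.22 shown in the table. Almost all of the cost comes from Amazon Transcribe and AWS Elemental MediaConvert; the API Gateway, Lambda, and DynamoDB items are negligible at this usage level.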
After the stack is successfully created, you can view the authentication information (APIKey) required to access the web interface and the created CloudFront URL (ConsoleUrl) on the Outputs tab of AWS CloudFormation stack.
- Enter the CloudFront URL in the address bar of the browser.
- Select Enter API Key, and enter the authentication information in the pop-up input box.
In the web interface, select Videos at the top of the page. The page displays four tabs, corresponding to the different status of videos:
- Videos being processed
- Videos ready for editing
- Videos marked as editing completed
- Videos with errors during processing
You can upload videos without captions for processing.
- On the Videos page, select Upload videos....
- Select the video, then select Open.
- Select the video language. Optionally, select a custom vocabulary from the Select a vocabulary list (refer to Custom Vocabulary to learn more), and then choose Start. The system uploads the video and automatically processes it to generate captions. The video first appears on the Processing tab; once the captions are generated, it moves to the Ready to edit tab.
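Because only MP4, MOV, and MKV videos are supported, it can be useful to check a file before uploading it. The following is a minimal sketch of such a pre-check based on the file extension only; the function name is hypothetical, and how the web interface itself validates uploads is not described here.

```python
from pathlib import Path

# Container formats listed as supported in the feature overview.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".mkv"}


def is_supported_video(filename: str) -> bool:
    """Check the extension only; the file contents are not inspected."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_video("talk.MP4")` is true and `is_supported_video("talk.avi")` is false.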
You can proofread and edit the video captions.
- On the Videos page, select the Ready to edit tab.
- Select the language link from the Video Language column to enter the captions editing page. The functions include:
  - Play the video content paragraph by paragraph to proofread the captions.
  - Modify the captions.
  - Merge or split caption paragraphs.
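Conceptually, merging and splitting operate on caption paragraphs that each carry a time range and a text. The sketch below illustrates the idea under stated assumptions: the `Caption` class is a hypothetical model, and the split time is interpolated by word count, which is a rough approximation and not necessarily how the editing page computes it.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """One caption paragraph; times are in seconds (hypothetical model)."""
    start: float
    end: float
    text: str


def merge(first: Caption, second: Caption) -> Caption:
    """Join two adjacent paragraphs into one that spans both time ranges."""
    return Caption(first.start, second.end, f"{first.text} {second.text}")


def split(caption: Caption, word_index: int) -> tuple[Caption, Caption]:
    """Split before the word at word_index.

    The cut time is interpolated linearly by word count (an assumption).
    """
    words = caption.text.split()
    if not 0 < word_index < len(words):
        raise ValueError("word_index must fall strictly inside the paragraph")
    cut = caption.start + (caption.end - caption.start) * word_index / len(words)
    return (
        Caption(caption.start, cut, " ".join(words[:word_index])),
        Caption(cut, caption.end, " ".join(words[word_index:])),
    )
```

With this model, splitting a paragraph and merging the two halves again returns the original paragraph, as long as its words are separated by single spaces.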
You can translate the video captions, and then proofread and edit translated captions. Note: Currently, this feature is only supported by the deployment in AWS Standard Regions.
- On the captions editing page, select Translate to.
- Select the target language in the drop-down list. The system translates the captions into the target language. After the translation is completed, you can also proofread and edit the translated captions.
You can burn the generated captions into the video to create the video with captions.
- On the Videos page, select the Ready to edit tab.
- Select the language link from the Video Language column to enter the captions editing page.
After the captions or videos with captions are generated, you can download them directly:
- If the video is in the editing status, select the video and the corresponding language to enter the video editing page, and download from there.
- If the video is in the completed status, you can directly download the captions or videos in the corresponding language on the Completed tab of the Videos page.
If you wish to customize the solution, you can follow the steps below:
- Clone the repo to your local machine
- Go to the deployment directory
cd deployment
- Run the following commands to build the solution:
chmod +x ./build-s3-dist.sh
./build-s3-dist.sh <DIST_OUTPUT_BUCKET> <SOLUTION_NAME> <VERSION>
  - DIST_OUTPUT_BUCKET: the S3 bucket where the CloudFormation template is located; you can use an existing bucket or create a new one, e.g. your-bucket
  - SOLUTION_NAME: the desired name of the solution, e.g. video-transcriber
  - VERSION: the version of the solution, e.g. v1.0.0
- Run the following commands to upload the built CloudFormation template to the corresponding S3 Bucket:
aws s3 cp deployment/global-s3-assets/ s3://<DIST_OUTPUT_BUCKET>/<SOLUTION_NAME>/<VERSION>/ --recursive
- Run the following command to upload the built resources to the corresponding regional S3 bucket. For example, if you want to deploy the solution in the us-east-1 Region, the bucket should be <DIST_OUTPUT_BUCKET>-us-east-1.
aws s3 cp deployment/regional-s3-assets/ s3://<DIST_OUTPUT_BUCKET>-us-east-1/<SOLUTION_NAME>/<VERSION>/ --recursive
- Use the CloudFormation template to deploy the solution. The template link should be: s3://<DIST_OUTPUT_BUCKET>/<SOLUTION_NAME>/<VERSION>/video-transcriber-deploy.template
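The two upload commands above follow one naming pattern: global assets go to the bucket itself, and regional assets go to a bucket with the Region ID appended. A small sketch of that pattern, assuming the same three build parameters; the function name and dictionary keys are hypothetical.

```python
def asset_destinations(
    dist_output_bucket: str, solution_name: str, version: str, region: str
) -> dict[str, str]:
    """Build the S3 destinations used by the two `aws s3 cp` commands."""
    prefix = f"{solution_name}/{version}/"
    return {
        # Destination for deployment/global-s3-assets/ (the template).
        "global": f"s3://{dist_output_bucket}/{prefix}",
        # Destination for deployment/regional-s3-assets/ (the built resources).
        "regional": f"s3://{dist_output_bucket}-{region}/{prefix}",
    }


if __name__ == "__main__":
    for name, uri in asset_destinations(
        "your-bucket", "video-transcriber", "v1.0.0", "us-east-1"
    ).items():
        print(f"{name}: {uri}")
```

With the example values from the parameter descriptions, the regional destination is `s3://your-bucket-us-east-1/video-transcriber/v1.0.0/`, so a bucket with that Region suffix must exist before the regional upload.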
This library is licensed under the Apache 2.0 License.