Add audio-to-text pipeline #3078

Merged
39 commits merged on Jul 17, 2024

Conversation

eliteprox
Collaborator

@eliteprox eliteprox commented Jun 12, 2024

What does this pull request do? Explain your changes. (required)

Adds the new /audio-to-text pipeline to go-livepeer, supporting the openai/whisper-large-v3 model.

File formats supported are mp3, m4a, mp4, webm, and flac
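As a rough illustration of the format list above, the sketch below pre-filters uploads by file extension. This is only an assumption about how a client might screen files before sending them; the PR itself relies on ffmpeg codec information, and the names `supportedAudioExts` and `isSupportedAudioFile` are hypothetical, not go-livepeer identifiers.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedAudioExts mirrors the formats listed in the PR description.
// Hypothetical lookup table for illustration, not go-livepeer code.
var supportedAudioExts = map[string]bool{
	".mp3":  true,
	".m4a":  true,
	".mp4":  true,
	".webm": true,
	".flac": true,
}

// isSupportedAudioFile reports whether the file name carries one of the
// listed extensions, ignoring case.
func isSupportedAudioFile(name string) bool {
	return supportedAudioExts[strings.ToLower(filepath.Ext(name))]
}

func main() {
	for _, name := range []string{"talk.mp3", "clip.WEBM", "notes.txt"} {
		fmt.Printf("%s supported: %v\n", name, isSupportedAudioFile(name))
	}
}
```

An extension check is cheap but cannot detect a mislabeled or corrupt file, which is why a codec probe is still needed on the server side.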

This change requires livepeer/ai-worker#103 and livepeer/lpms#407

Specific updates (required)

  • Refactors handleAIRequest and processAIRequest to support new response types like TextResponse
  • Adds /audio-to-text endpoint to ai_mediaserver.go
  • Pricing is fixed at one pixel per millisecond of audio
  • Uses ffmpeg.GetCodecInfo to calculate duration and requires the lpms pull request above

How did you test each of these updates (required)

  • Tested with rich vocal audio up to 4 hours long. Regression tested the other pipelines to ensure the refactoring did not cause any issues.
  • Tested with all supported file formats and unsupported ones

curl request example:

curl --request POST   --url http://dev.eliteencoder.net:8937/audio-to-text --header 'Content-Type: multipart/form-data'   --form '[email protected]'   --form 'model_id=openai/whisper-large-v3'   --form seed=123

Does this pull request close any open issues?

LIV-429
LIV-289

Checklist:

@github-actions github-actions bot added the AI Issues and PR related to the AI-video branch. label Jun 12, 2024
@eliteprox eliteprox marked this pull request as ready for review June 19, 2024 07:57
@eliteprox eliteprox requested a review from rickstaa as a code owner June 19, 2024 07:57
@eliteprox
Collaborator Author

Added error handlers to respond with "400 bad request" when duration cannot be calculated due to unsupported file format or file corruption. This prevents invalid jobs from being sent to the network.

@emranemran
Contributor

Overall LGTM. I would recommend using ffmpeg for audio track processing (like calculating durations) and only accepting audio input instead of videos -- ffmpeg can help with that as well and we could potentially commonize the Probe package I linked in catalyst-api in the comments. Also, we need to understand what the file length limits are like.

Contributor

@leszko leszko left a comment

Added some comments, other than that LGTM

@eliteprox eliteprox changed the title Add speech-to-text pipeline Add audio-to-text pipeline Jul 5, 2024
go.sum Outdated
@@ -182,6 +182,8 @@ github.com/dop251/goja_nodejs v0.0.0-20210225215109-d91c329300e7/go.mod h1:hn7BA
github.com/dop251/goja_nodejs v0.0.0-20211022123610-8dd9abb0616d/go.mod h1:DngW8aVqWbuLRMHItjPUyqdj+HWPvnQe8V8y1nDpIbM=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/eliteprox/ai-worker v0.0.0-20240705062703-0908b518eb12 h1:LCnOJCD97i2jmR7gY5gJieYLS+01XHnbWkCpmHC1t28=
Contributor

Quick note to replace when worker code is merged.

go.mod Outdated
@@ -205,3 +206,5 @@ require (
lukechampine.com/blake3 v1.2.1 // indirect
rsc.io/tmplfunc v0.0.3 // indirect
)

replace github.com/livepeer/ai-worker => github.com/eliteprox/ai-worker v0.0.0-20240705062703-0908b518eb12
Contributor

Remove when releasing.

github.com/livepeer/go-tools v0.3.6-0.20240130205227-92479de8531b h1:VQcnrqtCA2UROp7q8ljkh2XA/u0KRgVv0S1xoUvOweE=
github.com/livepeer/go-tools v0.3.6-0.20240130205227-92479de8531b/go.mod h1:hwJ5DKhl+pTanFWl+EUpw1H7ukPO/H+MFpgA7jjshzw=
github.com/livepeer/joy4 v0.1.2-0.20191121080656-b2fea45cbded h1:ZQlvR5RB4nfT+cOQee+WqmaDOgGtP2oDMhcVvR4L0yA=
github.com/livepeer/joy4 v0.1.2-0.20191121080656-b2fea45cbded/go.mod h1:xkDdm+akniYxVT9KW1Y2Y7Hso6aW+rZObz3nrA9yTHw=
github.com/livepeer/livepeer-data v0.7.5-0.20231004073737-06f1f383fb18 h1:4oH3NqV0NvcdS44Ld3zK2tO8IUiNozIggm74yobQeZg=
github.com/livepeer/livepeer-data v0.7.5-0.20231004073737-06f1f383fb18/go.mod h1:Jpf4jHK+fbWioBHRDRM1WadNT1qmY27g2YicTdO0Rtc=
github.com/livepeer/lpms v0.0.0-20240120150405-de94555cdc69 h1:4A6geMb+HfxBBfaS24t8R3ddpEDfWbpx7NTQZMt5Fp4=
github.com/livepeer/lpms v0.0.0-20240120150405-de94555cdc69/go.mod h1:Hr/JhxxPDipOVd4ZrGYWrdJfpVF8/SEI0nNr2ctAlkM=
github.com/livepeer/lpms v0.0.0-20240711175220-227325841434 h1:E7PKN6q/jMLapEV+eEwlwv87Xe5zacaVhvZ8T6AJR3c=
Contributor

Ensure versioning is correct.

This commit applies several code improvements to the AudioToText
codebase.
@rickstaa
Contributor

@eliteprox looks like our pipeline fails to detect the duration of the following file:

speech.zip

Do you maybe know why 🤔?

@eliteprox
Collaborator Author

> @eliteprox looks like our pipeline fails to detect the duration of the following file:
>
> speech.zip
>
> Do you maybe know why 🤔?

This one appears to be a concatenated file and ffmpeg has issues calculating the duration in this case. The recommended solution is to re-encode the input to a consistent output format like flac. This can be combined with the effort to send audio-only to the ai-worker to optimize the pipeline.

Contributor

@rickstaa rickstaa left a comment

LGTM 🚀.

eliteprox and others added 3 commits July 15, 2024 16:00
This commit ensures that the ai-worker and lpms are at the latest
versions which contain the changes needed for the audio-to-text
pipeline.
@rickstaa rickstaa merged commit 2ab10c6 into livepeer:ai-video Jul 17, 2024
7 of 9 checks passed
@rickstaa rickstaa deleted the add-speech-to-text branch July 17, 2024 13:50
eliteprox added a commit to eliteprox/go-livepeer that referenced this pull request Jul 26, 2024
* Add speech-to-text pipeline, refactor processAIRequest and handleAIRequest to allow for various response types

* Pin gomod to ai-runner for testing

* Revert "Pin gomod to ai-runner for testing"

This reverts commit d4ba500.

* Update go mod dep for ai-worker

* Calculate pixel value of audio file

* fix go-mod deps

* Adjust price calculation

* one second per pixel

* cleanup, fix missing duration

* Add supported file types, calculate price by milliseconds

* Add bad request response for unsupported file types

* Update name of function

* Update go mod to ai-runner

* Use ffmpeg to get duration

* update install_ffmpeg.sh to parse audio better

* Check for audio codec instead of video codec

* gomod edits

* add docker file

* Update install_ffmpeg.sh to improve audio support, Add duration validation and logging, pin lpms

* rename speech-to-text to audio-to-text

* Update go-mod

* cleanup

* update go mod

* remove comment

* update gomod

* Update lpms mod

* Update to latest lpms

* Update lpms

* feat(ai): apply code improvements to AudioToText pipeline

This commit applies several code improvements to the AudioToText
codebase.

* Remove unnecessary logic

* Remove unused error

* Fix missing err

* Update go.mod and tidy

* chore(ai): update ai-worker and lpms to latest version

This commit ensures that the ai-worker and lpms are at the latest
versions which contain the changes needed for the audio-to-text
pipeline.

---------

Co-authored-by: 0xb79orch <[email protected]>
Co-authored-by: Rick Staa <[email protected]>
Labels
AI Issues and PR related to the AI-video branch.
5 participants