Add audio-to-text pipeline #3078
…quest to allow for various response types
Added error handlers to respond with "400 Bad Request" when the duration cannot be calculated due to an unsupported file format or file corruption. This prevents invalid jobs from being sent to the network.
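The behaviour described above can be illustrated with a minimal Go sketch. It is not the code from this PR: `durationFunc`, `fakeDuration`, `audioHandler` and `statusFor` are hypothetical names, and the stand-in duration check simply treats an empty body as an unreadable file, whereas the PR calculates the duration with ffmpeg.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// durationFunc is a hypothetical hook for whatever computes the audio duration.
type durationFunc func(data []byte) (seconds float64, err error)

// fakeDuration is a stand-in for a real probe: it fails on empty input
// and otherwise reports a fixed duration.
func fakeDuration(data []byte) (float64, error) {
	if len(data) == 0 {
		return 0, errors.New("unsupported or corrupt file")
	}
	return 1.5, nil
}

// audioHandler rejects requests whose duration cannot be determined,
// so that no invalid job would be submitted downstream.
func audioHandler(getDuration durationFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		data, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "could not read request body", http.StatusBadRequest)
			return
		}
		seconds, err := getDuration(data)
		if err != nil || seconds <= 0 {
			http.Error(w, "unable to calculate audio duration", http.StatusBadRequest)
			return
		}
		// A real handler would submit the job here.
		fmt.Fprintf(w, "accepted %.1fs of audio\n", seconds)
	}
}

// statusFor sends one in-memory request to h and returns the status code.
func statusFor(h http.Handler, body string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/audio-to-text", strings.NewReader(body))
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := audioHandler(fakeDuration)
	fmt.Println(statusFor(h, ""))            // rejected: 400
	fmt.Println(statusFor(h, "audio bytes")) // accepted: 200
}
```

Injecting the duration check as a function keeps the handler testable without real media files.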
Overall LGTM. I would recommend using ffmpeg for audio track processing (such as calculating durations) and accepting only audio input instead of videos. ffmpeg can help with that as well, and we could potentially commonize the …
Added some comments, other than that LGTM
…ation and logging, pin lpms
Rename audio to text
go.sum
@@ -182,6 +182,8 @@ github.com/dop251/goja_nodejs v0.0.0-20210225215109-d91c329300e7/go.mod h1:hn7BA
github.com/dop251/goja_nodejs v0.0.0-20211022123610-8dd9abb0616d/go.mod h1:DngW8aVqWbuLRMHItjPUyqdj+HWPvnQe8V8y1nDpIbM=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/eliteprox/ai-worker v0.0.0-20240705062703-0908b518eb12 h1:LCnOJCD97i2jmR7gY5gJieYLS+01XHnbWkCpmHC1t28=
Quick note to replace when worker code is merged.
go.mod
@@ -205,3 +206,5 @@ require (
lukechampine.com/blake3 v1.2.1 // indirect
rsc.io/tmplfunc v0.0.3 // indirect
)

replace github.com/livepeer/ai-worker => github.com/eliteprox/ai-worker v0.0.0-20240705062703-0908b518eb12
Remove when releasing.
github.com/livepeer/go-tools v0.3.6-0.20240130205227-92479de8531b h1:VQcnrqtCA2UROp7q8ljkh2XA/u0KRgVv0S1xoUvOweE=
github.com/livepeer/go-tools v0.3.6-0.20240130205227-92479de8531b/go.mod h1:hwJ5DKhl+pTanFWl+EUpw1H7ukPO/H+MFpgA7jjshzw=
github.com/livepeer/joy4 v0.1.2-0.20191121080656-b2fea45cbded h1:ZQlvR5RB4nfT+cOQee+WqmaDOgGtP2oDMhcVvR4L0yA=
github.com/livepeer/joy4 v0.1.2-0.20191121080656-b2fea45cbded/go.mod h1:xkDdm+akniYxVT9KW1Y2Y7Hso6aW+rZObz3nrA9yTHw=
github.com/livepeer/livepeer-data v0.7.5-0.20231004073737-06f1f383fb18 h1:4oH3NqV0NvcdS44Ld3zK2tO8IUiNozIggm74yobQeZg=
github.com/livepeer/livepeer-data v0.7.5-0.20231004073737-06f1f383fb18/go.mod h1:Jpf4jHK+fbWioBHRDRM1WadNT1qmY27g2YicTdO0Rtc=
github.com/livepeer/lpms v0.0.0-20240120150405-de94555cdc69 h1:4A6geMb+HfxBBfaS24t8R3ddpEDfWbpx7NTQZMt5Fp4=
github.com/livepeer/lpms v0.0.0-20240120150405-de94555cdc69/go.mod h1:Hr/JhxxPDipOVd4ZrGYWrdJfpVF8/SEI0nNr2ctAlkM=
github.com/livepeer/lpms v0.0.0-20240711175220-227325841434 h1:E7PKN6q/jMLapEV+eEwlwv87Xe5zacaVhvZ8T6AJR3c=
Ensure versioning is correct.
This commit applies several code improvements to the AudioToText codebase.
@eliteprox it looks like our pipeline fails to detect the duration of the following file: Do you maybe know why 🤔?
…ents feat(ai): apply code improvements to AudioToText pipeline
This one appears to be a concatenated file, and ffmpeg has trouble calculating the duration in this case. The recommended solution is to re-encode the input to a consistent output format such as FLAC. This can be combined with the effort to send audio-only input to the ai-worker to optimize the pipeline.
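As an illustration of the re-encoding idea, here is a minimal Go sketch that shells out to the ffmpeg CLI. This is an assumption for the sake of the example: the PR itself goes through the lpms ffmpeg bindings, and the helper names `flacArgs` and `reencodeToFLAC` are hypothetical. The flags used are standard ffmpeg options: `-vn` drops any video stream and `-c:a flac` selects the FLAC audio encoder.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// flacArgs builds the ffmpeg arguments for an audio-only FLAC re-encode.
// -y overwrites the output file if it already exists.
func flacArgs(in, out string) []string {
	return []string{"-y", "-i", in, "-vn", "-c:a", "flac", out}
}

// reencodeToFLAC runs ffmpeg (which must be on PATH) and returns its
// combined output as part of the error if the command fails.
func reencodeToFLAC(in, out string) error {
	cmd := exec.Command("ffmpeg", flacArgs(in, out)...)
	if output, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg failed: %w: %s", err, output)
	}
	return nil
}

func main() {
	// Only print the command here; call reencodeToFLAC to actually run it.
	fmt.Println("ffmpeg " + strings.Join(flacArgs("input.mp4", "output.flac"), " "))
}
```

Re-encoding rewrites the container metadata, so a duration probe on the output no longer depends on how the original file was assembled.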
LGTM 🚀.
This commit ensures that the ai-worker and lpms are at the latest versions which contain the changes needed for the audio-to-text pipeline.
2e29849 to 0cf872b
* Add speech-to-text pipeline, refactor processAIRequest and handleAIRequest to allow for various response types
* Pin gomod to ai-runner for testing
* Revert "Pin gomod to ai-runner for testing"
  This reverts commit d4ba500.
* Update go mod dep for ai-worker
* Calculate pixel value of audio file
* fix go-mod deps
* Adjust price calculation
* one second per pixel
* cleanup, fix missing duration
* Add supported file types, calculate price by milliseconds
* Add bad request response for unsupported file types
* Update name of function
* Update go mod to ai-runner
* Use ffmpeg to get duration
* update install_ffmpeg.sh to parse audio better
* Check for audio codec instead of video codec
* gomod edits
* add docker file
* Update install_ffmpeg.sh to improve audio support, Add duration validation and logging, pin lpms
* rename speech-to-text to audio-to-text
* Update go-mod
* cleanup
* update go mod
* remove comment
* update gomod
* Update lpms mod
* Update to latest lpms
* Update lpms
* feat(ai): apply code improvements to AudioToText pipeline
  This commit applies several code improvements to the AudioToText codebase.
* Remove unnecessary logic
* Remove unused error
* Fix missing err
* Update go.mod and tidy
* chore(ai): update ai-worker and lpms to latest version
  This commit ensures that the ai-worker and lpms are at the latest versions which contain the changes needed for the audio-to-text pipeline.
---------
Co-authored-by: 0xb79orch
Co-authored-by: Rick Staa
What does this pull request do? Explain your changes. (required)
Adds the new /audio-to-text pipeline to go-livepeer, supporting the openai/whisper-large-v3 model.
Supported file formats are mp3, m4a, mp4, webm, and flac.
This change requires livepeer/ai-worker#103 and livepeer/lpms#407.
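For illustration, the list of supported formats could be enforced with a simple extension check like the Go sketch below. This is a simplification and an assumption: the commit history indicates the PR inspects the audio codec via ffmpeg rather than trusting the file name, and `supportedExts` and `isSupportedAudioFile` are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts mirrors the formats named in the PR description.
var supportedExts = map[string]bool{
	".mp3":  true,
	".m4a":  true,
	".mp4":  true,
	".webm": true,
	".flac": true,
}

// isSupportedAudioFile reports whether the file name carries one of the
// supported extensions, ignoring case.
func isSupportedAudioFile(name string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(name))]
}

func main() {
	for _, name := range []string{"talk.MP3", "clip.webm", "notes.txt"} {
		fmt.Printf("%s supported=%v\n", name, isSupportedAudioFile(name))
	}
}
```

An extension check is cheap but easy to fool, which is one reason to probe the actual codec before accepting a job.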
Specific updates (required)
* Refactors handleAIRequest and processAIRequest to support new response types like TextResponse
* Adds the /audio-to-text endpoint to ai_mediaserver.go
* Uses ffmpeg.GetCodecInfo to calculate duration, which requires the lpms pull request above
How did you test each of these updates (required)
curl request example:
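As a rough Go equivalent of such a request, here is a minimal sketch that builds a multipart POST to the /audio-to-text endpoint. Several details are assumptions not stated in this PR: the form field names `audio` and `model_id`, the gateway address `http://localhost:8935`, and the helper name `newAudioToTextRequest`. The sketch only constructs the request; sending it requires a running gateway.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newAudioToTextRequest builds a multipart/form-data POST request.
// The field names "audio" and "model_id" are assumed, not confirmed by the PR.
func newAudioToTextRequest(baseURL, filename string, audio []byte, modelID string) (*http.Request, error) {
	var body bytes.Buffer
	mw := multipart.NewWriter(&body)

	part, err := mw.CreateFormFile("audio", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	if err := mw.WriteField("model_id", modelID); err != nil {
		return nil, err
	}
	// Close writes the trailing multipart boundary.
	if err := mw.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/audio-to-text", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newAudioToTextRequest(
		"http://localhost:8935", // hypothetical gateway address
		"sample.mp3",
		[]byte("placeholder audio bytes"),
		"openai/whisper-large-v3",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```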
Does this pull request close any open issues?
LIV-429
LIV-289
Checklist:
* make runs successfully
* Tests in ./test.sh pass