Add audio-to-text pipeline #3078

eliteprox · 2024-06-12T18:48:46Z

What does this pull request do? Explain your changes. (required)

Adds the new /audio-to-text pipeline to go-livepeer, supporting the openai/whisper-large-v3 model.

Supported file formats are mp3, m4a, mp4, webm, and flac.
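
As a rough illustration of the format list above, the following minimal sketch checks a file name against those five formats. Matching on the file extension is an assumption made for illustration only, and `isSupportedAudioFile` is a hypothetical helper; the pull request itself inspects the file with ffmpeg rather than trusting the name.

```go
package main

import (
	"path/filepath"
	"strings"
)

// supportedAudioExts lists the formats named in the PR description.
// Keying on the extension is an assumption; it is not how the PR validates input.
var supportedAudioExts = map[string]bool{
	".mp3":  true,
	".m4a":  true,
	".mp4":  true,
	".webm": true,
	".flac": true,
}

// isSupportedAudioFile (hypothetical helper) reports whether name ends in one
// of the listed extensions, ignoring case.
func isSupportedAudioFile(name string) bool {
	return supportedAudioExts[strings.ToLower(filepath.Ext(name))]
}
```

A check like this can only serve as a cheap pre-filter, since a renamed or corrupt file would still pass it.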

This change requires livepeer/ai-worker#103 and livepeer/lpms#407

Specific updates (required)

Refactors handleAIRequest and processAIRequest to support new response types like TextResponse
Adds /audio-to-text endpoint to ai_mediaserver.go
Fixes pricing at one pixel per millisecond of audio
Uses ffmpeg.GetCodecInfo to calculate the duration, which requires the lpms pull request above

How did you test each of these updates (required)

Tested with rich vocal audio up to 4 hours long. Regression tested the other pipelines to ensure the refactoring did not cause any issues.
Tested with all supported file formats and with unsupported ones.

curl request example:

curl --request POST   --url http://dev.eliteencoder.net:8937/audio-to-text --header 'Content-Type: multipart/form-data'   --form '[email protected]'   --form 'model_id=openai/whisper-large-v3'   --form seed=123

Does this pull request close any open issues?

LIV-429
LIV-289

Checklist:

Read the contribution guide
make runs successfully
All tests in ./test.sh pass
README and other documentation updated
Pending changelog updated


eliteprox · 2024-07-01T14:14:44Z

Added error handlers to respond with "400 bad request" when duration cannot be calculated due to unsupported file format or file corruption. This prevents invalid jobs from being sent to the network.
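
A minimal sketch of the behaviour described in this comment, assuming the duration probe returns a millisecond count and an error. The names `statusForProbe`, `rejectIfInvalid`, and `errUnknownDuration` are hypothetical and are not taken from the pull request.

```go
package main

import (
	"errors"
	"net/http"
)

// errUnknownDuration is a hypothetical error a duration probe might return
// for an unsupported or corrupt file.
var errUnknownDuration = errors.New("could not determine audio duration")

// statusForProbe (hypothetical helper) maps the probe result to the HTTP
// status the gateway would answer with before forwarding the job.
func statusForProbe(durationMs int64, err error) int {
	if err != nil || durationMs <= 0 {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

// rejectIfInvalid writes a 400 response and returns true when the job must
// not be sent to the network.
func rejectIfInvalid(w http.ResponseWriter, durationMs int64, err error) bool {
	if status := statusForProbe(durationMs, err); status != http.StatusOK {
		http.Error(w, "unsupported or corrupt audio file", status)
		return true
	}
	return false
}
```

Rejecting the request at this point keeps a job with an unknown price from ever reaching an orchestrator.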


emranemran · 2024-07-02T20:04:41Z

Overall LGTM. I would recommend using ffmpeg for audio track processing (like calculating durations) and only accepting audio input instead of videos -- ffmpeg can help with that as well and we could potentially commonize the Probe package I linked in catalyst-api in the comments. Also, we need to understand what the file length limits are like.

leszko

Added some comments, other than that LGTM

common/util.go

server/ai_process.go


rickstaa · 2024-07-14T11:29:03Z

go.sum

@@ -182,6 +182,8 @@ github.com/dop251/goja_nodejs v0.0.0-20210225215109-d91c329300e7/go.mod h1:hn7BA
 github.com/dop251/goja_nodejs v0.0.0-20211022123610-8dd9abb0616d/go.mod h1:DngW8aVqWbuLRMHItjPUyqdj+HWPvnQe8V8y1nDpIbM=
 github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
 github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
+github.com/eliteprox/ai-worker v0.0.0-20240705062703-0908b518eb12 h1:LCnOJCD97i2jmR7gY5gJieYLS+01XHnbWkCpmHC1t28=


Quick note to replace when worker code is merged.

rickstaa · 2024-07-14T11:29:08Z

go.mod

@@ -205,3 +206,5 @@ require (
 	lukechampine.com/blake3 v1.2.1 // indirect
 	rsc.io/tmplfunc v0.0.3 // indirect
 )
+
+replace github.com/livepeer/ai-worker => github.com/eliteprox/ai-worker v0.0.0-20240705062703-0908b518eb12


Remove when releasing.

rickstaa · 2024-07-14T11:29:21Z

go.sum

 github.com/livepeer/go-tools v0.3.6-0.20240130205227-92479de8531b h1:VQcnrqtCA2UROp7q8ljkh2XA/u0KRgVv0S1xoUvOweE=
 github.com/livepeer/go-tools v0.3.6-0.20240130205227-92479de8531b/go.mod h1:hwJ5DKhl+pTanFWl+EUpw1H7ukPO/H+MFpgA7jjshzw=
 github.com/livepeer/joy4 v0.1.2-0.20191121080656-b2fea45cbded h1:ZQlvR5RB4nfT+cOQee+WqmaDOgGtP2oDMhcVvR4L0yA=
 github.com/livepeer/joy4 v0.1.2-0.20191121080656-b2fea45cbded/go.mod h1:xkDdm+akniYxVT9KW1Y2Y7Hso6aW+rZObz3nrA9yTHw=
 github.com/livepeer/livepeer-data v0.7.5-0.20231004073737-06f1f383fb18 h1:4oH3NqV0NvcdS44Ld3zK2tO8IUiNozIggm74yobQeZg=
 github.com/livepeer/livepeer-data v0.7.5-0.20231004073737-06f1f383fb18/go.mod h1:Jpf4jHK+fbWioBHRDRM1WadNT1qmY27g2YicTdO0Rtc=
-github.com/livepeer/lpms v0.0.0-20240120150405-de94555cdc69 h1:4A6geMb+HfxBBfaS24t8R3ddpEDfWbpx7NTQZMt5Fp4=
-github.com/livepeer/lpms v0.0.0-20240120150405-de94555cdc69/go.mod h1:Hr/JhxxPDipOVd4ZrGYWrdJfpVF8/SEI0nNr2ctAlkM=
+github.com/livepeer/lpms v0.0.0-20240711175220-227325841434 h1:E7PKN6q/jMLapEV+eEwlwv87Xe5zacaVhvZ8T6AJR3c=


Ensure versioning is correct.


rickstaa · 2024-07-14T14:54:13Z

@eliteprox looks like our pipeline fails to detect the duration of the following file:

speech.zip

Do you maybe know why 🤔?


eliteprox · 2024-07-15T19:01:26Z

> @eliteprox looks like our pipeline fails to detect the duration of the following file:
>
> speech.zip
>
> Do you maybe know why 🤔?

This one appears to be a concatenated file and ffmpeg has issues calculating the duration in this case. The recommended solution is to re-encode the input to a consistent output format like flac. This can be combined with the effort to send audio-only to the ai-worker to optimize the pipeline.
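
The re-encoding step suggested here could look roughly like the sketch below. It shells out to the ffmpeg command-line tool, which is an assumption made for illustration (go-livepeer uses the lpms bindings, not the CLI), and `flacReencodeArgs` and `reencodeToFLAC` are hypothetical helpers.

```go
package main

import "os/exec"

// flacReencodeArgs (hypothetical helper) builds ffmpeg CLI arguments that
// overwrite the output (-y), drop any video stream (-vn), and re-encode the
// audio to FLAC (-c:a flac).
func flacReencodeArgs(in, out string) []string {
	return []string{"-y", "-i", in, "-vn", "-c:a", "flac", out}
}

// reencodeToFLAC runs ffmpeg with those arguments. It assumes an ffmpeg
// binary is available on PATH.
func reencodeToFLAC(in, out string) error {
	return exec.Command("ffmpeg", flacReencodeArgs(in, out)...).Run()
}
```

Dropping the video stream with `-vn` also covers the audio-only idea mentioned above, since mp4 and webm inputs would reach the worker as plain FLAC.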

rickstaa

LGTM 🚀.


* Add speech-to-text pipeline, refactor processAIRequest and handleAIRequest to allow for various response types
* Pin gomod to ai-runner for testing
* Revert "Pin gomod to ai-runner for testing"
  This reverts commit d4ba500.
* Update go mod dep for ai-worker
* Calculate pixel value of audio file
* fix go-mod deps
* Adjust price calculation
* one second per pixel
* cleanup, fix missing duration
* Add supported file types, calculate price by milliseconds
* Add bad request response for unsupported file types
* Update name of function
* Update go mod to ai-runner
* Use ffmpeg to get duration
* update install_ffmpeg.sh to parse audio better
* Check for audio codec instead of video codec
* gomod edits
* add docker file
* Update install_ffmpeg.sh to improve audio support, Add duration validation and logging, pin lpms
* rename speech-to-text to audio-to-text
* Update go-mod
* cleanup
* update go mod
* remove comment
* update gomod
* Update lpms mod
* Update to latest lpms
* Update lpms
* feat(ai): apply code improvements to AudioToText pipeline
  This commit applies several code improvements to the AudioToText codebase.
* Remove unnecessary logic
* Remove unused error
* Fix missing err
* Update go.mod and tidy
* chore(ai): update ai-worker and lpms to latest version
  This commit ensures that the ai-worker and lpms are at the latest versions which contain the changes needed for the audio-to-text pipeline.

---------

Co-authored-by: 0xb79orch <[email protected]>
Co-authored-by: Rick Staa <[email protected]>

Add speech-to-text pipeline, refactor processAIRequest and handleAIRequest to allow for various response types

6fcc75d

github-actions bot added the AI label (Issues and PR related to the AI-video branch) Jun 12, 2024

eliteprox added 7 commits June 12, 2024 15:19

Pin gomod to ai-runner for testing

d4ba500

Revert "Pin gomod to ai-runner for testing"

b0fa4b7

This reverts commit d4ba500.

Update go mod dep for ai-worker

2144538

Calculate pixel value of audio file

8d424ff

fix go-mod deps

4d76749

Adjust price calculation

1104708

one second per pixel

4fcca57

eliteprox mentioned this pull request Jun 19, 2024

Add api docs for audio-to-text pipeline livepeer/docs#594

Merged

eliteprox marked this pull request as ready for review June 19, 2024 07:57

eliteprox requested a review from rickstaa as a code owner June 19, 2024 07:57

eliteprox added 4 commits June 19, 2024 05:06

cleanup, fix missing duration

0280296

Add supported file types, calculate price by milliseconds

34f9d2e

Add bad request response for unsupported file types

1579920

Update name of function

494654d

Update go mod to ai-runner

63c20e3

emranemran reviewed Jul 2, 2024

common/util.go (resolved)

server/ai_process.go (outdated, resolved)

emranemran requested review from victorges and leszko July 2, 2024 20:10

eliteprox and others added 5 commits July 3, 2024 04:11

Use ffmpeg to get duration

b78f11f

update install_ffmpeg.sh to parse audio better

3fea27b

Check for audio codec instead of video codec

c00b210

gomod edits

29309eb

add docker file

9920cca

leszko approved these changes Jul 4, 2024

common/util.go (outdated, resolved)

common/util.go (outdated, resolved)

server/ai_process.go (outdated, resolved)

eliteprox added 2 commits July 4, 2024 17:46

Update install_ffmpeg.sh to improve audio support, Add duration validation and logging, pin lpms

a185b5d

rename speech-to-text to audio-to-text

7f18820

eliteprox added 3 commits July 5, 2024 02:22

Merge pull request #3 from eliteprox/rename-audio-to-text

b048341

Rename audio to text

update go mod

cad5b60

remove comment

fa62b5a

eliteprox changed the title ~~Add speech-to-text pipeline~~ Add audio-to-text pipeline Jul 5, 2024

eliteprox added 5 commits July 8, 2024 08:58

Merge branch 'ai-video' into add-speech-to-text

ba4f9dc

update gomod

bf03e0d

Update lpms mod

ff202d7

Update to latest lpms

8bad0ff

Update lpms

2c0bfb9

rickstaa reviewed Jul 14, 2024

core/orchestrator.go (outdated, resolved)

feat(ai): apply code improvements to AudioToText pipeline

cb4360b

This commit applies several code improvements to the AudioToText codebase.

eliteprox added 4 commits July 15, 2024 08:35

Remove unnecessary logic

5b24400

Merge branch 'add-speech-to-text' into add-speech-to-text-code-improvements

e307c70

Merge pull request #4 from eliteprox/add-speech-to-text-code-improvements

d40d41b

feat(ai): apply code improvements to AudioToText pipeline

Remove unused error

c131e59

rickstaa approved these changes Jul 15, 2024

common/util.go (outdated, resolved)

eliteprox and others added 3 commits July 15, 2024 16:00

Fix missing err

cc10b98

Update go.mod and tidy

6fabbf2

chore(ai): update ai-worker and lpms to latest version

0cf872b

This commit ensures that the ai-worker and lpms are at the latest versions which contain the changes needed for the audio-to-text pipeline.

rickstaa force-pushed the add-speech-to-text branch from 2e29849 to 0cf872b Compare July 17, 2024 13:29

rickstaa merged commit 2ab10c6 into livepeer:ai-video Jul 17, 2024
7 of 9 checks passed

rickstaa deleted the add-speech-to-text branch July 17, 2024 13:50

rickstaa mentioned this pull request Sep 23, 2024

selection algorithm transcoding conflict patch #3181

Merged
