SoundStorm: Efficient Parallel Audio Generation

Demo Page

Objective Evaluation

| Prompt | WER | Speaker Cosine Similarity | Utterance-Level Pitch Mean MAE | Utterance-Level Pitch Std MAE | Utterance-Level Duration Diff |
| --- | --- | --- | --- | --- | --- |
| Ground Truth | 0.86 | - | - | - | - |
| 2 Seconds | 2.32 | 0.8670 | 20.1407 | 17.4387 | - |
| 4 Seconds | 2.10 | 0.8817 | 21.1379 | 19.3733 | - |
| 6 Seconds | 1.95 | 0.8905 | 17.2253 | 15.3792 | - |
| 8 Seconds | 2.33 | 0.8895 | 18.5837 | 15.9667 | - |
| 4 Seconds (PrefixPrompt) | 1.83 | 0.9351 | 12.0929 | 14.3814 | 1.5564 / 12.7153 (avg utterance duration) |
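
The README does not say how these metrics were computed, so the following is only a minimal sketch of the textbook definitions behind the column names: word error rate as word-level edit distance, cosine similarity between two speaker embeddings, and mean absolute error over per-utterance statistics. The function names are hypothetical, and the ASR model, speaker-embedding model, and pitch extractor that would produce the inputs are not specified here and are left out.

```python
import math


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Returns a fraction; multiply by 100 for a percentage.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_absolute_error(reference, generated):
    """MAE between paired per-utterance values (e.g. pitch mean of each utterance)."""
    return sum(abs(r - g) for r, g in zip(reference, generated)) / len(reference)
```

Under these assumptions, the pitch columns would come from calling `mean_absolute_error` on the per-utterance pitch mean (or standard deviation) of the ground-truth and generated audio, and the speaker column from averaging `cosine_similarity` over utterance pairs.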