Our code is primarily built on TANGO (https://github.com/declare-lab/tango) and Diffsound (https://github.com/yangdongchao/Text-to-sound-Synthesis).
To download the AudioCaps dataset, run (the download code is from https://github.com/MorenoLaQuatra/audiocaps-download):

```bash
python download_data.py
```
To train TANGO with FLAN-T5 on the AudioCaps dataset, run:

```bash
cd tango-master
python train.py \
  --text_encoder_name="google/flan-t5-large" \
  --scheduler_name="stabilityai/stable-diffusion-2-1" \
  --unet_model_config="configs/diffusion_model_config.json" \
  --freeze_text_encoder --augment --snr_gamma 5
```
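The `--snr_gamma 5` flag most likely corresponds to Min-SNR loss weighting (Hang et al., 2023), which clips each timestep's signal-to-noise ratio at a ceiling `gamma` so that easy, low-noise timesteps do not dominate the diffusion loss. A minimal sketch of that weighting, assuming access to the scheduler's cumulative alpha schedule (the function name `min_snr_weight` is ours, not from the repo):

```python
import numpy as np

def min_snr_weight(alphas_cumprod, timesteps, gamma=5.0):
    """Min-SNR loss weights (a sketch, not TANGO's exact code).

    snr_t = alpha_bar_t / (1 - alpha_bar_t)
    weight_t = min(snr_t, gamma) / snr_t

    Timesteps with SNR above gamma (early, nearly-clean samples) are
    down-weighted; timesteps with SNR below gamma keep weight 1.
    """
    alpha_bar = alphas_cumprod[timesteps]
    snr = alpha_bar / (1.0 - alpha_bar)
    return np.minimum(snr, gamma) / snr
```

In training, these weights would multiply the per-example MSE between predicted and true noise before averaging over the batch.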
To train TANGO with CLIP on the AudioCaps dataset, run:

```bash
cd tango-master
python train.py \
  --text_encoder_name="stabilityai/stable-diffusion-2-1" \
  --scheduler_name="stabilityai/stable-diffusion-2-1" \
  --unet_model_config="configs/diffusion_model_config.json" \
  --freeze_text_encoder --augment --snr_gamma 5
```
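The `--augment` flag in TANGO reportedly mixes pairs of training examples: two waveforms are overlaid and their captions joined, giving the model synthetic multi-event audio. A hypothetical sketch of that idea (the function name, the mixing-weight range, and the `" and "` joiner are our assumptions, not TANGO's exact implementation):

```python
import numpy as np

def mix_examples(wave_a, caption_a, wave_b, caption_b, rng=None):
    """Hypothetical audio-text mixup (a sketch of the augmentation idea).

    Overlays two waveforms with a random relative gain and joins their
    captions, producing one composite training pair.
    """
    rng = rng if rng is not None else np.random.default_rng()
    w = rng.uniform(0.3, 0.7)          # random mixing weight (assumed range)
    n = min(len(wave_a), len(wave_b))  # truncate to the shorter clip
    mixed = w * wave_a[:n] + (1.0 - w) * wave_b[:n]
    return mixed, caption_a + " and " + caption_b
```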
To run inference on the AudioCaps dataset with Diffsound, run:

```bash
cd Diffsound
python evaluation/generate_samples_batch.py
```
To run inference on the AudioCaps dataset with TANGO, run:

```bash
cd tango-master
python inference.py \
  --original_args="saved/*/summary.jsonl" \
  --model="saved/*/best/pytorch_model_2.bin"
```
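The `saved/*/...` wildcards above point at per-run output directories. A small sketch of how such paths could be expanded to locate each run's training args and best checkpoint (the directory layout is assumed from the flags above; the helper name is ours):

```python
import glob
import json
import os

def resolve_run_artifacts(save_root="saved"):
    """Expand the wildcard layout saved/<run>/summary.jsonl and
    saved/<run>/best/pytorch_model_2.bin into concrete paths.

    Returns one dict per run containing its directory, the parsed
    summary.jsonl records, and the checkpoint path.
    """
    runs = []
    for summary in sorted(glob.glob(os.path.join(save_root, "*", "summary.jsonl"))):
        run_dir = os.path.dirname(summary)
        ckpt = os.path.join(run_dir, "best", "pytorch_model_2.bin")
        if not os.path.exists(ckpt):
            continue  # skip runs without a saved best checkpoint
        with open(summary) as f:
            records = [json.loads(line) for line in f if line.strip()]
        runs.append({"run_dir": run_dir, "args": records, "checkpoint": ckpt})
    return runs
```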