🐣 Please follow me for new updates https://twitter.com/camenduru
🔥 Please join our discord server https://discord.gg/k5BwmmvJJU
🥳 Please join my patreon community https://patreon.com/camenduru

Potat 1️⃣ (Prototype Model)

Open-Source 1024x576 Text To Video Model 🥳
Trained with https://lambdalabs.com ❤ 1xA100 (40GB)
2197 clips, 68388 frames tagged with salesforce/blip2-opt-6.7b-coco
train_steps: 10000
System RAM: ~8.5 GB, VRAM: ~11 GB, Model Size: ~4.1 GB

🦒 Colab

Colab	Info
	test

📦 Model

https://huggingface.co/camenduru/potat1
https://huggingface.co/vdo/potat1-5000/tree/main
https://huggingface.co/vdo/potat1-10000/tree/main
https://huggingface.co/vdo/potat1-10000-base-text-encoder/tree/main
https://huggingface.co/vdo/potat1-15000/tree/main
https://huggingface.co/vdo/potat1-20000/tree/main
https://huggingface.co/vdo/potat1-25000/tree/main
https://huggingface.co/vdo/potat1-30000/tree/main
https://huggingface.co/vdo/potat1-35000/tree/main
https://huggingface.co/vdo/potat1-40000/tree/main
https://huggingface.co/vdo/potat1-45000/tree/main
https://huggingface.co/vdo/potat1-50000/tree/main
https://huggingface.co/vdo/potat1-50000-base-text-encoder/tree/main (same model as https://huggingface.co/camenduru/potat1)
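
A minimal sketch of how one of the checkpoints listed above might be loaded with the diffusers library. It is not taken from this repo. It assumes the checkpoints are stored in diffusers format and that torch, diffusers and accelerate are installed. The helper names `checkpoint_repo` and `generate` are hypothetical, and `num_frames=24` and `num_inference_steps=25` are illustrative values, not recommended settings.

```python
# Sketch only: assumes diffusers-format checkpoints and an installed
# torch + diffusers + accelerate stack. Nothing is downloaded until
# generate() is called.

# Steps for which a "-base-text-encoder" variant appears in the list above.
BASE_TEXT_ENCODER_STEPS = {10000, 50000}


def checkpoint_repo(step: int, base_text_encoder: bool = False) -> str:
    """Build the Hugging Face repo id for a listed training-step checkpoint."""
    if step not in range(5000, 50001, 5000):
        raise ValueError(f"no checkpoint listed for step {step}")
    if base_text_encoder and step not in BASE_TEXT_ENCODER_STEPS:
        raise ValueError(f"no base-text-encoder variant listed for step {step}")
    suffix = "-base-text-encoder" if base_text_encoder else ""
    return f"vdo/potat1-{step}{suffix}"


def generate(repo_id: str = "camenduru/potat1",
             prompt: str = "Octopus under the ocean.",
             num_frames: int = 24,            # assumption, not from this README
             num_inference_steps: int = 25):  # assumption, not from this README
    """Load a checkpoint and generate 1024x576 frames for a prompt."""
    # Heavy imports stay inside the function so the helper above is usable
    # without a GPU environment.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # moves submodules to the GPU only when needed
    result = pipe(
        prompt,
        width=1024,
        height=576,
        num_frames=num_frames,
        num_inference_steps=num_inference_steps,
    )
    # Depending on the diffusers version, .frames is either a list of frames
    # or a batch of such lists (in which case use result.frames[0]).
    return result.frames
```

For example, `generate(checkpoint_repo(20000))` would try the 20000-step checkpoint instead of the default one.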

🧪 Examples

Prompt: Octopus under the ocean.

download_31.mp4

244223223-c5201c8a-2815-4533-9474-1e312c564f4e.mp4

📋 Tutorial

https://github.com/camenduru/Text-To-Video-Finetuning-colab

📦 Base Model

https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis
https://www.modelscope.cn/models/damo/text-to-video-synthesis

📦 Dataset & Config

https://huggingface.co/camenduru/potat1_dataset/tree/main
https://github.com/microsoft/XPretrain/tree/main/hd-vila-100m (HD-VILA-100M Dataset)
http://toflow.csail.mit.edu/ (Vimeo-90k Dataset)
https://github.com/m-bain/webvid
https://github.com/ExponentialML/Video-BLIP2-Preprocessor
https://github.com/Breakthrough/PySceneDetect

🍱 Finetuning

https://github.com/guoyww/animatediff
https://github.com/showlab/Tune-A-Video
https://github.com/ExponentialML/Text-To-Video-Finetuning
https://www.modelscope.cn/models/damo/text-to-video-synthesis

Thanks to damo-vilab ❤ ExponentialML ❤ kabachuha ❤ @DiffusersLib ❤ @LambdaAPI ❤ @cerspense ❤ @CiaraRowles1 ❤ @p1atdev_art ❤

Thanks to Orellius ❤ (important bug report)