CLIPTranslate

Given a latent CLIP vector from a text or image input, we want to synthesize a sound that fits it semantically. As a first step, we optimize a SIREN network that generates audio so that the resulting waveform scores highly against the target CLIP embedding, as sketched below.
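
To make the idea concrete, here is a minimal sketch of that optimization loop in PyTorch. This is not the project's actual code: the encode_audio callable stands in for AudioCLIP's differentiable audio encoder, and the network sizes and hyperparameters are assumptions.

# Sketch only: a SIREN maps time coordinates to waveform samples, and gradient
# descent pushes the waveform's embedding towards the target CLIP embedding.
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    def __init__(self, in_features, out_features, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

class SirenAudio(nn.Module):
    # Maps 1-D time coordinates in [-1, 1] to audio samples.
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        dims = [1] + [hidden] * depth
        self.net = nn.Sequential(*[SirenLayer(i, o) for i, o in zip(dims[:-1], dims[1:])])
        self.out = nn.Linear(hidden, 1)

    def forward(self, t):
        return self.out(self.net(t)).squeeze(-1)

def synthesize(encode_audio, target_embedding, n_samples=44100, steps=500, lr=1e-4):
    # encode_audio: differentiable function mapping a waveform tensor to a
    #   CLIP-space embedding (AudioCLIP's audio head would play this role).
    # target_embedding: CLIP embedding of the input text or image.
    siren = SirenAudio()
    t = torch.linspace(-1.0, 1.0, n_samples).unsqueeze(-1)
    optimizer = torch.optim.Adam(siren.parameters(), lr=lr)
    for _ in range(steps):
        audio = siren(t)
        loss = -torch.cosine_similarity(encode_audio(audio), target_embedding, dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return siren(t).detach()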

Setup

# Install dependencies
pip install git+https://github.com/pollinations/CLIPTranslate
# or, from a local clone of the repository
pip install -e .

Development in colab

To develop in Colab, I recommend mounting Google Drive both in the notebook and on your laptop, and importing the project from Drive. Copy this Colab notebook as a starting point.
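
A minimal sketch of that setup inside the notebook, assuming the repository is synced to a folder named CLIPTranslate at the top level of your Drive (adjust the path to wherever you actually keep it):

# Run inside the Colab notebook.
from google.colab import drive
drive.mount('/content/drive')

import sys
sys.path.append('/content/drive/MyDrive/CLIPTranslate')  # hypothetical Drive location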

About

Our attempt to use AudioCLIP to translate any modality into sound or images. Inspired by deep-daze, big-sleep, latent2visions, etc.
