This is a pure Python project that allows users to navigate the latent space of a pretrained RAVE model with gestures in real time.
The gesture encoder is designed so that its latent codes follow the RAVE prior (a 4-dimensional Gaussian distribution); at each step, RAVE decodes the gesture embeddings into audio. More details are provided in the training notebook.
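As a rough sketch of the decoding step (this is not the project's actual code; it assumes a TorchScript RAVE export with a `decode()` method and a 4-dimensional latent space, and the file name is hypothetical):

```python
import torch

# Hypothetical file name; any scripted RAVE export with a decode() method should work.
rave = torch.jit.load("models/my_rave_model.ts").eval()

# In the real app the latent vector comes from the gesture encoder; here we simply
# sample from the prior it is trained to match, a standard 4-dimensional Gaussian.
z = torch.randn(1, 4, 1)  # (batch, latent_dims, latent_frames)

with torch.no_grad():
    audio = rave.decode(z)  # decoded audio, roughly (batch, channels, samples)

print(audio.shape)
```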
- Clone the repository
- Install the required packages via `pip install -r requirements.txt` (tested with Python 3.10.12)
- Download a pretrained gesture encoder and unzip it in the root directory of the project
- Download the MediaPipe HandLandmarker model and place it in the `models` directory (a minimal loading check is sketched after this list)
- Move a pretrained RAVE model to the `models` directory (you can download some here or train your own custom model)
- Connect a webcam
- Run `python generate.py --rave_model [PATH TO RAVE MODEL]`
Optional arguments:
- `--gesture_encoder`: path to the gesture encoder; change this to point to a custom location or to a custom encoder you have trained
- `--num_channels`: number of output audio channels; depends on the RAVE model (default: 1)
- `--num_blocks`: number of streaming blocks; a smaller number gives a smaller delay (default: 4)
- `--temperature`: variance multiplier for the encoder, controlling the randomness of sampling; values from 1 to 4 work well (default: 2.0)
- `--cam_device`: index of the camera device (default: 0)
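To check that the HandLandmarker model is in place before launching `generate.py`, a minimal standalone test could look like the following (the file name `models/hand_landmarker.task` is an assumption; adjust it to match your download):

```python
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# Assumed file name for the downloaded HandLandmarker model.
MODEL_PATH = "models/hand_landmarker.task"

options = vision.HandLandmarkerOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path=MODEL_PATH),
    num_hands=1,
)
landmarker = vision.HandLandmarker.create_from_options(options)

# Detect on a dummy black frame just to confirm the model loads and runs.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
result = landmarker.detect(mp.Image(image_format=mp.ImageFormat.SRGB, data=frame))
print("hands detected:", len(result.hand_landmarks))
```

If this runs without errors, the landmarker model is in the expected location and the main script should be able to find it.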
- Antoine Caillon and IRCAM for RAVE
- Google for MediaPipe solutions
- Matthias Geier and other contributors for sounddevice
- Kapitanov et al. for the HaGRID dataset