# RAVE Latent Space Exploration with Gestures

This is a pure Python project that lets users navigate the latent space of a pretrained RAVE model with hand gestures in real time.

## Video example

*(demo video thumbnail)*

## How it works

The gesture encoder is designed so that its latent codes follow the prior of RAVE (a 4-dimensional Gaussian distribution). At each time step, RAVE decodes the current gesture embedding into audio. More information is provided in the training notebook.
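Roughly, the flow is: MediaPipe hand landmarks → gesture encoder → 4-dimensional latent code → RAVE decoder → audio. Below is a minimal sketch of a single step of that pipeline, assuming TorchScript exports for both models; the file paths, the landmark vector size, and the `decode` call are assumptions based on typical RAVE exports, not this project's exact API:

```python
# Minimal sketch of one generation step (illustrative names and paths, not the project's API).
import torch

# Assumed: both the gesture encoder and the RAVE model load as TorchScript modules.
gesture_encoder = torch.jit.load("gesture_encoder/encoder.ts").eval()  # hypothetical path
rave = torch.jit.load("models/my_rave_model.ts").eval()                # hypothetical path

# One MediaPipe hand frame: 21 landmarks with (x, y, z) coordinates, flattened to 63 values.
landmarks = torch.randn(1, 63)  # placeholder for a real detected hand

with torch.no_grad():
    z = gesture_encoder(landmarks)  # latent code following the RAVE prior, shape (1, 4)
    z = z.reshape(1, 4, 1)          # RAVE decoders expect (batch, latent_dim, time)
    audio = rave.decode(z)          # one block of audio, shape (1, channels, samples)

print(audio.shape)
```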

## Setup

  1. Clone the repository
  2. Install the required packages via `pip install -r requirements.txt` (tested with Python 3.10.12)
  3. Download a pretrained gesture encoder and unzip it in the root directory of the project
  4. Download the MediaPipe HandLandmarker model and place it in the `models` directory
  5. Move a pretrained RAVE model to the `models` directory (you can download some here or train your own custom model); the resulting layout is sketched after this list
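After these steps, the project tree should look roughly like this (the `gesture_encoder` directory name and the model filenames below are assumptions; your actual filenames depend on what you download):

```
.
├── generate.py
├── requirements.txt
├── gesture_encoder/          # unzipped pretrained gesture encoder (step 3)
└── models/
    ├── hand_landmarker.task  # MediaPipe HandLandmarker model (step 4)
    └── my_rave_model.ts      # pretrained RAVE model (step 5)
```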

## Usage

  1. Connect a webcam
  2. Run `python generate.py --rave_model [PATH TO RAVE MODEL]`

Optional arguments:

  - `--gesture_encoder`: path to the gesture encoder; change this if you keep it at a custom path or have trained a custom encoder
  - `--num_channels`: number of output audio channels; depends on the RAVE model (default: 1)
  - `--num_blocks`: number of streaming blocks; a smaller number means lower latency (default: 4)
  - `--temperature`: variance multiplier for the encoder, controlling the randomness of sampling; values from 1 to 4 work well (default: 2.0)
  - `--cam_device`: index of the camera device (default: 0)
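For example, to run with a stereo RAVE model at a slightly lower sampling temperature (the model path below is a placeholder):

```bash
python generate.py --rave_model models/my_rave_model.ts --num_channels 2 --temperature 1.5
```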

## Acknowledgements