Artifical Audio

Study Group Page for Audio Machine Learning Practitioners. (This is a quick page to get the ball rolling and help with organisation.)

The purpose of this group is to learn and share understanding about modern practices in Audio Machine Learning.

Meetup - Dates/times TBC, starting June 2023. We need to get an idea of where participants in the study group are located, to find an ideal time. This is being organised on the LAION Discord server under the audio section. Depending on participants' preference, the idea is also to meet together in the voice channel of the server. (Please email or DM if you have an alternate preference on where it's hosted; if people are happy with Discord we can stick with that.)

Lesson One - Building an intuitive understanding of Encodec and RVQ, and unconditional sound generation using transformers and a custom drum dataset.
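As a rough preview (not the lesson material itself), here is a minimal sketch of pushing a waveform through Encodec to get RVQ codes and back, assuming the Hugging Face transformers EncodecModel and the facebook/encodec_24khz checkpoint:

```python
# Minimal sketch: encode a waveform into Encodec RVQ codes and decode it back.
# Assumes the Hugging Face `transformers` EncodecModel and the
# facebook/encodec_24khz checkpoint; not the lesson's actual code.
import numpy as np
import torch
from transformers import EncodecModel, AutoProcessor

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

# One second of silence as a stand-in for a real drum sample (mono, 24 kHz).
raw_audio = np.zeros(24000, dtype=np.float32)
inputs = processor(raw_audio=raw_audio, sampling_rate=24000, return_tensors="pt")

with torch.no_grad():
    encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
    codes = encoded.audio_codes          # discrete ids, one stream per residual codebook
    decoded = model.decode(encoded.audio_codes, encoded.audio_scales,
                           inputs["padding_mask"])[0]

print(codes.shape)    # the token sequences a transformer would be trained on
print(decoded.shape)  # reconstructed waveform
```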

The first session will also include introductions, with a soft requirement for a microphone connection so you can interact: tell people what your skill level is from 1-5 (in coding and ML), what you're interested in getting out of the lessons, and what you're interested in learning. (I'll write them all down, as that could very well help plan the lessons moving forward.)

Challenge/Homework:

Encode your own dataset of sounds and train a transformer model to unconditionally generate sounds. Your choice of architecture and framework: decoder-only, seq2seq, etc., using Hugging Face, PyTorch, MosaicML, etc.

We'll go through an example in the lesson that can be hacked around with and built upon/adapted for the homework. Experienced coders are encouraged to get creative.
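For anyone who wants a head start, here is a minimal sketch of the shape of the task in plain PyTorch: a small decoder-only model trained with next-token prediction over a single Encodec codebook stream. All names, sizes and the `codes` tensor are placeholder assumptions, not the lesson code:

```python
# Minimal sketch of a decoder-only transformer over Encodec token ids.
# Next-token prediction on one codebook stream; all hyperparameters are
# placeholders, and `codes` stands in for a real encoded dataset.
import torch
import torch.nn as nn

VOCAB = 1024      # Encodec codebook size
D_MODEL = 256
MAX_LEN = 1024

class TinyAudioLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(MAX_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, dim_feedforward=1024,
                                           batch_first=True, activation="gelu")
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, x):                      # x: (batch, time) token ids
        t = x.shape[1]
        pos = torch.arange(t, device=x.device)
        h = self.tok(x) + self.pos(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(x.device)
        h = self.blocks(h, mask=mask)          # causal self-attention
        return self.head(h)                    # (batch, time, VOCAB) logits

model = TinyAudioLM()
codes = torch.randint(0, VOCAB, (8, 256))      # stand-in for real Encodec codes
logits = model(codes[:, :-1])                  # predict token t+1 from tokens <= t
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                   codes[:, 1:].reshape(-1))
loss.backward()
```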

Points for interpretability experiments using different generation methods. For instance, what effect does using only beam search have on the output? Can you predict what might happen? How important are positional encodings for audio? What effect do the transformer activation functions (e.g. ReLU vs. GELU) have on the perceptual quality of the generated audio: is there a difference?
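As one possible setup for those experiments, here is a hedged sketch using a GPT-2-style Hugging Face model over Encodec token ids; the model below is randomly initialised, so in practice you would load your own trained checkpoint:

```python
# Sketch of comparing generation strategies on a token-level audio LM.
# Assumes a GPT-2-style model trained on Encodec token ids; the randomly
# initialised `audio_lm` is a placeholder for your trained checkpoint.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(vocab_size=1024, n_positions=1024, n_layer=4, n_head=4,
                    n_embd=256, bos_token_id=0, eos_token_id=0, pad_token_id=0)
audio_lm = GPT2LMHeadModel(config)

prompt = torch.randint(1, 1024, (1, 16))           # a few seed tokens

greedy  = audio_lm.generate(prompt, max_length=256, do_sample=False)
beam    = audio_lm.generate(prompt, max_length=256, num_beams=4, do_sample=False)
sampled = audio_lm.generate(prompt, max_length=256, do_sample=True,
                            top_k=50, temperature=1.0)

# Decode each token sequence back to audio with Encodec and compare by ear:
# does beam search sound more repetitive or more "averaged" than sampling?
```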

At the start of the 2nd lesson we'll go over homework submissions (and play some of the generated audio), and at the end of the session go over homework issues/problems.

If anyone's feeling confident and wants to go into the technical understanding/walkthrough implementation of RVQ, we can do that in the 2nd lesson, and start talking about shared embedding spaces for text/audio, trying to build an intuitive understanding of them.
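For those curious ahead of time, the core idea of RVQ is that each codebook quantizes whatever error the previous codebooks left behind. Here is a minimal PyTorch sketch of the encode step with random, untrained codebooks (an illustration of the principle, not Encodec's actual implementation):

```python
# Minimal sketch of the RVQ forward pass: each codebook quantizes the residual
# left over by the previous one. Random, untrained codebooks for illustration.
import torch

def rvq_encode(x, codebooks):
    """x: (batch, dim); codebooks: list of (codebook_size, dim) tensors."""
    residual = x
    indices, quantized = [], torch.zeros_like(x)
    for cb in codebooks:
        dists = torch.cdist(residual, cb)            # distance to every codebook entry
        idx = dists.argmin(dim=-1)                    # nearest entry per example
        chosen = cb[idx]
        quantized = quantized + chosen
        residual = residual - chosen                  # what's left for the next stage
        indices.append(idx)
    return torch.stack(indices, dim=-1), quantized    # codes, reconstruction

x = torch.randn(4, 128)
codebooks = [torch.randn(1024, 128) for _ in range(8)]   # 8 residual stages
codes, x_hat = rvq_encode(x, codebooks)
print(codes.shape)                                    # (4, 8): one index per stage
print(torch.norm(x - x_hat) / torch.norm(x))          # relative reconstruction error
```

Adding more stages drives the residual (and the printed error ratio) down; selecting how many quantizer stages to use is essentially what Encodec's bandwidth setting controls.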

I have started making a list of ideal speakers/presenters. Please email [email protected] if you would like to suggest a speaker, present some of your work, or host a lesson on your area of expertise.

Long view:

The study group aims to build an intuitive understanding of audio-related Machine Learning and implement music/audio-related ML papers, from music generation and Text-To-Speech to timbre transfer, exploring the methodologies and toolsets required to generate audio.

The group will gain the ability to use transformers with audio and symbolic (MIDI) data. We will make extensive use of open source technologies in an effort to minimise the negative environmental impact of training large models. We aim to stay up to date with current research and make the most of the current pace of Machine Learning development.

Areas of scope:

Transformers for audio, MelGAN/spectrogram approaches, neural vocoders, audio/sample diffusion, GRUs for analogue modelling, learned DSP approaches e.g. DDSP. (Feel free to email any more suggestions: SIREN networks for audio? Neural Cellular Automata with CLAP guidance? etc.)
