North Coast Synthesis Ltd.

Making music with Moûsai

2025-10-20, access: $$$ Pro

The latent diffusion concept applied to music generation: a transformer-type text model generates embeddings from a prompt, which guide a diffusion model to create encoded spectrograms in a latent space, which are translated by another diffusion model into audio waveforms.
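As a rough illustration of the data flow described above, here is a toy sketch of the three stages wired together. Everything in it is a stand-in: the function names, sizes, and the trivial interpolating "denoiser" are hypothetical and chosen only to show what feeds into what. It is not the Moûsai implementation and contains no trained networks.

```python
import math
import random


def embed_prompt(prompt, dim=8):
    """Stand-in for the text model: map each token to a fixed pseudo-random vector and sum."""
    vec = [0.0] * dim
    for token in prompt.lower().split():
        rng = random.Random(token)  # string seed gives a repeatable vector per token
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    return vec


def denoise(x, target, steps):
    """Toy reverse process: begin at noise x and step toward target.

    A real diffusion model would predict the update with a trained network;
    this interpolation only mimics the iterative noise-to-signal shape.
    """
    for step in range(steps):
        frac = 1.0 / (steps - step)  # the final step lands on the target
        x = [xi + frac * (ti - xi) for xi, ti in zip(x, target)]
    return x


def generate_latent(embedding, latent_len=16, steps=10, seed=0):
    """Stage 2 stand-in: embedding-guided generation of a latent sequence."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(latent_len)]
    # Hypothetical conditioning: the embedding directly determines the target.
    target = [math.tanh(embedding[i % len(embedding)]) for i in range(latent_len)]
    return denoise(noise, target, steps)


def decode_to_waveform(latent, hop=4, steps=10, seed=1):
    """Stage 3 stand-in: a second denoising pass expands the latent into samples."""
    rng = random.Random(seed)
    n = len(latent) * hop
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical decoding: each latent value scales a short sine segment.
    target = [latent[i // hop] * math.sin(2.0 * math.pi * i / hop) for i in range(n)]
    return denoise(noise, target, steps)


def text_to_audio(prompt):
    """Prompt -> embedding -> latent -> waveform, mirroring the three stages."""
    embedding = embed_prompt(prompt)
    latent = generate_latent(embedding)
    return decode_to_waveform(latent)


if __name__ == "__main__":
    wave = text_to_audio("slow jazz piano")
    print(len(wave), "samples")
```

The point of the sketch is the shape of the pipeline: the latent sequence is much shorter than the waveform, so the prompt-guided stage works on a compact representation and a separate stage handles the expansion to audio.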

Video applications model-intro audio diffusion