Making music with Moûsai
2025-10-20
applications model-intro audio diffusion

The latent diffusion concept applied to music generation: a transformer-type text model generates embeddings from a prompt; these guide a diffusion model that creates encoded spectrograms in a latent space, which a second diffusion model then translates into audio waveforms.
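
The data flow described above (prompt, then text embedding, then latent, then waveform) can be sketched as a toy pipeline. This is only an illustration under loud assumptions, not the Moûsai implementation. Every name and size here (`embed_text`, `latent_denoiser`, `decoder_denoiser`, `sample`, `generate`, `EMBED_DIM`, `LATENT_LEN`, `UPSAMPLE`) is hypothetical. The "denoisers" are fixed formulas standing in for trained networks, and the sampling loop is a simplified linear noise schedule chosen for readability.

```python
import hashlib
import math
import random

# All sizes below are hypothetical, chosen only to keep the toy small.
EMBED_DIM = 8    # length of the text embedding
LATENT_LEN = 16  # length of the compressed (latent) sequence
UPSAMPLE = 4     # waveform samples produced per latent value


def embed_text(prompt):
    """Stand-in for the text transformer: a deterministic pseudo-embedding
    in [-1, 1] derived from a hash of the prompt."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 * 2.0 - 1.0 for b in digest[:EMBED_DIM]]


def latent_denoiser(x, embedding):
    """Stand-in for the first diffusion model: predicts a clean latent from
    the text embedding. A trained network would also use the noisy input x;
    this toy ignores it."""
    return [
        math.tanh(sum(e * math.sin((i + 1) * (j + 1))
                      for j, e in enumerate(embedding)))
        for i in range(len(x))
    ]


def decoder_denoiser(x, latent):
    """Stand-in for the second diffusion model: predicts a clean waveform
    conditioned on the latent. Again, the noisy input x is ignored."""
    return [
        latent[i // UPSAMPLE] * math.sin(2.0 * math.pi * i / UPSAMPLE)
        for i in range(len(x))
    ]


def sample(denoiser, cond, length, num_steps, rng):
    """Simplified iterative sampler: start from Gaussian noise, repeatedly
    predict the clean signal, and re-noise it at a level that shrinks
    linearly to zero on the final step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for step in range(num_steps):
        sigma_next = 1.0 - (step + 1) / num_steps  # 0.0 on the last step
        x0_hat = denoiser(x, cond)
        x = [(1.0 - sigma_next) * c + sigma_next * rng.gauss(0.0, 1.0)
             for c in x0_hat]
    return x


def generate(prompt, seed=0, num_steps=10):
    """Prompt -> embedding -> latent (stage 1) -> waveform (stage 2)."""
    rng = random.Random(seed)
    embedding = embed_text(prompt)
    latent = sample(latent_denoiser, embedding, LATENT_LEN, num_steps, rng)
    return sample(decoder_denoiser, latent, LATENT_LEN * UPSAMPLE, num_steps, rng)


waveform = generate("calm piano with soft strings")
print(len(waveform))  # 64
```

The point of the two-stage layout is that the text-conditioned model only has to produce the short latent sequence, while the second model handles the expansion to full-rate audio. In this toy the expansion factor is 4; a real system would use learned networks in place of both stand-in formulas.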
