snap-research / GenAU
☆40 · Updated 5 months ago
Alternatives and similar repositories for GenAU
Users that are interested in GenAU are comparing it to the libraries listed below
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆71 · Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆39 · Updated last year
- ☆37 · Updated 5 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆26 · Updated last year
- small audio language model for reasoning ☆74 · Updated 4 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) ☆28 · Updated 8 months ago
- ☆42 · Updated 8 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆107 · Updated 3 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆55 · Updated 10 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆94 · Updated last year
- ☆25 · Updated last year
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆29 · Updated 6 months ago
- VAE modified from Descript Audio Codec, which replaces the RVQ with VAE ☆78 · Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆101 · Updated 8 months ago
- A low-bitrate single-codebook 16 kHz speech codec based on focal modulation ☆95 · Updated 7 months ago
- Codebase and project page for EDMSound ☆34 · Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆48 · Updated 11 months ago
- ☆22 · Updated this week
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆75 · Updated 10 months ago
- ☆37 · Updated last year
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆85 · Updated last year
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆80 · Updated 3 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆61 · Updated 10 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆54 · Updated last year
- ☆60 · Updated 10 months ago
- Source code for DM-Codec. ☆49 · Updated 3 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆49 · Updated 4 months ago
- [ACMMM'2024] Generative Expressive Conversational Speech Synthesis ☆38 · Updated 10 months ago
- The implementation of paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody" ☆34 · Updated last year
- The open-source code of UniAudio2.0 ☆64 · Updated last week