☆50 Apr 13, 2025 Updated 10 months ago
Alternatives and similar repositories for GenAU
Users who are interested in GenAU are comparing it to the repositories listed below
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆33 Feb 11, 2026 Updated 3 weeks ago
- Repository for "Training Audio Captioning Models without Audio" ☆10 Sep 26, 2023 Updated 2 years ago
- Explaining audio differences using language ☆16 Feb 11, 2025 Updated last year
- Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems ☆13 Jan 16, 2025 Updated last year
- Prediction of sound event bounding boxes (SEBBs) ☆32 Aug 2, 2024 Updated last year
- AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation ☆16 Aug 3, 2025 Updated 7 months ago
- [CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations". ☆27 Mar 1, 2024 Updated 2 years ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆31 Aug 30, 2025 Updated 6 months ago
- small audio language model for reasoning ☆86 Dec 4, 2025 Updated 3 months ago
- Tools for the evaluation of audio captioning. ☆18 May 23, 2020 Updated 5 years ago
- MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows ☆124 Sep 2, 2025 Updated 6 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆43 Jun 13, 2024 Updated last year
- ☆43 Jan 13, 2025 Updated last year
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆32 Mar 4, 2025 Updated last year
- Event Relation in Text-to-Audio (TTA) Generation ☆20 Feb 26, 2025 Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆119 May 19, 2025 Updated 9 months ago
- ☆22 Mar 19, 2025 Updated 11 months ago
- CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding ☆22 Dec 17, 2025 Updated 2 months ago
- ☆37 Jul 4, 2024 Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆114 Jan 28, 2026 Updated last month
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆68 Nov 1, 2024 Updated last year
- Pytorch implementation of SoundCTM ☆101 Mar 31, 2025 Updated 11 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆154 May 30, 2025 Updated 9 months ago
- Curated list for papers, codes and resources related to Text-to-Audio (TTA) Generation ☆69 Jan 22, 2026 Updated last month
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆40 Jan 27, 2025 Updated last year
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio. ☆188 May 29, 2024 Updated last year
- Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)" ☆41 Jun 28, 2025 Updated 8 months ago
- HiFTNet wav/audio super-resolution 16/24 kHz to 48 kHz ☆24 Jan 2, 2024 Updated 2 years ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆196 Dec 13, 2024 Updated last year
- ☆10 Oct 16, 2025 Updated 4 months ago
- ☆117 Updated this week
- ☆11 Dec 28, 2023 Updated 2 years ago
- ☆11 Sep 1, 2024 Updated last year
- [NeurIPS 2024] Code, Dataset, Samples for the VATT paper “Tell What You Hear From What You See - Video to Audio Generation Through Text” ☆35 Jul 24, 2025 Updated 7 months ago
- ☆33 Dec 23, 2025 Updated 2 months ago
- Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization” ☆56 Jun 1, 2025 Updated 9 months ago
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ… ☆28 Sep 20, 2025 Updated 5 months ago
- Audio Entailment: Deductive Reasoning for Audio Understanding ☆17 Dec 10, 2024 Updated last year
- llmon-py is a multimodal webui for Llama 3-8B. ☆16 Jul 1, 2024 Updated last year