☆51Apr 13, 2025Updated 11 months ago
Alternatives and similar repositories for GenAU
Users that are interested in GenAU are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation☆16Aug 3, 2025Updated 7 months ago
- Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems☆13Jan 16, 2025Updated last year
- Repository for "Training Audio Captioning Models without Audio"☆10Sep 26, 2023Updated 2 years ago
- Explaining audio differences using language☆16Feb 11, 2025Updated last year
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)☆33Feb 11, 2026Updated last month
- small audio language model for reasoning☆86Dec 4, 2025Updated 3 months ago
- Tools for the evaluation of audio captioning.☆19May 23, 2020Updated 5 years ago
- [CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".☆27Mar 1, 2024Updated 2 years ago
- Prediction of sound event bounding boxes (SEBBs)☆32Aug 2, 2024Updated last year
- Official Code for "A Likelihood Ratio-Based Approach to Segmenting Unknown Objects" [IJCV 2025]☆15Jun 9, 2025Updated 9 months ago
- Audio Entailment: Deductive Reasoning for Audio Understanding☆17Dec 10, 2024Updated last year
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆190May 29, 2024Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆114Jan 28, 2026Updated last month
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"☆32Mar 4, 2025Updated last year
- Curated list for papers, codes and resources related to Text-to-Audio (TTA) Generation☆70Jan 22, 2026Updated 2 months ago
- ☆43Jan 13, 2025Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆47Sep 19, 2025Updated 6 months ago
- Event Relation in Text-to-Audio (TTA) Generation☆20Feb 26, 2025Updated last year
- MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows☆130Sep 2, 2025Updated 6 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆153May 30, 2025Updated 9 months ago
- ☆22Mar 19, 2025Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)☆43Jun 13, 2024Updated last year
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆69Nov 1, 2024Updated last year
- CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding☆22Dec 17, 2025Updated 3 months ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆35Aug 30, 2025Updated 6 months ago
- Pytorch implementation of SoundCTM☆101Mar 31, 2025Updated 11 months ago
- [NeurIPS 2024] Code, Dataset, Samples for the VATT paper “ Tell What You Hear From What You See - Video to Audio Generation Through Text”☆36Jul 24, 2025Updated 8 months ago
- ☆11Dec 28, 2023Updated 2 years ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline☆198Dec 13, 2024Updated last year
- code for A Large-scale Dataset for Audio-Language Representation Learning☆14Sep 18, 2024Updated last year
- Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".☆93Dec 8, 2023Updated 2 years ago
- Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati…☆193Mar 25, 2024Updated 2 years ago
- Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"☆41Jun 28, 2025Updated 8 months ago
- [ICCV 2025] ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models☆41Jul 2, 2025Updated 8 months ago
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 6 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆119May 19, 2025Updated 10 months ago
- ☆117Feb 26, 2026Updated 3 weeks ago
- Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.☆70Updated this week
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 2 months ago