Audio Entailment: Deductive Reasoning for Audio Understanding
☆17Dec 10, 2024Updated last year
Alternatives and similar repositories for AudioEntailment
Users that are interested in AudioEntailment are comparing it to the libraries listed below
Sorting:
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models☆22Jul 10, 2024Updated last year
- Repository for "Training Audio Captioning Models without Audio"☆10Sep 26, 2023Updated 2 years ago
- ☆11Sep 25, 2024Updated last year
- Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline.☆17May 9, 2025Updated 9 months ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆32Mar 14, 2025Updated 11 months ago
- small audio language model for reasoning☆86Dec 4, 2025Updated 2 months ago
- AnyAccomp: Generalizable accompaniment generation for vocals and solo instruments, powered by a quantized melodic bottleneck.☆33Dec 22, 2025Updated 2 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Jan 27, 2025Updated last year
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆34Sep 9, 2025Updated 5 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)☆43Jun 13, 2024Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆114Jan 28, 2026Updated 3 weeks ago
- [ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?☆43Nov 21, 2025Updated 3 months ago
- FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation☆29Dec 19, 2024Updated last year
- Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training☆64Feb 7, 2026Updated 2 weeks ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆107Dec 20, 2025Updated 2 months ago
- Official Implementation of EnCLAP (ICASSP 2024)☆94Jun 2, 2024Updated last year
- Codes and MIDI demos of ISMIR 2022 paper: Domain Adversarial Training on Conditional Variational Auto-Encoder for Controllable Music Gene…☆21Mar 28, 2023Updated 2 years ago
- Llasa Speed Up☆60Jan 18, 2026Updated last month
- Notebooks for the EPFL class "Computers and Music".☆25Aug 20, 2021Updated 4 years ago
- Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.☆68Jul 19, 2025Updated 7 months ago
- [AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models☆27Dec 14, 2023Updated 2 years ago
- List of Podcast Feeds using iTunes API and script to download 6,000,000~ hours of English speech.☆31Apr 13, 2023Updated 2 years ago
- Lyrics and Vocal Melody Generation conditioned on Accompaniment☆29Aug 27, 2022Updated 3 years ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning(ICASSP 2023)"☆31Dec 6, 2023Updated 2 years ago
- ☆24Sep 10, 2025Updated 5 months ago
- PAM is a no-reference audio quality metric for audio generation tasks☆77Jul 19, 2024Updated last year
- This repository implement a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in …☆57Aug 9, 2025Updated 6 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"☆32Mar 4, 2025Updated 11 months ago
- A list of all repositories from Music X Lab☆33Dec 8, 2025Updated 2 months ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆22Feb 13, 2026Updated 2 weeks ago
- ☆32Nov 25, 2023Updated 2 years ago
- ☆14Updated this week
- Llambada: Simple Text Controllable for accompaniment generation☆37Sep 24, 2025Updated 5 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".☆63Nov 5, 2025Updated 3 months ago
- The MIR-MLPop dataset and the official implementation of the paper "MIR-MLPop: A Multilingual Pop Music Dataset with Time-Aligned Lyrics …☆33Apr 22, 2024Updated last year
- Prediction of sound event bounding boxes (SEBBs)☆32Aug 2, 2024Updated last year
- ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation☆38Nov 20, 2024Updated last year
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 8 months ago
- A standardized toolkit of Kernel Audio Distance (KAD)—a distribution-free, unbiased, and computationally efficient metric for evaluating …☆95Jun 12, 2025Updated 8 months ago