soham97 / ADIFF
Explaining audio differences using language
☆16 Updated 11 months ago
Alternatives and similar repositories for ADIFF
Users who are interested in ADIFF are comparing it to the repositories listed below
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆74 Updated last year
- ☆47 Updated 9 months ago
- ☆40 Updated 9 months ago
- small audio language model for reasoning ☆86 Updated last month
- ☆24 Updated 4 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆48 Updated last week
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆32 Updated last year
- Source code for the paper 'Audio Captioning Transformer' ☆57 Updated 4 years ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆59 Updated last year
- A spoken version of the textual story cloze benchmark ☆20 Updated 2 years ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆32 Updated 10 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆43 Updated last year
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆38 Updated 2 years ago
- Codebase and project page for EDMSound ☆35 Updated 2 years ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis" ☆106 Updated last month
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆57 Updated 3 months ago
- ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation ☆38 Updated last year
- ☆43 Updated last year
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆28 Updated last year
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs) ☆40 Updated 5 months ago
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆32 Updated 3 years ago
- [ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆95 Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆118 Updated 8 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆40 Updated last year
- Official implementation for FlowSep ☆69 Updated last year
- Implementation of the paper, T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis, ac… ☆34 Updated last year
- ☆32 Updated last month
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆32 Updated 10 months ago
- ☆18 Updated 8 months ago
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis ☆41 Updated 9 months ago