soham97 / ADIFF

Explaining audio differences using language

☆15

Alternatives and similar repositories for ADIFF

Users who are interested in ADIFF are comparing it to the libraries listed below.


qiuk2 / AAR
[Official Implementation] Acoustic Autoregressive Modeling 🔥
☆71 · Updated last year
snap-research / GenAU
☆41 · Updated 6 months ago
soham97 / mellow
small audio language model for reasoning
☆78 · Updated 6 months ago
lavendery / AudioComposer
☆23 · Updated last month
AgentCooper2002 / EDMSound
Codebase and project page for EDMSound
☆35 · Updated last year
jaeyeonkim99 / EnCLAP
Official Implementation of EnCLAP (ICASSP 2024)
☆94 · Updated last year
guyyariv / AudioToken
This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image …
☆85 · Updated last year
NKU-HLT / AudioEditor
☆38 · Updated 6 months ago
FreedomIntelligence / FusionAudio
Towards Fine-grained Audio Captioning with Multimodal Contextual Cues
☆81 · Updated 3 weeks ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆29 · Updated 9 months ago
gzhu06 / Cacophony
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆48 · Updated last year
gwh22 / LAFMA
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆40 · Updated last year
naver-ai / rewas
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
☆44 · Updated 10 months ago
XZWY / MSLDM
Implementation of Multi-Source Music Generation with Latent Diffusion.
☆26 · Updated last year
Text-to-Audio / Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
☆111 · Updated 5 months ago
Bai-YT / ConsistencyTTA
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
☆35 · Updated 11 months ago
Hannieliao / Baton
Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"
☆29 · Updated 7 months ago
audio-captioning / caption-evaluation-tools
Tools for the evaluation of audio captioning.
☆18 · Updated 5 years ago
GalaxyCong / StyleDubber
[ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
☆91 · Updated 11 months ago
slp-rl / SpokenStoryCloze
A spoken version of the textual story cloze benchmark
☆19 · Updated 2 years ago
justivanr / art2mus_
Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projectio…
☆18 · Updated 10 months ago
ta012 / SSLAM
[ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
☆49 · Updated last week
AI-S2-Lab / FluentEditor
[InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
☆55 · Updated 11 months ago
zeyuxie29 / PicoAudio
☆43 · Updated 9 months ago
chenqi008 / V2C
Pytorch implementation for “V2C: Visual Voice Cloning”
☆31 · Updated 2 years ago
andybi7676 / reborn-uasr
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
☆13 · Updated 10 months ago
kaistmm / VoiceDiT
[ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
☆24 · Updated 6 months ago
sungnyun / ARMHuBERT
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
☆40 · Updated last year
XinhaoMei / ACT
Source code for the paper 'Audio Captioning Transformer'
☆57 · Updated 3 years ago
slSeanWU / beats-conformer-bart-audio-captioner
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆38 · Updated last year