soham97 / ADIFF
Explaining audio differences using language
★16 · Updated 9 months ago
Alternatives and similar repositories for ADIFF
Users who are interested in ADIFF compare it to the repositories listed below.
- ★43 · Updated 7 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ★73 · Updated last year
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ★31 · Updated 10 months ago
- ★24 · Updated 2 months ago
- ★41 · Updated 7 months ago
- Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video" ★44 · Updated 11 months ago
- Source code for the paper 'Audio Captioning Transformer' ★57 · Updated 3 years ago
- Codebase and project page for EDMSound ★35 · Updated 2 years ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ★48 · Updated last year
- small audio language model for reasoning ★80 · Updated 7 months ago
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis ★24 · Updated 7 months ago
- Implementation of the paper, T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis, ac… ★34 · Updated last year
- ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation ★35 · Updated last year
- Dataset/code for AudioMarkBench: Benchmarking Robustness of Audio Watermarking ★43 · Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ★42 · Updated last year
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ★30 · Updated 8 months ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ★87 · Updated last year
- ★42 · Updated 2 years ago
- Official implementation of "Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound". IEEE TASLP 20… ★16 · Updated 2 months ago
- A spoken version of the textual story cloze benchmark ★19 · Updated 2 years ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ★38 · Updated last year
- Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024). ★25 · Updated 2 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ★27 · Updated last year
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ★53 · Updated last month
- ★44 · Updated 10 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ★113 · Updated 6 months ago
- Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models". ★57 · Updated 2 years ago
- Official Implementation of EnCLAP (ICASSP 2024) ★94 · Updated last year
- Unofficial download repository for MusicCaps ★48 · Updated 2 years ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ★40 · Updated 10 months ago