soham97 / ADIFF
Explaining audio differences using language
☆16 Updated 11 months ago
Alternatives and similar repositories for ADIFF
Users who are interested in ADIFF are comparing it to the libraries listed below.
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆74 Updated last year
- ☆49 Updated 9 months ago
- ☆40 Updated 10 months ago
- ☆23 Updated 4 months ago
- Codebase and project page for EDMSound ☆35 Updated 2 years ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆32 Updated last year
- Official Implementation of EnCLAP (ICASSP 2024) ☆94 Updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆43 Updated last year
- small audio language model for reasoning ☆86 Updated 2 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆32 Updated 11 months ago
- ☆43 Updated last year
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆28 Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆48 Updated 2 weeks ago
- Dataset/code for AudioMarkBench: Benchmarking Robustness of Audio Watermarking ☆44 Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆118 Updated 8 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆59 Updated last year
- Implementation of the paper, T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis, ac… ☆34 Updated last year
- Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024). ☆25 Updated 4 months ago
- ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation ☆38 Updated last year
- [ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆97 Updated last year
- Source code for the paper 'Audio Captioning Transformer' ☆57 Updated 4 years ago
- ☆18 Updated 9 months ago
- Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projectio… ☆19 Updated 3 months ago
- Unofficial download repository for MusicCaps ☆47 Updated 2 years ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆88 Updated last year
- ☆14 Updated 6 months ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning ☆53 Updated 2 years ago
- The official implementation of the IJCAI 2024 paper "MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models". ☆47 Updated last year
- Pytorch implementation for “V2C: Visual Voice Cloning” ☆33 Updated 3 years ago
- Official implementation for FlowSep ☆69 Updated last year