mhamilton723 / DenseAV
Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
☆53 Updated 3 months ago
Related projects:
- ☆61 Updated last month
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, accepted to IEEE SLT 2022 ☆108 Updated last year
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆169 Updated last month
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆64 Updated 3 weeks ago
- The demo page of UniAudio ☆34 Updated 7 months ago
- Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… ☆141 Updated 5 months ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆75 Updated 3 months ago
- Implementation of a Light Recurrent Unit in PyTorch ☆43 Updated 3 weeks ago
- The official GitHub page for the survey paper "Foundation Models for Music: A Survey". ☆79 Updated 2 weeks ago
- The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tu… ☆65 Updated 2 weeks ago
- Refactored / updated version of `stable-audio-tools`, an open-source codebase for audio/music generative models originally by Stabili… ☆111 Updated last month
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆22 Updated this week
- Long-Term Rhythmic Video Soundtracker, ICML 2023 ☆54 Updated 2 months ago
- ☆84 Updated 5 months ago
- Official implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" ☆85 Updated 2 months ago
- ☆48 Updated last month
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis ☆38 Updated last year
- The repo hosts the code and model of MAViL. ☆41 Updated last year
- Code for the C2KD paper (ICASSP 2023) ☆16 Updated last year
- Code for the paper Real-Time Neural Voice Camouflage ☆28 Updated 2 years ago
- Official implementation of EnCLAP (ICASSP 2024) ☆88 Updated 3 months ago
- Splits for epic-sounds dataset ☆68 Updated 6 months ago
- This is the official repository of the ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional R… ☆35 Updated last month
- Unsupervised Rhythm Modeling for Voice Conversion ☆78 Updated last year
- Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models". ☆55 Updated last year
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆87 Updated 3 weeks ago
- Implementation of a multimodal diffusion transformer in PyTorch ☆92 Updated 2 months ago
- The official implementation of PeriodWave and PeriodWave-Turbo ☆107 Updated last month
- Efficient synchronization from sparse cues ☆25 Updated 4 months ago
- ☆57 Updated 2 years ago