mhamilton723 / DenseAV
Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
☆61 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for DenseAV
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer · ☆66 · Updated last week
- Official implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" · ☆101 · Updated last week
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … · ☆77 · Updated 5 months ago
- Implementation of the paper "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" in PyTorch · ☆11 · Updated this week
- Implementation of the proposed MaskBit from ByteDance AI · ☆62 · Updated last week
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities · ☆192 · Updated last month
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation · ☆80 · Updated this week
- The demo page of UniAudio · ☆34 · Updated 9 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 · ☆57 · Updated 2 months ago
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos · ☆15 · Updated last month
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022 · ☆109 · Updated last year
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… · ☆39 · Updated last week
- The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tu… · ☆71 · Updated 2 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… · ☆43 · Updated last month
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model · ☆104 · Updated last month
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" · ☆29 · Updated last month
- Official Implementation of EnCLAP (ICASSP 2024) · ☆90 · Updated 5 months ago
- Codebase and project page for EDMSound · ☆29 · Updated last year
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression · ☆13 · Updated last month
- Implementation of a Light Recurrent Unit in PyTorch · ☆46 · Updated last month
- Official code and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… · ☆156 · Updated 7 months ago
- The official implementation of the IJCAI 2024 paper "MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models" · ☆31 · Updated 2 months ago
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) · ☆32 · Updated last month
- SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization (Interspeech 2024) · ☆26 · Updated 3 weeks ago
- This repository aims to collect Transformer-based sound event detection (SED) algorithms · ☆36 · Updated 3 weeks ago
- The official implementation of PeriodWave and PeriodWave-Turbo · ☆132 · Updated 3 months ago
- Splits for the EPIC-SOUNDS dataset · ☆70 · Updated 8 months ago