cvlab-columbia / voicecamo
Code for the paper Real-Time Neural Voice Camouflage
☆28 Updated 2 years ago
Related projects:
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022 ☆108 Updated last year
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis ☆38 Updated last year
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆75 Updated 3 months ago
- ☆14 Updated last year
- Efficient synchronization from sparse cues ☆25 Updated 4 months ago
- Facestar dataset. High quality audio-visual recordings of human conversational speech. ☆99 Updated 2 years ago
- Localize to Binauralize: Audio Spatialization from Visual Sound Source Localization (ICCV 2021) ☆9 Updated 2 years ago
- My attempts at applying Soundstream design on learned tokenization of text and then applying hierarchical attention to text generation ☆76 Updated last year
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆44 Updated 5 months ago
- Codebase and project page for EDMSound ☆29 Updated 10 months ago
- Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022) ☆50 Updated 7 months ago
- Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models". ☆55 Updated last year
- ☆44 Updated this week
- [ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach ☆20 Updated 3 years ago
- Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021) ☆60 Updated 3 years ago
- Source code for the paper 'Audio Captioning Transformer' ☆47 Updated 2 years ago
- Evaluation script for VoxMovies dataset in PyTorch ☆22 Updated 8 months ago
- ☆44 Updated 2 months ago
- Demo for 2022 ICASSP ☆64 Updated 2 years ago
- Repo for Visual Acoustic Matching, CVPR 2022 ☆60 Updated last year
- Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021) ☆13 Updated last year
- ☆11 Updated 2 months ago
- ☆57 Updated 2 years ago
- ☆33 Updated 2 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆64 Updated 3 weeks ago
- Official release of StyleTalk dataset. ☆53 Updated 2 months ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation ☆34 Updated 8 months ago
- ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation ☆27 Updated 3 months ago
- ☆43 Updated last year
- ☆23 Updated last month