W-Wu / DEER
☆11 Updated last year
Alternatives and similar repositories for DEER:
Users who are interested in DEER are comparing it to the repositories listed below
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆41 Updated 2 weeks ago
- Source code for the paper 'Audio Captioning Transformer' ☆52 Updated 3 years ago
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an… ☆30 Updated last year
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023 ☆11 Updated 8 months ago
- Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023) ☆9 Updated last year
- ☆22 Updated 10 months ago
- ☆37 Updated last year
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆29 Updated 2 years ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆38 Updated 5 months ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆15 Updated 2 months ago
- Official Implementation of "Inference and Denoise: Causal Inference-based Neural Speech Enhancement" ☆27 Updated last year
- ☆15 Updated 9 months ago
- SRTNet ☆24 Updated last year
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning ☆47 Updated last year
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning ☆11 Updated 7 months ago
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆38 Updated 3 months ago
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆30 Updated last year
- ☆36 Updated 2 years ago
- Transformer-based visually grounded speech models ☆19 Updated 2 years ago
- Official implementation for MGN ☆20 Updated 2 years ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation ☆32 Updated 4 months ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval' ☆43 Updated 2 years ago
- ☆17 Updated last year
- ☆19 Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆52 Updated 2 months ago
- Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset ☆25 Updated 4 months ago
- A spoken version of the textual story cloze benchmark ☆14 Updated last year
- Official Implementation and Dataset of paper - DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset ☆12 Updated last week
- The official implementation of OpenSR (ACL2023 Oral) ☆15 Updated last year
- Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition" ☆19 Updated last year