wanglin-lw / ST-Caps
☆11Updated last year
Related projects:
- PyTorch Implementation of SimulLR☆11Updated 2 years ago
- ☆14Updated 6 months ago
- av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models☆14Updated 4 months ago
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation☆20Updated 2 weeks ago
- ☆130Updated 2 months ago
- ☆22Updated 5 months ago
- [CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models.☆99Updated 3 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units☆21Updated last month
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer☆99Updated 5 months ago
- Cross-modal background suppression for audio-visual event localization☆34Updated 2 years ago
- Toolkits for Multimodal Emotion Recognition☆150Updated 3 months ago
- SoundNet and sound source localization☆12Updated 3 years ago
- A curated list of audio-visual learning methods and datasets.☆220Updated last week
- Official implementation of SpeechFormer, written in Python (PyTorch).☆72Updated last year
- Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"☆15Updated 11 months ago
- Code for the Interspeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition☆62Updated 6 months ago
- Auto-AVSR: Lip-Reading Sentences Project☆164Updated 5 months ago
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…☆28Updated last year
- A Vector Quantized Masked AutoEncoder for speech emotion recognition☆15Updated 6 months ago
- This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".☆29Updated 4 months ago
- ☆13Updated 2 months ago
- Deformable Speech Transformer (DST)☆26Updated last month
- Code for Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information☆121Updated 9 months ago
- Frame-Level Emotional State Alignment Method for Speech Emotion Recognition☆13Updated 10 months ago
- Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation" in NeurIPS…☆13Updated 2 years ago
- Voice Face Association Learning Paper List☆12Updated last year
- Code and generated sounds for "Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning", MLSP 2021☆68Updated 3 years ago
- This repository contains metadata of the WavCaps dataset and code for downstream tasks.☆194Updated last month
- ☆21Updated 8 months ago
- Research progress on speech deepfake detection: Relevant datasets aggregated from the review literature and publicly available codes☆84Updated last year