burchim / AVEC
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
☆88 · Updated last year
Related projects:
- Official implementation of SpeechFormer, written in Python (PyTorch). ☆72 · Updated last year
- Auto-AVSR: Lip-Reading Sentences Project ☆164 · Updated 5 months ago
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆123 · Updated 3 months ago
- Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024) ☆51 · Updated 2 months ago
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆99 · Updated 5 months ago
- DWFormer: Dynamic Window Transformer for Speech Emotion Recognition (ICASSP 2023 Oral) ☆45 · Updated 2 months ago
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild. ☆33 · Updated 7 months ago
- Official PyTorch implementation of the paper Leveraging Unimodal Self Supervised Learning for Multimodal Audio-Visual Speech Recognition (ACL… ☆60 · Updated 2 years ago
- Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition ☆136 · Updated 2 years ago
- [ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations ☆33 · Updated 9 months ago
- Speaker identification/verification models for the Machine Learning for Computer Vision class at UNIBO ☆56 · Updated last year
- SpeechFormer++ in PyTorch ☆38 · Updated last year
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆64 · Updated 3 weeks ago
- The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection) ☆95 · Updated 5 months ago
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an… ☆28 · Updated last year
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023) ☆21 · Updated 4 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units ☆21 · Updated last month
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆39 · Updated 2 weeks ago
- Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" ☆85 · Updated 2 months ago
- [ICASSP 2023] Official TensorFlow implementation of "Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech E… ☆157 · Updated 4 months ago
- Research code for the paper "Fine-tuning wav2vec2 for speaker recognition" found at https://arxiv.org/abs/2109.15053 ☆140 · Updated 2 years ago
- Official implementation of "Dual-stream Time-Delay Neural Network with Dynamic Global Filter for Speaker Verification" in PyTorch ☆38 · Updated last year
- PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI 2022) ☆24 · Updated 6 months ago
- ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning' ☆85 · Updated last year