wanglin-lw / ST-Caps
☆11 · Updated 2 years ago
Alternatives and similar repositories for ST-Caps:
Users who are interested in ST-Caps are comparing it to the libraries listed below:
- PyTorch Implementation of SimulLR ☆11 · Updated 3 years ago
- ☆18 · Updated last year
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation ☆35 · Updated 7 months ago
- ☆161 · Updated 9 months ago
- [CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models. ☆105 · Updated 10 months ago
- Code for LAVSS: Location-Guided Audio-Visual Spatial Audio Separation ☆12 · Updated 2 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units ☆33 · Updated 6 months ago
- Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence ☆18 · Updated 10 months ago
- This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio". ☆55 · Updated 4 months ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆126 · Updated 4 months ago
- This repository contains metadata of WavCaps dataset and codes for downstream tasks. ☆222 · Updated 9 months ago
- A curated list of audio-visual learning methods and datasets. ☆255 · Updated 4 months ago
- This repository includes the code to reproduce our paper "Automatic speaker verification spoofing and deepfake detection using wav2vec 2.… ☆129 · Updated last year
- A list of tools, papers and code related to Fake Audio Detection. ☆93 · Updated this week
- ☆22 · Updated last year
- This package aims at simplifying the download of the AudioCaps dataset. ☆33 · Updated last year
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆144 · Updated 4 months ago
- ☆27 · Updated last year
- ☆131 · Updated 2 years ago
- Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion (Interspeech 2022) ☆116 · Updated last year
- Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024) ☆13 · Updated last month
- cross modal background suppression for audio-visual event localization ☆35 · Updated 3 years ago
- Voice Face Association Learning Paper List ☆15 · Updated last year
- This repository includes the code to reproduce our paper "RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Spea… ☆60 · Updated last year
- ☆13 · Updated 9 months ago
- ☆48 · Updated 7 months ago
- ☆14 · Updated last year
- ☆46 · Updated 9 months ago
- Code for the InterSpeech 2023 paper: MMER: Multimodal Multi-task learning for Speech Emotion Recognition ☆73 · Updated last year
- Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition ☆150 · Updated 3 years ago