nttcslab-sp / agevoxceleb
☆22Updated 2 years ago
Related projects:
- ☆26Updated last year
- End-to-end diarization loss☆19Updated 3 years ago
- Development Toolkit for the VoxCeleb Speaker Recognition Challenge 2020☆42Updated 4 years ago
- MultiSV: scripts for data preparation☆24Updated 3 months ago
- AudioVisual Diarization - Supervised and Unsupervised☆13Updated last year
- Code for the paper: "Leveraging speaker attribute information using multi task learning for speaker verification and diarization" present…☆24Updated last year
- Transformer-based online speech recognition system with TensorFlow 2☆25Updated 3 years ago
- A PyTorch implementation of the paper: SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification☆31Updated 3 years ago
- PyTorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.☆17Updated 2 years ago
- Multipurpose Multi Speaker Mixture Signal Generator☆43Updated 6 months ago
- Streaming Audiotransformers for online Audio tagging☆39Updated 3 months ago
- ☆28Updated 2 years ago
- ☆54Updated 3 years ago
- ☆35Updated 2 years ago
- Contains code for Deep Self Supervised Hierarchical Clustering for Speaker Diarization☆16Updated 2 years ago
- Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion☆39Updated last year
- PyTorch implementation for the Deep Griffin-Lim Iteration paper (https://arxiv.org/abs/1903.03971)☆36Updated 4 years ago
- ☆13Updated 2 years ago
- Code and data repository for paper "VoxCeleb enrichment for Age and Gender recognition" submitted at ASRU 2021☆63Updated 2 years ago
- Efficient Speech Processing Toolkit for Automatic Speaker Recognition☆17Updated last year
- ☆52Updated 3 years ago
- Discriminative Condition-Aware PLDA☆42Updated last month
- An evaluation toolkit for voice conversion models.☆39Updated 3 years ago
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.☆33Updated 7 months ago
- Balanced Error Rate for Speaker Diarization☆25Updated last year
- A PyTorch implementation of MBNET: MOS PREDICTION FOR SYNTHESIZED SPEECH WITH MEAN-BIAS NETWORK☆60Updated 2 years ago
- Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021☆39Updated 3 years ago
- Speechflow for emotion recognition related information decomposition☆9Updated 3 years ago
- ☆41Updated 10 months ago
- Source Code for the Paper "UNIFIED KEYWORD SPOTTING AND AUDIO TAGGING ON MOBILE DEVICES WITH TRANSFORMERS"☆23Updated last year