ErikEkstedt / VoiceActivityProjection
Voice Activity Projection Models: Self-supervised learning of Turn-taking Events
☆80Updated last year
Alternatives and similar repositories for VoiceActivityProjection
Users who are interested in VoiceActivityProjection are comparing it to the repositories listed below.
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings…☆97Updated 9 months ago
 - Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context☆206Updated last year
 - ☆87Updated 3 months ago
 - A sequence-to-sequence voice conversion toolkit.☆103Updated last year
 - Predicts the level of noise and reverberation in your audio files☆167Updated 4 months ago
 - Unofficial implementation of Miipher☆133Updated last year
 - TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog☆59Updated last year
 - Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.☆91Updated 2 years ago
 - ☆89Updated last week
 - Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions☆262Updated 9 months ago
 - UTokyo-SaruLab MOS Prediction System☆255Updated 3 weeks ago
 - This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingfac…☆125Updated last year
 - Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models☆163Updated last year
 - Easy-to-Use Speech MOS predictors☆323Updated 2 years ago
 - ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations☆177Updated last year
 - Reference-aware automatic speech evaluation toolkit☆167Updated 10 months ago
 - Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection☆94Updated 7 months ago
 - UT-Sarulab MOS prediction system using SSL models☆276Updated last year
 - Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU2023)☆27Updated 2 years ago
 - Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆151Updated last year
 - Multilingual G2P in 100 languages☆361Updated 2 years ago
 - Versatile Evaluation of Speech and Audio☆353Updated last week
 - ☆69Updated last year
 - The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based …☆150Updated last month
 - EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction☆264Updated last year
 - CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus☆217Updated 3 years ago
 - VoiceBench: Benchmarking LLM-Based Voice Assistants☆298Updated 2 months ago
 - This is the M-AILABS Speech Dataset☆88Updated 11 months ago
 - An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP…☆102Updated last year
 - PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders.☆60Updated last month