ErikEkstedt / VoiceActivityProjection

Voice Activity Projection Models: Self-supervised learning of Turn-taking Events

☆69

Alternatives and similar repositories for VoiceActivityProjection

Users who are interested in VoiceActivityProjection compare it to the repositories listed below.

BenoitWang / Speech_Emotion_Diarization
☆66 Updated 10 months ago
joonaskalda / PixIT
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings…
☆95 Updated 6 months ago
nii-yamagishilab / ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
☆166 Updated last year
Wataru-Nakata / miipher
Unofficial implementation of miipher
☆129 Updated last year
BriansIDP / WhisperBiasing
☆81 Updated last year
k2-fsa / libriheavy
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
☆197 Updated 10 months ago
skit-ai / SpeechLLM
This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingfac…
☆115 Updated last year
backspacetg / simul_whisper
Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
☆68 Updated 3 months ago
Takaaki-Saeki / DiscreteSpeechMetrics
Reference-aware automatic speech evaluation toolkit
☆157 Updated 7 months ago
marianne-m / brouhaha-vad
Predicts the level of noise and reverberation in your audio files
☆153 Updated last month
unilight / seq2seq-vc
A sequence-to-sequence voice conversion toolkit.
☆101 Updated last year
jasonppy / PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
☆147 Updated last year
RevoSpeechTech / speech-datasets-collection
A curated list of speech datasets (110+ datasets, 75+ easy to download)
☆140 Updated 2 years ago
yl4579 / PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
☆260 Updated 6 months ago
Choddeok / EmoSphere-TTS
[INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …
☆158 Updated last month
desh2608 / diarizer
Clustering-based methods for overlapping diarization
☆81 Updated last year
BUTSpeechFIT / TS-ASR-Whisper
☆75 Updated last month
FrenchKrab / IS2023-powerset-diarization
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.
☆88 Updated last year
SpeechColab / GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
☆160 Updated 2 weeks ago
yanghaha0908 / FastHuBERT
Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
☆94 Updated 7 months ago
lingjzhu / clap-ipa
Keyword spotting and forced alignment in any language
☆61 Updated last week
neonbjb / tts-scores
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
☆159 Updated last year
ankitapasad / layerwise-analysis
Layer-wise analysis of self-supervised pre-trained speech representations
☆108 Updated 8 months ago
KunZhou9646 / Mixed_Emotions
☆120 Updated 2 years ago
deeplyinc / Nonverbal-Vocalization-Dataset
☆35 Updated 3 years ago
wavlab-speech / versa
Versatile Evaluation of Speech and Audio
☆300 Updated last week
ga642381 / Speech-Prompts-Adapters
This repository surveys papers focusing on prompting and adapters for speech processing.
☆110 Updated last year
imdatceleste / m-ailabs-dataset
This is the M-AILABS Speech Dataset
☆71 Updated 7 months ago
mkunes / w2v2_audioFrameClassification
wav2vec2 audio classification for prosodic boundary detection and other tasks
☆43 Updated last year
revdotcom / speech-datasets
Various speech datasets made available to the public
☆123 Updated 7 months ago