douglas125 / SpeechIdentity
Identity verification from speech
☆19 Updated 2 years ago
Alternatives and similar repositories for SpeechIdentity
Users who are interested in SpeechIdentity are comparing it to the repositories listed below.
- Efficient approach to speaker diarization using voice characteristics extraction ☆97 Updated last week
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆102 Updated 4 months ago
- This is the audio sample repository for speech separation model "MossFormer2". ☆133 Updated 7 months ago
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code ☆149 Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 Updated 3 weeks ago
- ☆296 Updated last year
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆239 Updated 3 months ago
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ☆94 Updated 5 months ago
- Building a Deep learning model that predicts the gender of a speaker using TensorFlow 2 ☆126 Updated 2 years ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆171 Updated last month
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆137 Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆84 Updated last year
- Create an LJSpeech structured voice dataset on wave input ☆30 Updated 9 months ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆15 Updated last year
- On-device voice activity detection (VAD) powered by deep learning ☆218 Updated last week
- Voice Activity Projection Models: Self-supervised learning of Turn-taking Events ☆69 Updated last year
- Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection ☆63 Updated 2 months ago
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface. ☆312 Updated 2 years ago
- Speaker diarization model ☆27 Updated 2 years ago
- Speaker Diarization with Transformers ☆68 Updated 2 weeks ago
- Update ASR paper everyday ☆249 Updated this week
- 🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing. ☆245 Updated last year
- Auto-AVSR: Lip-Reading Sentences Project ☆349 Updated 5 months ago
- Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions ☆258 Updated 5 months ago
- Transcription and diarization (speaker identification) ☆33 Updated 2 years ago
- ONNX Inference of Pyannote Segmentation ☆91 Updated 6 months ago
- ☆73 Updated last week
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO ☆64 Updated 2 years ago
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations ☆163 Updated last year
- 🐤 Nix-TTS: Lightweight and End-to-end Text-to-Speech via Module-wise Distillation ☆254 Updated last year