KoljaB / WhoSpeaks
Efficient approach to speaker diarization using voice characteristics extraction
☆92 · Updated 11 months ago
Alternatives and similar repositories for WhoSpeaks:
Users who are interested in WhoSpeaks compare it to the libraries listed below:
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆60 · Updated last week
- ☆123 · Updated 3 months ago
- Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆154 · Updated 8 months ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ☆94 · Updated 10 months ago
- Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription ☆71 · Updated 9 months ago
- ☆207 · Updated 5 months ago
- ☆254 · Updated last year
- Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing. ☆239 · Updated 9 months ago
- Simulates talk with an AI that can express emotions ☆58 · Updated 7 months ago
- Running the F5-TTS by ONNX Runtime ☆123 · Updated this week
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ☆71 · Updated 4 months ago
- ☆275 · Updated 9 months ago
- A lightweight end-to-end text-to-speech model ☆110 · Updated 3 weeks ago
- Open source inference code for Rev's model ☆383 · Updated 2 weeks ago
- ☆155 · Updated last year
- Create an LJSpeech-structured voice dataset from wave input ☆26 · Updated 5 months ago
- G2P ☆171 · Updated this week
- speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with… ☆198 · Updated last month
- Google's SoundStorm: Efficient Parallel Audio Generation ☆131 · Updated last year
- VoiceBox neural network implementation ☆105 · Updated 7 months ago
- ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization. ☆205 · Updated 4 months ago
- On-device speaker diarization powered by deep learning ☆39 · Updated this week
- An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines ☆373 · Updated 6 months ago
- On-device streaming text-to-speech engine powered by deep learning ☆73 · Updated this week
- A WebRTC server that allows you to interact with an LLM using your speech and responds back with generated audio. ☆126 · Updated 9 months ago
- FastAPI service on top of WhisperX ☆72 · Updated this week
- A testing repo to share code and thoughts on diarisation ☆53 · Updated 11 months ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆32 · Updated 2 weeks ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆94 · Updated 5 months ago
- Text to speech alignment using CTC forced alignment ☆233 · Updated 3 weeks ago