kamya-ai / Realtime-speech-detection
Welcome to the Real-Time Voice Activity Detection (VAD) program, powered by the Silero-VAD model! This program performs live voice activity detection: it detects when speech is present in an audio stream and when the stream goes silent.
⭐12 · Updated 2 years ago
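The description above comes down to a frame-by-frame decision: does this slice of the audio stream contain speech or silence? The sketch below shows that idea with a plain energy (RMS) threshold plus a short silence "hangover", so brief pauses do not split one utterance into two. It is *not* Silero-VAD, which is a neural model, and it is not this repository's code. The function names, the `0.02` threshold, the 3-frame hangover, and the 160-sample frames (10 ms at 16 kHz) are hypothetical choices for illustration only.

```python
import math
from typing import Iterable, List, Optional, Tuple


def frame_rms(samples: List[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech(
    frames: Iterable[List[float]],
    threshold: float = 0.02,       # hypothetical RMS level separating speech from silence
    min_silence_frames: int = 3,   # hypothetical hangover: quiet frames needed to end a segment
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) speech segments; end_frame is exclusive.

    Works on any iterable (e.g. a generator reading a live stream), because
    each frame is inspected once, in order.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index of the first loud frame of the open segment
    quiet = 0                    # consecutive quiet frames seen inside the open segment
    index = -1
    for index, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            if start is None:
                start = index    # speech onset
            quiet = 0            # a loud frame cancels any pending silence
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                # Close the segment at the first frame of the quiet run.
                segments.append((start, index - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Stream ended while speech was still active: close after the last loud frame.
        segments.append((start, index - quiet + 1))
    return segments


if __name__ == "__main__":
    silence = [0.0] * 160   # 160 samples = 10 ms at 16 kHz
    loud = [0.5] * 160      # constant signal with RMS 0.5, standing in for speech
    frames = [silence] * 2 + [loud] * 4 + [silence] + [loud] * 2 + [silence] * 5 + [loud] * 3
    print(detect_speech(frames))  # → [(2, 9), (14, 17)]
```

The single quiet frame at index 6 is shorter than the hangover, so frames 2 to 8 stay one segment. A learned model such as Silero-VAD replaces the RMS test with a per-frame speech probability, but the segmenting loop around it can keep the same shape.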
Alternatives and similar repositories for Realtime-speech-detection
Users interested in Realtime-speech-detection are comparing it to the libraries listed below.
- Open TTS models, built for streaming on the edge ⭐44 · Updated 9 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ⭐69 · Updated 2 months ago
- A curated list of awesome voice activity detection ⭐70 · Updated last year
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ⭐100 · Updated last year
- A Full-Duplex Open-Domain Dialogue Agent with Continuous Turn-Taking Behavior ⭐36 · Updated 2 years ago
- A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec. ⭐135 · Updated 2 months ago
- ONNX Inference of Pyannote Segmentation ⭐97 · Updated last year
- Real-time Voice Activity Detection (VAD) with some example use cases, such as a simple voice bot and live (real-time) transcription ⭐103 · Updated 4 months ago
- Collection of Open Source Speech Data ⭐163 · Updated 2 months ago
- Efficient approach to speaker diarization using voice characteristics extraction ⭐105 · Updated 6 months ago
- A repository that collects common audio noise reduction models, using Gradio to demonstrate each model, which is very … ⭐48 · Updated last year
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ⭐14 · Updated last month
- ⭐175 · Updated 2 years ago
- Audio tokenization, in the fastest way possible! ⭐53 · Updated last year
- A minimalistic, Streamlit-based automatic speech recognition web app powered by OpenAI's Whisper "State of the Art" models ⭐67 · Updated 3 years ago
- Speaker diarization service ⭐25 · Updated 6 months ago
- A lightweight end-of-utterance detection model fine-tuned on SmolLM2-135M, optimized for Raspberry Pi and low-power devices. ⭐39 · Updated last month
- A lightweight end-to-end text-to-speech model ⭐125 · Updated 10 months ago
- ⭐58 · Updated last year
- On-device voice activity detection (VAD) powered by deep learning ⭐238 · Updated last week
- Voxtral: Convert Mistral into an end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP… ⭐39 · Updated 9 months ago
- Identifying individual speakers in an audio stream based on the unique characteristics of their voices, using Python ⭐18 · Updated 2 years ago
- An open source chat bot architecture for voice/vision (and multimodal) assistants that can run locally (CPU/GPU-bound) or remotely (I/O-bound). ⭐87 · Updated this week
- Tunable pipelines ⭐40 · Updated 3 months ago
- Joint speech-language model - respond directly to audio! ⭐30 · Updated last year
- The YouTube Text-To-Speech dataset consists of waveform audio extracted from YouTube videos alongside their English transcriptions ⭐52 · Updated 4 years ago
- Speaker Diarization with Transformers ⭐69 · Updated 6 months ago
- ⭐261 · Updated last year
- VoiceBox neural network implementation ⭐110 · Updated last year
- This project performs speaker diarization for the Hindi language. ⭐58 · Updated 4 years ago