SesameAILabs / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
☆52Updated 6 months ago
Alternatives and similar repositories for whisperX
Users that are interested in whisperX are comparing it to the libraries listed below
- Faster Whisper with additional features☆44Updated 2 months ago
- List of curated use cases built using Sesame's CSM 1B☆67Updated 2 months ago
- A cutting-edge Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.☆87Updated 2 weeks ago
- Sesame Converse - Real Time Conversations - Powered by Gemma 3☆61Updated 2 months ago
- Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and…☆75Updated last month
- ☆60Updated last month
- Record and stream WAV audio data in the browser across all platforms☆31Updated 3 months ago
- This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming archi…☆118Updated last month
- Sesame CSM 1B Voice Cloning☆297Updated 2 months ago
- Orpheus Chat WebUI☆55Updated last month
- A lightweight recreation of OS1/Samantha from the movie Her, running locally in the browser☆96Updated 3 weeks ago
- A simple voice assistant example built with Next.js and LiveKit React Components☆166Updated this week
- Chat Application Starter Kit — Gemini Multimodal Live API + Pipecat☆190Updated 2 months ago
- deep hermes, but decides how to respond based on its OWN decision, no need for system prompts.☆35Updated last month
- Open source tool for transcription and subtitling, alternative to happyscribe.☆26Updated 3 months ago
- Since the owner of the repo took it down and it used an MIT license, I guess it's okay to upload it here for people to use.☆39Updated 2 months ago
- SGLang is a fast serving framework for large language models and vision language models.☆20Updated last week
- OpenAI compatible TTS for Sesame CSM:1b & dia:1.6b - Voice Cloning from File/YT☆328Updated 3 weeks ago
- A Conversational Speech Generation Model with Gradio UI and OpenAI compatible API. UI and API support CUDA, MLX and CPU devices.☆184Updated last week
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate☆202Updated last month
- Run Orpheus 3B Locally With LM Studio☆401Updated last month
- High-performance Text-to-Speech server with OpenAI-compatible API, 8 voices, emotion tags, and modern web UI. Optimized for RTX GPUs.☆353Updated last month
- Win & Linux Gradio WebUI for CSM-1B model by sesame☆43Updated 2 months ago
- Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G…☆23Updated last month
- The agentic video editing framework☆117Updated 3 months ago
- An implementation of Nvidia's Parakeet models for Apple Silicon using MLX.☆196Updated this week
- Lightweight Gradio based WebUI for orpheusTTS - WSL / Linux [CUDA]☆93Updated last month
- A local implementation of the Kokoro Text-to-Speech model, featuring dynamic module loading, automatic dependency management, and a web i…☆176Updated this week
- Insanely Fast Transcription: A Python-based utility for rapid audio transcription from YouTube videos or local files. Leverages GPU accel…☆82Updated 9 months ago
- 100% Local Document deep search with LLMs☆26Updated 8 months ago