tarzain / crosstalk
a simple system for 2-way interruptible voice interactions between human and LLM
☆28 Updated last year
Alternatives and similar repositories for crosstalk
Users who are interested in crosstalk are comparing it to the libraries listed below.
- A lightweight Python library for running TTS models with a unified API. ☆18 Updated 2 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 Updated last month
- Open TTS models, built for streaming on the edge ☆41 Updated last month
- ☆62 Updated 9 months ago
- Cog wrapper for collabora/WhisperSpeech ☆25 Updated last year
- Audio tokenization, in the fastest way possible! ☆52 Updated 8 months ago
- proof of concept conversation orchestrator with a speech-language model ☆19 Updated 6 months ago
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆52 Updated 5 months ago
- Speaker Diarization with Transformers ☆64 Updated 11 months ago
- [WIP] AI Try-On plugin for Chrome ☆27 Updated last year
- Joint speech-language model - respond directly to audio! ☆30 Updated last year
- ☆37 Updated last year
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆19 Updated 7 months ago
- A streaming whisper server for on-prem transcription ☆20 Updated 9 months ago
- Generate visual podcasts about novels using open source models ☆25 Updated 2 years ago
- ☆39 Updated last year
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX ☆27 Updated 7 months ago
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆18 Updated last month
- Demo Python script app to interact with llama.cpp server using whisper API, microphone and webcam devices. ☆46 Updated last year
- A clone of OpenAI's Tokenizer page for HuggingFace Models ☆45 Updated last year
- An open source chat bot architecture for voice/vision (and multimodal) assistants, able to run locally (CPU/GPU bound) and remotely (I/O bound). ☆49 Updated this week
- llmon-py is a multimodal webui for Llama 3-8B. ☆16 Updated 10 months ago
- ☆24 Updated last year
- Misc. tools/scripts that I made to use for tortoise ☆21 Updated 8 months ago
- Arxflix turns your boring Arxiv research paper into a captivating video. ☆50 Updated 5 months ago
- Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and… ☆73 Updated last month
- Experimental sampler to make LLMs more creative ☆31 Updated last year
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… ☆58 Updated last year
- entropix style sampling + GUI ☆26 Updated 6 months ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆20 Updated 10 months ago