An on-CPU, real-time conversational system for two-way speech communication with AI models. It uses a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.
☆248 · Nov 24, 2025 · Updated 5 months ago
Alternatives and similar repositories for On-Device-Speech-to-Speech-Conversational-AI
Users that are interested in On-Device-Speech-to-Speech-Conversational-AI are comparing it to the libraries listed below.
- Train and fine-tune text-to-speech models for Bengali and many other languages! ☆18 · Apr 2, 2025 · Updated last year
- List of curated use cases built using Sesame's CSM 1B ☆72 · May 29, 2025 · Updated 11 months ago
- Fine-tuned Llama 3 models for context-based question answering in the Bengali language. ☆18 · Oct 14, 2024 · Updated last year
- High-performance, semantic turn detection for conversational AI ☆37 · Oct 1, 2025 · Updated 7 months ago
- ☆59 · Feb 8, 2026 · Updated 3 months ago
- Orpheus-TTS local speech synthesizer written entirely in C# ☆30 · Nov 25, 2025 · Updated 5 months ago
- ☆35 · Oct 23, 2025 · Updated 6 months ago
- Training code for kokoro tts model ☆39 · Nov 15, 2025 · Updated 5 months ago
- Bangla PDF to text converter that works on Windows, macOS, and Linux without any extra downloads or configurations. ☆21 · Oct 12, 2024 · Updated last year
- A Python client for Deepgram's Voice Agent API ☆10 · Oct 14, 2025 · Updated 6 months ago
- ☆19 · Nov 30, 2024 · Updated last year
- StyleTTS2 + Vocos as a Decoder ☆13 · Mar 24, 2025 · Updated last year
- Text-audio foundation model from Boson AI ☆119 · Sep 4, 2025 · Updated 8 months ago
- Docker image for Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation ☆11 · Apr 14, 2024 · Updated 2 years ago
- A Fourier-based audio synthesiser written in MATLAB as a university project. ☆12 · Jan 19, 2019 · Updated 7 years ago
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024) ☆95 · Dec 3, 2024 · Updated last year
- A Conversational Speech Generation Model ☆14,616 · May 27, 2025 · Updated 11 months ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆57 · May 17, 2025 · Updated 11 months ago
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ… ☆31 · Sep 20, 2025 · Updated 7 months ago
- ☆54 · Dec 5, 2025 · Updated 5 months ago
- Towards Human-Sounding Speech ☆6,127 · Dec 5, 2025 · Updated 5 months ago
- Official implementation of the TTS model Lina-Speech ☆178 · Jan 9, 2025 · Updated last year
- Realtime demo, Streaming and Finetuning code for CSM ☆455 · Sep 17, 2025 · Updated 7 months ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆14 · Nov 15, 2025 · Updated 5 months ago
- This is a side project where my friend and I try to generate synthetic data in Bangla from deepseek-r1, so that it can be used for model di… ☆11 · Jun 28, 2025 · Updated 10 months ago
- Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching ☆4,811 · Jan 4, 2026 · Updated 4 months ago
- ☆20 · Feb 14, 2026 · Updated 2 months ago
- This comprehensive guide provides a universal process for preparing your own speech datasets and training a custom Text-to-Speech (TTS) m… ☆27 · May 3, 2025 · Updated last year
- Quickly create a Simli AI Agent ☆21 · Feb 18, 2026 · Updated 2 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆70 · Nov 1, 2024 · Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆10,111 · Apr 28, 2026 · Updated last week
- Fast audio super resolution from 16 kHz to 48 kHz. ☆207 · Jan 3, 2026 · Updated 4 months ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ☆137 · Aug 10, 2025 · Updated 9 months ago
- Updates ASR papers every day ☆510 · May 3, 2026 · Updated last week
- An open source real-time AI inference engine for seamless scaling ☆23 · Jul 2, 2025 · Updated 10 months ago
- Fine-tune a Bangla ASR model that was trained on the Bangla Mozilla Common Voice dataset ☆12 · Apr 16, 2024 · Updated 2 years ago
- High quality text-to-speech based on StyleTTS 2. ☆78 · Apr 6, 2026 · Updated last month
- A local, voice-controlled AI assistant with the personality of HAL 9000 from 2001: A Space Odyssey. ☆26 · Aug 16, 2025 · Updated 8 months ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆314 · May 31, 2025 · Updated 11 months ago