menloresearch / ichigo
Local realtime voice AI
☆2,279 · Updated last month
Alternatives and similar repositories for ichigo:
Users who are interested in ichigo are comparing it to the repositories listed below
- A fast multimodal LLM for real-time voice ☆3,855 · Updated 2 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,687 · Updated last month
- first base model for full-duplex conversational audio ☆1,731 · Updated 3 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,899 · Updated last week
- Interface for OuteTTS models. ☆1,178 · Updated last week
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit ☆762 · Updated 8 months ago
- Whisper with Medusa heads ☆831 · Updated this week
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. ☆1,596 · Updated 8 months ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆996 · Updated last week
- ☆674 · Updated last week
- Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯 ☆833 · Updated last month
- Local SRT/LLM/TTS Voicechat ☆664 · Updated 6 months ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆902 · Updated 5 months ago
- Real Time Speech Transcription with FastRTC ⚡️and Local Whisper 🤗 ☆630 · Updated last month
- An Open Source text-to-speech system built by inverting Whisper. ☆4,217 · Updated 2 weeks ago
- turnkey self-hosted offline transcription and diarization service with llm summary ☆836 · Updated 7 months ago
- ☆1,725 · Updated last week
- Open Source framework for voice and multimodal conversational AI ☆5,709 · Updated this week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆3,989 · Updated last week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,106 · Updated last week
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,286 · Updated 5 months ago
- Everything about the SmolLM2 and SmolVLM family of models ☆2,228 · Updated 3 weeks ago
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though ☆541 · Updated 2 weeks ago
- Implementation of F5-TTS in MLX ☆520 · Updated last month
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,656 · Updated 8 months ago
- Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer… ☆2,008 · Updated last week
- Converts text to speech in realtime ☆2,894 · Updated this week
- A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech mode… ☆964 · Updated 5 months ago
- Command Your World with Voice ☆655 · Updated 4 months ago
- Inference and training library for high-quality TTS models. ☆5,212 · Updated 4 months ago