kyutai-labs / moshi
☆6,670 Updated last week
Related projects
Alternatives and complementary repositories for moshi
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,532 Updated last month
- Speech To Speech: an effort for an open-sourced and modular GPT-4o ☆3,488 Updated last week
- Inference and training library for high-quality TTS models. ☆4,592 Updated last week
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,084 Updated this week
- Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio… ☆7,394 Updated last week
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,065 Updated this week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆6,785 Updated this week
- Open Source framework for voice and multimodal conversational AI ☆3,339 Updated this week
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆4,747 Updated 3 months ago
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆5,094 Updated this week
- Foundational model for human-like, expressive TTS ☆3,873 Updated 3 months ago
- tiny vision language model ☆5,600 Updated this week
- Llama3.1 learns to Listen ☆1,702 Updated this week
- MARS5 speech model (TTS) from CAMB.AI ☆2,527 Updated 3 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆7,626 Updated 4 months ago
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,356 Updated this week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆3,854 Updated last month
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆5,879 Updated this week
- Composable building blocks to build Llama Apps ☆4,477 Updated this week
- Get your documents ready for gen AI ☆7,243 Updated this week
- A simple screen parsing tool towards pure vision based GUI agent ☆4,410 Updated this week
- Build real-time multimodal AI applications 🤖🎙️📹 ☆3,913 Updated this week
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,604 Updated last week
- An Open Source text-to-speech system built by inverting Whisper. ☆3,956 Updated 4 months ago
- A collection of notebooks/recipes showcasing some fun and effective ways of using Claude. ☆6,752 Updated last week
- first base model for full-duplex conversational audio ☆1,248 Updated this week
- Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚ ☆11,051 Updated this week
- GLM-4-Voice | End-to-end Chinese-English speech dialogue model ☆2,150 Updated this week
- Multilingual Voice Understanding Model ☆3,349 Updated 3 weeks ago
- A language model programming library. ☆5,208 Updated this week