vikhyat / moondream
tiny vision language model
☆8,073 Updated this week
Alternatives and similar repositories for moondream
Users who are interested in moondream are comparing it to the libraries listed below.
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,460 Updated this week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆12,293 Updated this week
- Inference and training library for high-quality TTS models. ☆5,303 Updated 6 months ago
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,574 Updated this week
- Blazingly fast LLM inference. ☆5,742 Updated this week
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆6,180 Updated 5 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆4,067 Updated 2 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,936 Updated last month
- DSPy: The framework for programming—not prompting—language models ☆25,466 Updated this week
- Letta (formerly MemGPT) is the stateful agents framework with memory, reasoning, and context management. ☆16,917 Updated this week
- 🤗 AutoTrain Advanced ☆4,413 Updated 5 months ago
- MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone ☆19,629 Updated last week
- A fast multimodal LLM for real-time voice ☆4,016 Updated 4 months ago
- llama3 implementation one matrix multiplication at a time ☆15,001 Updated last year
- ☆8,472 Updated last year
- Open Source framework for voice and multimodal conversational AI ☆6,517 Updated this week
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ☆24,159 Updated this week
- A powerful framework for building realtime voice AI agents 🤖🎙️📹 ☆6,325 Updated this week
- Go ahead and axolotl questions ☆9,610 Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆15,276 Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆8,862 Updated last month
- Memory for AI Agents; Announcing OpenMemory MCP - local and secure memory management. ☆34,513 Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆22,795 Updated 10 months ago
- Foundational model for human-like, expressive TTS ☆4,132 Updated 10 months ago
- The python library for real-time communication ☆4,037 Updated last week
- Tensor library for machine learning ☆12,697 Updated last week
- Full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning. ☆28,467 Updated this week
- Large World Model -- Modeling Text and Video with Millions Context ☆7,293 Updated 8 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,292 Updated 3 months ago
- a state-of-the-art-level open visual language model | multimodal pretrained model ☆6,589 Updated last year