vikhyat / moondream
tiny vision language model
☆8,340 Updated this week
Alternatives and similar repositories for moondream
Users who are interested in moondream are comparing it to the libraries listed below.
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,784 Updated last week
- Large Action Model framework to develop AI Web Agents ☆6,147 Updated 7 months ago
- Blazingly fast LLM inference. ☆6,027 Updated last week
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec… ☆17,087 Updated this week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,363 Updated 5 months ago
- Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevan… ☆19,463 Updated 11 months ago
- A fast multimodal LLM for real-time voice ☆4,149 Updated this week
- Inference and training library for high-quality TTS models. ☆5,385 Updated 8 months ago
- Foundational model for human-like, expressive TTS ☆4,150 Updated last year
- Automate browser-based workflows with LLMs and Computer Vision ☆14,089 Updated this week
- The #1 open-source voice interface for desktop, mobile, and ESP32 chips. ☆5,087 Updated 9 months ago
- A framework for Claude Opus to intelligently orchestrate subagents. ☆4,277 Updated last year
- Open Source framework for voice and multimodal conversational AI ☆7,614 Updated this week
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,626 Updated last week
- Letta (formerly MemGPT) is the stateful agents framework with memory, reasoning, and context management. ☆17,890 Updated last week
- Large World Model -- Modeling Text and Video with Millions Context ☆7,327 Updated 10 months ago
- LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a ch… ☆5,947 Updated 3 months ago
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,303 Updated last year
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆4,221 Updated last year
- SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability v… ☆8,523 Updated this week
- A framework to enable multimodal models to operate a computer. ☆9,846 Updated 3 months ago
- PyTorch native post-training library ☆5,418 Updated last week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,605 Updated last week
- A vector search SQLite extension that runs anywhere! ☆6,005 Updated 6 months ago
- Tools for merging pretrained large language models. ☆6,195 Updated last week
- Llama-3 agents that can browse the web by following instructions and talking to you ☆1,412 Updated 8 months ago
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. ☆27,156 Updated last month
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,082 Updated 2 weeks ago
- The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML. ☆3,501 Updated last week
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,833 Updated 3 months ago