vikhyat / moondream
tiny vision language model
☆8,863 Updated last month
Alternatives and similar repositories for moondream
Users who are interested in moondream are comparing it to the libraries listed below.
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,643 Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,617 Updated last month
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,073 Updated this week
- Everything about the SmolLM and SmolVLM family of models ☆3,369 Updated last month
- MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone ☆22,170 Updated last month
- Large World Model -- Modeling Text and Video with Millions Context ☆7,363 Updated last year
- Local AI API Platform ☆2,761 Updated 4 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,424 Updated 7 months ago
- Foundational model for human-like, expressive TTS ☆4,194 Updated last year
- A fast multimodal LLM for real-time voice ☆4,243 Updated 2 months ago
- Blazingly fast LLM inference. ☆6,189 Updated this week
- Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM ☆1,447 Updated 7 months ago
- Examples in the MLX framework ☆7,960 Updated 3 weeks ago
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,323 Updated last year
- Official inference repo for FLUX.1 models ☆24,577 Updated 3 months ago
- ☆8,654 Updated last year
- a state-of-the-art-level open visual language model | 多模态预训练模型 ☆6,687 Updated last year
- Inference and training library for high-quality TTS models. ☆5,458 Updated 10 months ago
- High-speed Large Language Model Serving for Local Deployment ☆8,374 Updated 3 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,946 Updated 2 weeks ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,137 Updated 2 weeks ago
- Yes, it's another chat over documents implementation... but this one is entirely local! ☆1,808 Updated 7 months ago
- The #1 open-source voice interface for desktop, mobile, and ESP32 chips. ☆5,092 Updated last year
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,084 Updated 5 months ago
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,252 Updated 8 months ago
- A vector search SQLite extension that runs anywhere! ☆6,359 Updated 9 months ago
- Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more. ☆3,681 Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆18,813 Updated 2 weeks ago
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,602 Updated last week
- ☆3,035 Updated last year