vikhyat / moondream
tiny vision language model
☆8,179 · Updated 3 weeks ago
Alternatives and similar repositories for moondream
Users that are interested in moondream are comparing it to the libraries listed below
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,767 · Updated this week
- Large World Model -- Modeling Text and Video with Millions Context ☆7,300 · Updated 8 months ago
- Large Action Model framework to develop AI Web Agents ☆6,089 · Updated 5 months ago
- A fast multimodal LLM for real-time voice ☆4,087 · Updated this week
- MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone ☆19,797 · Updated last week
- ☆2,981 · Updated 9 months ago
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,597 · Updated this week
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,631 · Updated last year
- Blazingly fast LLM inference. ☆5,849 · Updated this week
- Foundational model for human-like, expressive TTS ☆4,135 · Updated 11 months ago
- LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a ch… ☆5,934 · Updated 2 months ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆12,445 · Updated this week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,592 · Updated last week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,312 · Updated 3 months ago
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM. ☆41,657 · Updated this week
- lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,491 · Updated last week
- PyTorch native post-training library ☆5,306 · Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆22,992 · Updated 11 months ago
- Distribute and run LLMs with a single file. ☆22,726 · Updated last week
- Inference and training library for high-quality TTS models. ☆5,336 · Updated 7 months ago
- a state-of-the-art-level open visual language model | multimodal pretrained model ☆6,604 · Updated last year
- Python bindings for llama.cpp ☆9,313 · Updated this week
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,587 · Updated last week
- Go ahead and axolotl questions ☆9,852 · Updated this week
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,065 · Updated this week
- Examples in the MLX framework ☆7,632 · Updated last month
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,828 · Updated 11 months ago
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,195 · Updated 2 weeks ago
- Everything about the SmolLM2 and SmolVLM family of models ☆2,623 · Updated 2 weeks ago
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. ☆6,224 · Updated 6 months ago