OpenSQZ / MiniCPM-V-CookBook
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o
☆290 Updated this week
Alternatives and similar repositories for MiniCPM-V-CookBook
Users who are interested in MiniCPM-V-CookBook are comparing it to the repositories listed below.
- ☆242 Updated 11 months ago
- ☆194 Updated 2 months ago
- GLM Series Edge Models ☆157 Updated 7 months ago
- ☆187 Updated last year
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab. ☆284 Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆46 Updated 4 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆270 Updated 2 weeks ago
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi… ☆299 Updated 7 months ago
- A third-party component library based on Gradio. Integrates Ant Design, Ant Design X, Monaco Editor and more advanced components to help… ☆135 Updated 2 months ago
- ☆684 Updated last month
- xllamacpp - a Python wrapper of llama.cpp ☆72 Updated last week
- ☆341 Updated 3 months ago
- ☆985 Updated this week
- ☆185 Updated last year
- GLM-OCR: Accurate × Fast × Comprehensive ☆505 Updated this week
- A Toolkit for Running On-device Large Language Models (LLMs) in APP ☆81 Updated last year
- ☆147 Updated 6 months ago
- ☆520 Updated last month
- Port of Facebook's LLaMA model in C/C++ ☆67 Updated 9 months ago
- A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease. ☆248 Updated 9 months ago
- [NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents ☆376 Updated 3 months ago
- ☆133 Updated 10 months ago
- [EMNLP 2025 Demo] PresentAgent: Multimodal Agent for Presentation Video Generation ☆128 Updated 2 months ago
- Repo for "MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability" ☆148 Updated 8 months ago
- Research on accelerating real-world deployment of the GOT-OCR project, for any language ☆62 Updated last year
- Youtu-Embedding is an industry-leading, general-purpose text representation model developed by Tencent Youtu Lab. ☆174 Updated 2 months ago
- project page for ChatAnyone ☆116 Updated 10 months ago
- ☆161 Updated 5 months ago
- ☆72 Updated 2 months ago
- [AAAI 2026 🔥 Poster] ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning ☆321 Updated 5 months ago