OpenSQZ / MiniCPM-V-CookBook
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o
☆81 Updated this week
Alternatives and similar repositories for MiniCPM-V-CookBook
Users who are interested in MiniCPM-V-CookBook are comparing it to the libraries listed below.
- GLM Series Edge Models ☆147 Updated 2 months ago
- ☆237 Updated 6 months ago
- A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease. ☆231 Updated 4 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆249 Updated 2 weeks ago
- ☆173 Updated 6 months ago
- Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud ☆242 Updated this week
- A study of GOT-OCR: accelerating project deployment, in any language ☆61 Updated 10 months ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab. ☆222 Updated last week
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding ☆201 Updated 3 weeks ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆214 Updated 2 months ago
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi… ☆277 Updated last month
- A third-party component library based on Gradio. Integrates Ant Design, Ant Design X, and more advanced components to help you build appl… ☆115 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆42 Updated last month
- [ACL2025 demo track] ROGRAG: A Robustly Optimized GraphRAG Framework ☆172 Updated last week
- Stitches the SmolVLM2 vision head onto the Qwen3-0.6B model and fine-tunes the combined model ☆277 Updated 3 weeks ago
- Train a Language Model with GRPO to create a schedule from a list of events and priorities ☆227 Updated 4 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆146 Updated 11 months ago
- ☆326 Updated last month
- GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents ☆323 Updated 3 weeks ago
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆154 Updated last year
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆531 Updated 2 months ago
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation ☆324 Updated this week
- A Toolkit for Running On-device Large Language Models (LLMs) in APP ☆77 Updated last year
- ☆168 Updated 6 months ago
- ☆292 Updated 3 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆247 Updated 2 months ago
- Florence-2 ☆69 Updated 6 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆125 Updated 9 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆267 Updated 7 months ago
- A unified tool to generate fine-tuning datasets for LLMs, including questions, answers, and dialogues. ✨🤖📚💬