OpenSQZ/MiniCPM-V-CookBook


Cook up amazing AI applications effortlessly with MiniCPM / MiniCPM-V / MiniCPM-o

☆606

Alternatives and similar repositories for MiniCPM-V-CookBook

Users who are interested in MiniCPM-V-CookBook are comparing it to the repositories listed below.

tc-mb / llama.cpp-omni
Omni inference in C/C++
☆208 · Updated this week
OpenBMB / MiniCPM-o-Demo
Official PyTorch+CUDA Full-functional Web Demo for MiniCPM-o 4.5
☆280 · Updated this week
THUMAI-Lab / LLaVA-UHD-v4
☆46 · Jun 7, 2026 · Updated last month
OpenBMB / MiniCPM-V-Apps
MiniCPM-V apps: fully offline multimodal chat on iOS / Android / HarmonyOS
☆348 · Jul 10, 2026 · Updated last week
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,965 · Jun 25, 2026 · Updated 3 weeks ago
OpenBMB / MiniCPM
MiniCPM5-1B: A SOTA 1B on-device LLM, small yet powerful.
☆9,986 · Jun 20, 2026 · Updated last month
zai-org / GLM-V
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆2,356 · Updated this week
OpenBMB / AgentCPM
An End-to-End Infrastructure for Training and Evaluating Various LLM Agents
☆812 · Feb 9, 2026 · Updated 5 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆205 · Aug 12, 2025 · Updated 11 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-2
Fully Open Framework for Democratized Multimodal Training
☆1,146 · Updated this week
QwenLM / Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,900 · Apr 23, 2026 · Updated 2 months ago
tc-mb / llama.cpp
Port of Facebook's LLaMA model in C/C++
☆115 · Jun 23, 2026 · Updated 3 weeks ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,637 · Jan 30, 2026 · Updated 5 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,582 · Jun 14, 2025 · Updated last year
OpenBMB / AgentCPM-GUI
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient…
☆1,392 · Jan 11, 2026 · Updated 6 months ago
Kwai-Keye / Keye
☆806 · Jun 10, 2026 · Updated last month
VITA-MLLM / VITA
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,520 · Mar 28, 2025 · Updated last year
vllm-project / vllm-omni
A framework for efficient model inference with omni-modality models
☆5,643 · Updated this week
yaolinli / TimeChat-Captioner
[ICML 2026] Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
☆48 · Jun 29, 2026 · Updated 3 weeks ago
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,887 · Updated this week
mit-han-lab / streaming-vlm
StreamingVLM: Real-Time Understanding for Infinite Video Streams
☆1,046 · Oct 15, 2025 · Updated 9 months ago
haowei-freesky / HERMES
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆92 · May 8, 2026 · Updated 2 months ago
HumanAIGC-Engineering / OpenAvatarChat
☆3,636 · Jun 9, 2026 · Updated last month
allenai / molmo2
Code for the Molmo2 Vision-Language Model
☆693 · Mar 18, 2026 · Updated 4 months ago
Tongyi-MAI / MAI-UI
MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B
☆1,823 · Apr 20, 2026 · Updated 3 months ago
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,666 · Updated this week
OpenSQZ / MegatronApp
Toolchain built around the Megatron-LM for Distributed Training
☆97 · May 20, 2026 · Updated 2 months ago
YIGE24 / StreamingTOM
☆26 · Mar 5, 2026 · Updated 4 months ago
XiaomiMiMo / MiMo-VL
MiMo-VL
☆642 · Aug 21, 2025 · Updated 11 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423 · Jul 6, 2026 · Updated 2 weeks ago
QwenLM / Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…
☆4,042 · Jun 12, 2025 · Updated last year
Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
☆19,691 · Feb 27, 2026 · Updated 4 months ago
FunAudioLLM / CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆22,323 · May 25, 2026 · Updated last month
FunAudioLLM / Fun-ASR
Open-source LLM-based ASR model family for Chinese, dialect, accent, and multilingual speech, with FunASR, vLLM, streaming, and llama.cpp…
☆1,414 · Updated this week
xiaomi-research / timeviper
[CVPR'26] TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
☆25 · Jan 4, 2026 · Updated 6 months ago
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆456 · May 14, 2025 · Updated last year
MBZUAI-IFM / K2-Think-SFT
☆131 · Sep 9, 2025 · Updated 10 months ago
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,804 · Updated this week
BytedanceDouyinContent / SAIL-VL2
The SAIL-VL2 series model developed by the BytedanceDouyinContent Group
☆79 · Sep 18, 2025 · Updated 10 months ago