OpenBMB / MiniCPM-o

MiniCPM-V 4.0: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

☆19,978

Alternatives and similar repositories for MiniCPM-o

Users who are interested in MiniCPM-o compare it to the repositories listed below.

OpenBMB / MiniCPM
MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5x+ speedup on typical end-side chips
☆8,160 · Updated last month
QwenLM / Qwen3
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
☆23,743 · Updated last week
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
☆8,734 · Updated 3 weeks ago
QwenLM / Qwen2.5-VL
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆11,971 · Updated 2 months ago
QwenLM / Qwen-Agent
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
☆10,686 · Updated 2 weeks ago
dvlab-research / MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
☆3,301 · Updated last year
gpt-omni / mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming…
☆3,378 · Updated 9 months ago
HKUDS / LightRAG
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
☆19,207 · Updated this week
vikhyat / moondream
tiny vision language model
☆8,275 · Updated this week
microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
☆23,215 · Updated 4 months ago
X-PLUG / MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
☆4,546 · Updated last month
QwenLM / Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…
☆3,452 · Updated last month
QwenLM / Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
☆18,979 · Updated this week
mrfinndev / ResumeAI
☆20 · Updated 11 months ago
Ucas-HaoranWei / GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆7,761 · Updated 5 months ago
Tencent-Hunyuan / HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
☆4,213 · Updated 6 months ago
zai-org / CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
☆2,407 · Updated 5 months ago
microsoft / UFO
The Desktop AgentOS.
☆7,538 · Updated this week
duixcom / Duix.Heygem
☆10,824 · Updated last month
dataelement / bisheng
BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI…
☆9,328 · Updated this week
hiyouga / LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
☆55,853 · Updated this week
duixcom / Duix-Mobile
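🚀 The best-performing mobile [real-time conversational digital human] on the web. Supports local deployment and multimodal interaction (voice, text, facial expressions), with response times under 1.5 seconds. Suited to live streaming, teaching, customer service, finance, government services, and other scenarios with very high privacy and real-time requirements. Works out of the box and is developer-friendly.
☆7,331 · Updated this week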
QwenLM / Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,151 · Updated last year
deepseek-ai / DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
☆4,935 · Updated 10 months ago
zai-org / GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
☆6,790 · Updated last month
black-forest-labs / flux
Official inference repo for FLUX.1 models
☆23,889 · Updated last week
InternLM / MindSearch
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
☆6,513 · Updated last month
fishaudio / fish-speech
SOTA Open Source TTS
☆22,570 · Updated 2 weeks ago
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (…
☆9,054 · Updated this week
stepfun-ai / Step-Audio
☆4,430 · Updated last month