OpenBMB / MiniCPM-o
MiniCPM-V 4.0: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
☆19,978 Updated this week
Alternatives and similar repositories for MiniCPM-o
Users who are interested in MiniCPM-o are comparing it to the libraries listed below.
- MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5+ speedup on typical end-side chips ☆8,160 Updated last month
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ☆23,743 Updated last week
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance. ☆8,734 Updated 3 weeks ago
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆11,971 Updated 2 months ago
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆10,686 Updated 2 weeks ago
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,301 Updated last year
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,378 Updated 9 months ago
- "LightRAG: Simple and Fast Retrieval-Augmented Generation" ☆19,207 Updated this week
- tiny vision language model ☆8,275 Updated this week
- A simple screen parsing tool towards pure vision based GUI agent ☆23,215 Updated 4 months ago
- Mobile-Agent: The Powerful Mobile Device Operation Assistant Family ☆4,546 Updated last month
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,452 Updated last month
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. ☆18,979 Updated this week
- ☆20 Updated 11 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆7,761 Updated 5 months ago
- Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding ☆4,213 Updated 6 months ago
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,407 Updated 5 months ago
- The Desktop AgentOS. ☆7,538 Updated this week
- ☆10,824 Updated last month
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI… ☆9,328 Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆55,853 Updated this week
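- 🚀 The best-performing mobile real-time conversational digital human available online. Supports local deployment and multimodal interaction (voice, text, facial expressions), with response times under 1.5 seconds, suited to scenarios with very high privacy and real-time requirements such as live streaming, teaching, customer service, finance, and government services. Works out of the box and is developer-friendly. ☆7,331 Updated this week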
- The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud. ☆6,151 Updated last year
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model ☆4,935 Updated 10 months ago
- GLM-4 series: Open Multilingual Multimodal Chat LMs ☆6,790 Updated last month
- Official inference repo for FLUX.1 models ☆23,889 Updated last week
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,513 Updated last month
- SOTA Open Source TTS ☆22,570 Updated 2 weeks ago
- Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (… ☆9,054 Updated this week
- ☆4,430 Updated last month