OpenBMB / MiniCPM-o
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
☆19,688 Updated this week
Alternatives and similar repositories for MiniCPM-o
Users who are interested in MiniCPM-o are comparing it to the repositories listed below.
- MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5+ speedup on typical end-side chips ☆7,999 Updated this week
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆11,094 Updated last month
- A simple screen parsing tool towards pure vision based GUI agent ☆22,487 Updated 2 months ago
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆52,785 Updated this week
- 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming ☆56,520 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆50,358 Updated this week
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance. ☆8,397 Updated 3 weeks ago
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. ☆18,538 Updated last week
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆9,608 Updated this week
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI… ☆8,903 Updated this week
- GLM-4 series: Open Multilingual Multimodal Chat LMs ☆6,630 Updated last week
- "LightRAG: Simple and Fast Retrieval-Augmented Generation" ☆17,685 Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆15,276 Updated this week
- Toolkit for linearizing PDFs for LLM datasets/training ☆13,006 Updated this week
- A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON. ☆35,508 Updated this week
- A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language. ☆14,765 Updated this week
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆14,423 Updated last week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆11,130 Updated last week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆7,656 Updated 4 months ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆12,293 Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆40,227 Updated this week
- A state-of-the-art-level open visual language model | multimodal pretrained model ☆6,589 Updated last year
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation ☆8,504 Updated 9 months ago
- AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents ☆16,815 Updated this week
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud. ☆22,102 Updated last week
- Production-ready platform for agentic workflow development. ☆103,804 Updated this week
- No fortress, purely open ground. OpenManus is Coming. ☆47,108 Updated last week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆25,911 Updated this week
- Convert PDF to markdown + JSON quickly with high accuracy ☆25,975 Updated this week
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python. ☆6,312 Updated this week