OpenBMB / MiniCPM-o
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
☆19,821Updated 2 weeks ago
Alternatives and similar repositories for MiniCPM-o
Users who are interested in MiniCPM-o are comparing it to the repositories listed below.
- MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving a 5x+ speedup on typical end-side chips☆8,084Updated last week
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance☆8,571Updated this week
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.☆9,985Updated last month
- Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.☆11,551Updated 2 months ago
- This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.☆11,998Updated this week
- DeepSeek-VL: Towards Real-World Vision-Language Understanding☆3,913Updated last year
- Open-Sora: Democratizing Efficient Video Production for All☆26,885Updated 2 months ago
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model☆4,923Updated 9 months ago
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI…☆9,112Updated this week
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.☆18,704Updated last month
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations☆14,574Updated this week
- Janus-Series: Unified Multimodal Understanding and Generation Models☆17,447Updated 5 months ago
- Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.☆22,507Updated 3 weeks ago
- No fortress, purely open ground. OpenManus is Coming.☆48,014Updated this week
- Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-…☆8,716Updated this week
- Mobile-Agent: The Powerful Mobile Device Operation Assistant Family☆4,451Updated 2 weeks ago
- GLM-4 series: Open Multilingual Multimodal Chat LMs☆6,693Updated 2 weeks ago
- GPT4V-level open-source multi-modal model based on Llama3-8B☆2,377Updated 4 months ago
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)☆54,373Updated this week
- "LightRAG: Simple and Fast Retrieval-Augmented Generation"☆18,373Updated this week
- A series of large language models trained from scratch by developers @01-ai☆7,832Updated 7 months ago
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.☆23,029Updated 11 months ago
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.☆42,005Updated this week
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…☆3,309Updated last month
- a state-of-the-art-level open visual language model | multimodal pretrained model☆6,611Updated last year
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.☆6,533Updated last week
- SGLang is a fast serving framework for large language models and vision language models.☆15,932Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆52,204Updated this week
- RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.☆60,141Updated this week
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"☆3,296Updated last year