OpenBMB / MiniCPM-V
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
☆22,460 · Updated 3 months ago
Alternatives and similar repositories for MiniCPM-V
Users interested in MiniCPM-V are comparing it to the libraries listed below.
- MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3x+ generation speedup on reasoning tasks ☆8,479 · Updated 2 months ago
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o ☆9,643 · Updated 3 months ago
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆12,764 · Updated 3 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆8,041 · Updated 10 months ago
- Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud. ☆17,425 · Updated last month
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. ☆20,037 · Updated last month
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆24,220 · Updated last year
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆4,032 · Updated last year
- Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud. ☆25,863 · Updated 2 months ago
- Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-p… ☆8,883 · Updated this week
- Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,859 · Updated 6 months ago
- Open-Sora: Democratizing Efficient Video Production for All ☆28,151 · Updated 7 months ago
- This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it. ☆12,097 · Updated 2 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆14,176 · Updated this week
- ☆21 · Updated last year
- Open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,502 · Updated last year
- The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud. ☆6,446 · Updated last year
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM. ☆49,952 · Updated this week
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,426 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆66,313 · Updated this week
- Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding ☆4,287 · Updated last month
- [EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation" ☆26,730 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,437 · Updated this week
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI… ☆10,703 · Updated this week
- Integrate the DeepSeek API into popular software ☆34,860 · Updated 3 months ago
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,647 · Updated 10 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,203 · Updated last month
- Open-source framework for conversational voice AI agents ☆9,406 · Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,028 · Updated 2 months ago
- Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3). ☆7,128 · Updated last month