OpenBMB / MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
☆12,642 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for MiniCPM-V
- MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo. ☆7,135 Updated 2 weeks ago
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance. ☆6,055 Updated this week
- BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI… ☆8,905 Updated this week
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,211 Updated 6 months ago
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆5,185 Updated 2 weeks ago
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation ☆9,496 Updated 2 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆6,053 Updated this week
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,122 Updated 2 months ago
- Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud. ☆9,783 Updated this week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆19,247 Updated this week
- This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it. ☆11,563 Updated this week
- "LightRAG: Simple and Fast Retrieval-Augmented Generation" ☆8,824 Updated this week
- A UI-Focused Agent for Windows OS Interaction. ☆7,921 Updated 2 weeks ago
- Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI. ☆15,471 Updated this week
- The open source platform for AI-native application development. ☆6,216 Updated this week
- Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆3,153 Updated last month
- Large World Model -- Modeling Text and Video with Millions Context ☆7,153 Updated last month
- The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. ☆14,195 Updated last week
- An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...) ☆3,977 Updated last week
- An open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,092 Updated 2 weeks ago
- Brand new TTS solution ☆14,572 Updated last week
- Mobile-Agent: The Powerful Mobile Device Operation Assistant Family ☆3,000 Updated last month
- ☆6,781 Updated 3 weeks ago
- The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud. ☆5,055 Updated 3 months ago
- RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. ☆23,277 Updated this week
- Your image is almost there! ☆7,334 Updated 3 months ago
- Question and Answer based on Anything. ☆11,890 Updated this week
- Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory ☆18,263 Updated this week
- Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding ☆3,456 Updated last month
- Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension. ☆3,505 Updated last month