CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆163Jun 8, 2024Updated last year
Alternatives and similar repositories for CuMo
Users that are interested in CuMo are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- 【NeurIPS 2024】Dense Connector for MLLMs☆183Oct 14, 2024Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models☆84Dec 27, 2025Updated 4 months ago
- Local self-attention in Transformer for visual question answering☆13Mar 17, 2024Updated 2 years ago
- 【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models☆2,314Jul 15, 2025Updated 9 months ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆32May 15, 2023Updated 2 years ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- When do we not need larger vision models?☆418Feb 8, 2025Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆174Sep 25, 2024Updated last year
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model☆13Feb 11, 2025Updated last year
- A RLHF Infrastructure for Vision-Language Models☆199Nov 15, 2024Updated last year
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆164Dec 26, 2024Updated last year
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)☆46Jul 1, 2025Updated 10 months ago
- ☆4,645Apr 15, 2026Updated 2 weeks ago
- ☆112Jan 8, 2025Updated last year
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆76Oct 16, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆214Feb 27, 2024Updated 2 years ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆67Feb 19, 2025Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆282Jun 25, 2024Updated last year
- ☆14Apr 25, 2025Updated last year
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆308Sep 11, 2024Updated last year
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation☆72Oct 17, 2025Updated 6 months ago
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…☆117May 30, 2024Updated last year
- ☆127Jul 29, 2024Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆110May 29, 2025Updated 11 months ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.☆1,996Nov 7, 2025Updated 5 months ago
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆22Sep 26, 2024Updated last year
- ☆102Dec 22, 2023Updated 2 years ago
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…☆14Dec 16, 2024Updated last year
- ☆19May 31, 2023Updated 2 years ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo…☆407Aug 24, 2024Updated last year
- ☆222Jul 5, 2024Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆64Nov 5, 2024Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆338Jul 17, 2024Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025☆278May 26, 2025Updated 11 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models☆124Jul 1, 2024Updated last year
- [ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models☆153Dec 5, 2024Updated last year
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆443Dec 22, 2024Updated last year
- Efficient Multimodal Large Language Models: A Survey☆386Apr 29, 2025Updated last year
- 🚀 [NeurIPS24] Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'2…☆90Jun 24, 2025Updated 10 months ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness☆452May 14, 2025Updated 11 months ago