CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆163Jun 8, 2024Updated last year
Alternatives and similar repositories for CuMo
Users that are interested in CuMo are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- 【NeurIPS 2024】Dense Connector for MLLMs☆182Oct 14, 2024Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models☆81Dec 27, 2025Updated 2 months ago
- Local self-attention in Transformer for visual question answering☆13Mar 17, 2024Updated 2 years ago
- 【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models☆2,307Jul 15, 2025Updated 8 months ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆32May 15, 2023Updated 2 years ago
- When do we not need larger vision models?☆415Feb 8, 2025Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆174Sep 25, 2024Updated last year
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model☆13Feb 11, 2025Updated last year
- A RLHF Infrastructure for Vision-Language Models☆197Nov 15, 2024Updated last year
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆164Dec 26, 2024Updated last year
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)☆46Jul 1, 2025Updated 8 months ago
- ☆4,607Sep 14, 2025Updated 6 months ago
- ☆112Jan 8, 2025Updated last year
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆73Oct 16, 2024Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale☆214Feb 27, 2024Updated 2 years ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆67Feb 19, 2025Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆281Jun 25, 2024Updated last year
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆306Sep 11, 2024Updated last year
- ☆14Apr 25, 2025Updated 11 months ago
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation☆71Oct 17, 2025Updated 5 months ago
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…☆116May 30, 2024Updated last year
- ☆125Jul 29, 2024Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆109May 29, 2025Updated 9 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.☆1,995Nov 7, 2025Updated 4 months ago
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"☆22Sep 26, 2024Updated last year
- ☆102Dec 22, 2023Updated 2 years ago
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…☆14Dec 16, 2024Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆64Nov 5, 2024Updated last year
- ☆18May 31, 2023Updated 2 years ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo…☆401Aug 24, 2024Updated last year
- ☆222Jul 5, 2024Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts☆336Jul 17, 2024Updated last year
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025☆278May 26, 2025Updated 9 months ago
- [NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models☆123Jul 1, 2024Updated last year
- Using image captions with LLM for zero-shot VQA☆18Mar 14, 2024Updated 2 years ago
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts☆35Jul 2, 2024Updated last year
- [ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models☆153Dec 5, 2024Updated last year
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆439Dec 22, 2024Updated last year
- Efficient Multimodal Large Language Models: A Survey☆388Apr 29, 2025Updated 10 months ago