DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159 · Dec 6, 2024 · Updated last year
Alternatives and similar repositories for DenseFusion
Users that are interested in DenseFusion are comparing it to the libraries listed below.
- EVE Series: Encoder-Free Vision-Language Models from BAAI · ☆368 · Jul 24, 2025 · Updated 8 months ago
- When do we not need larger vision models? · ☆415 · Feb 8, 2025 · Updated last year
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions · ☆138 · May 8, 2025 · Updated 10 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text · ☆415 · May 5, 2025 · Updated 10 months ago
- [NeurIPS 2024] Dense Connector for MLLMs · ☆182 · Oct 14, 2024 · Updated last year
- (untitled repository) · ☆125 · Jul 29, 2024 · Updated last year
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale · ☆214 · Feb 27, 2024 · Updated 2 years ago
- [ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models · ☆153 · Dec 5, 2024 · Updated last year
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" · ☆149 · Nov 14, 2024 · Updated last year
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV 2025 · ☆278 · May 26, 2025 · Updated 9 months ago
- Official repository for the paper PLLaVA · ☆676 · Jul 28, 2024 · Updated last year
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. · ☆1,995 · Nov 7, 2025 · Updated 4 months ago
- [ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant · ☆247 · Aug 14, 2024 · Updated last year
- Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770)