baaivision / DenseFusion

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

☆156

Alternatives and similar repositories for DenseFusion

Users who are interested in DenseFusion are comparing it to the repositories listed below.

x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆217 Updated 7 months ago
alibaba / conv-llava
☆119 Updated last year
ggjy / DeLVM
☆119 Updated last year
AILab-CVC / VL-GPT
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
☆86 Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆353 Updated 3 months ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆168 Updated 2 weeks ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆121 Updated 6 months ago
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆211 Updated last year
Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆210 Updated last year
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆166 Updated last year
PKU-YuanGroup / Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
☆133 Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆177 Updated last year
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆87 Updated 3 months ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆88 Updated 4 months ago
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆70 Updated 8 months ago
lxtGH / DenseWorld-1M
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
☆111 Updated 3 weeks ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆136 Updated 5 months ago
xichenpan / Kosmos-G
Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models
☆73 Updated last year
rongyaofang / PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
☆130 Updated 9 months ago
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆148 Updated 11 months ago
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆36 Updated 10 months ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆191 Updated 4 months ago
mutonix / Vript
☆155 Updated 9 months ago
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆163 Updated last year
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆191 Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
V3Det / V3Det
☆112 Updated last year
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆129 Updated 4 months ago
foundation-multimodal-models / CAPTURE
☆76 Updated last year