facebookresearch / DCILinks

Densely Captioned Images (DCI) dataset repository.

☆187

Alternatives and similar repositories for DCI

Users that are interested in DCI are comparing it to the libraries listed below

Sorting:

baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆211Updated last year
LijieFan / LaCLIP
[NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"
☆283Updated last year
X2FD / LVIS-INSTRUCT4V
☆133Updated last year
LAION-AI / scaling-laws-openclip
Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)
☆171Updated last month
UCSC-VLAA / CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
☆316Updated last year
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆150Updated 8 months ago
tsb0601 / MMVP
☆344Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆112Updated 6 months ago
Yushi-Hu / tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
☆170Updated last year
bronyayang / Law_of_Vision_Representation_in_MLLMs
Official implementation of the Law of Vision Representation in MLLMs
☆163Updated 8 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆134Updated 2 months ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆133Updated last year
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆189Updated 10 months ago
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆163Updated last year
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"
☆138Updated last year
foundation-multimodal-models / CAPTURE
☆66Updated last year
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆327Updated last year
imagegridworth / IG-VLM
☆138Updated 10 months ago
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆342Updated 2 weeks ago
dvlab-research / Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
☆150Updated last year
AILab-CVC / SEED-Bench
(CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆345Updated 6 months ago
amazon-science / prompt-pretraining
Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"
☆258Updated last year
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆152Updated last year
facebookresearch / genecis
Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"
☆59Updated 2 years ago
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆144Updated 8 months ago
google-research / syn-rep-learn
Learning from synthetic data - code and models
☆319Updated last year
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138Updated 2 years ago
mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆280Updated 2 years ago
x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆211Updated 4 months ago
xichenpan / Kosmos-G
Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models
☆73Updated last year