ant-research / DreamLIP

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions

☆134

Alternatives and similar repositories for DreamLIP

Users who are interested in DreamLIP are comparing it to the repositories listed below.

wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆100 Updated 2 months ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆43 Updated 6 months ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆45 Updated 4 months ago
lezhang7 / SAIL
[CVPR 2025 Highlight] Official PyTorch codebase for the paper "Assessing and Learning Alignment of Unimodal Vision and Language Models"
☆47 Updated last month
HJYao00 / DenseConnector
[NeurIPS 2024] Dense Connector for MLLMs
☆171 Updated 9 months ago
deepglint / ALIP
[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
☆97 Updated last year
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆35 Updated last year
alibaba / conv-llava
☆118 Updated last year
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆84 Updated last month
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆89 Updated last month
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆53 Updated 8 months ago
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆75 Updated 3 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆165 Updated 10 months ago
callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆78 Updated 9 months ago
Liuziyu77 / RAR
The official implementation of RAR
☆89 Updated last year
lizhou-cs / mglmm
☆31 Updated 10 months ago
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆185 Updated 2 weeks ago
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆67 Updated 9 months ago
baaivision / DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆150 Updated 7 months ago
geekyutao / TaskRes
Task Residual for Tuning Vision-Language Models (CVPR 2023)
☆73 Updated 2 years ago
wangf3014 / SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
☆161 Updated 9 months ago
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆91 Updated 6 months ago
wusize / CLIPSelf
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
☆190 Updated last year
mbzuai-oryx / CVRR-Evaluation-Suite
[CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…
☆49 Updated 11 months ago
mc-lan / ClearCLIP
[ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
☆86 Updated 4 months ago
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆35 Updated 7 months ago
chuangchuangtan / LLaVA-NeXT-Image-Llama3-Lora
LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft
☆44 Updated last year
Yanqing0327 / MLLMs-Augmented
The official implementation of "MLLMs-Augmented Visual-Language Representation Learning"
☆31 Updated last year
XMUDeepLIT / LLaVE
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆63 Updated 2 months ago
callsys / GenPromp
[ICCV 2023] Generative Prompt Model for Weakly Supervised Object Localization
☆57 Updated last year