Raphoo / DCSM_Ideal_CLIP

Code for "Is CLIP ideal? No. Can we fix it? Yes!"

☆39

Alternatives and similar repositories for DCSM_Ideal_CLIP

Users who are interested in DCSM_Ideal_CLIP are comparing it to the repositories listed below.


callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆79 · Updated last year
SivanDoveh / IPLoc
Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples
☆38 · Updated 11 months ago
altndrr / lmms-owc
Code implementation of our ICCV 2025 paper: On Large Multimodal Models as Open-World Image Classifiers
☆24 · Updated this week
McGill-NLP / diffusion-itm
Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
☆33 · Updated last year
NMS05 / Patch-Aligned-Contrastive-Learning
☆23 · Updated 2 years ago
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆45 · Updated 11 months ago
Paranioar / UniPT
[CVPR 2024] Code for "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆67 · Updated last year
renwang435 / video-ttt-release
Test-Time Training on Video Streams
☆64 · Updated 2 years ago
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆41 · Updated 3 weeks ago
xuanlinli17 / large_vlm_distillation_ood
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)
☆59 · Updated last year
Ruiyang-061X / Awesome-MLLM-Uncertainty
✨ A curated list of papers on uncertainty in multimodal large language models (MLLMs).
☆54 · Updated 7 months ago
shubhamprshr27 / NeglectedTailsVLM
This repository houses the code for the paper "The Neglected Tails of VLMs".
☆29 · Updated 6 months ago
shashankvkt / DoRA_ICLR24
This repo contains the official implementation of ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long …
☆93 · Updated last year
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆105 · Updated 5 months ago
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆53 · Updated 7 months ago
QUVA-Lab / PIN
Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
☆26 · Updated 10 months ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆29 · Updated last year
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆45 · Updated 3 months ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41 · Updated 7 months ago
dahyun-kang / lavg
[ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
☆47 · Updated last year
jh-yi / Video-Panda
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025]
☆75 · Updated 4 months ago
yixuan730 / DetToolChain
DetToolChain: A new prompting paradigm to unleash the detection ability of MLLMs
☆43 · Updated last year
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆132 · Updated 7 months ago
OpenGVLab / EgoExoLearn
[CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
☆70 · Updated 2 months ago
ytaek-oh / awesome-vl-compositionality
Awesome Vision-Language Compositionality: a comprehensive curation of research papers in the literature.
☆30 · Updated 9 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆84 · Updated last year
dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆31 · Updated last year
franciszzj / OpenPSG
[ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
☆49 · Updated 10 months ago
jiayuww / SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
☆55 · Updated 9 months ago
hwjiang1510 / VQLoC
(NeurIPS 2023) Open-set visual object query search & localization in long-form videos
☆25 · Updated last year