leolee99 / CLIP_ITMLinks

A simple pytorch implementation of baseline based-on CLIP for Image-text Matching.

☆15

Alternatives and similar repositories for CLIP_ITM

Users that are interested in CLIP_ITM are comparing it to the libraries listed below

Sorting:

tmlr-group / WCA
[ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"
☆57Updated last year
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆49Updated last year
taewhankim / VIPCAP
☆14Updated 11 months ago
aimagelab / HySAC
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
☆25Updated 7 months ago
ys-zong / VL-ICL
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆65Updated 2 months ago
Pter61 / context-i2w
Context-I2W: Mapping Images to Context-dependent words for Accurate Zero-Shot Composed Image Retrieval [AAAI 2024 Oral]
☆55Updated 6 months ago
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semanchc abstraction in multimodal large language models
☆74Updated 4 months ago
leolee99 / PAU
The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS…
☆27Updated last year
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆52Updated last year
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆15Updated 6 months ago
chunmeifeng / SPRC
【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval
☆91Updated last year
pkunlp-icler / MIC
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆50Updated 4 months ago
ytaek-oh / fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆20Updated last year
zycheiheihei / Transferable-Visual-Prompting
[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…
☆46Updated 11 months ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated 5 months ago
adobe-research / llava-score
☆11Updated last year
G-JWLee / COINCIDE_code
☆21Updated last year
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆60Updated last year
sangminwoo / AvisC
[ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis…
☆21Updated last year
bronyayang / HallE_Control
HallE-Control: Controlling Object Hallucination in LMMs
☆31Updated last year
aimagelab / PMA-Net
[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
☆19Updated last year
xing0047 / cca-llava
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
☆63Updated 3 months ago
vinid / neg_clip
NegCLIP.
☆38Updated 2 years ago
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆53Updated 7 months ago
mrwu-mac / R-Bench
[ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'
☆22Updated 10 months ago
joeyz0z / MeaCap
(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
☆53Updated last year
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆55Updated last year
Koorye / SkipTuning
[CVPR 2025] Offical implementation of the paper "Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters The…
☆29Updated 9 months ago
Ziwei-Zheng / Nullu
Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
☆46Updated 8 months ago
haoyiq114 / VALOR
Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (ACL-Findings 2024)
☆16Updated last year