zhaoshitian / Causal-CoG

[CVPR'24 Highlight] Implementation of "Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models"

☆16

Alternatives and similar repositories for Causal-CoG

Users who are interested in Causal-CoG are comparing it to the repositories listed below

zjukg / Structure-CLIP
[Paper] [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
☆151 Updated last year
ThomasWangY / 2024-AAAI-HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)
☆74 Updated 8 months ago
Harvard-Ophthalmology-AI-Lab / FairCLIP
[CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning
☆90 Updated 3 months ago
richard-peng-xia / HGCLIP
[COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
☆43 Updated 10 months ago
minghu0830 / OphNet-benchmark
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
☆57 Updated 3 months ago
lerogo / aaai24_itr_cusa
Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"
☆48 Updated last year
megvii-research / CasPL
☆48 Updated 8 months ago
chunmeifeng / SPRC
【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval
☆89 Updated last year
CHENGY12 / PLOT
[ICLR2023] PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
☆170 Updated last year
jinlHe / PeFoMed
Code for the paper "PeFoM-Med: Parameter Efficient Fine-tuning on Multi-modal Large Language Models for Medical Visual Question Answering"
☆56 Updated 4 months ago
muzairkhattak / PromptSRC
[ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without F…
☆276 Updated 2 years ago
muzairkhattak / ProText
[AAAI'25, CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models".
☆113 Updated 10 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆84 Updated last year
Hodasia / Awesome-Vision-Language-Finetune
Awesome List of Vision Language Prompt Papers
☆48 Updated last year
GaryGuTC / LaPA_model
[CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
☆22 Updated 6 months ago
jameelhassan / PromptAlign
[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
☆107 Updated last year
ZjjConan / VLM-MultiModalAdapter
The official PyTorch implementation of our CVPR 2024 paper "MMA: Multi-Modal Adapter for Vision-Language Models".
☆83 Updated 6 months ago
Cyang-Zhao / Grad-Eclip
☆57 Updated 2 weeks ago
SooLab / DDCOT
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆46 Updated last year
YingWANGG / M2IB
Code for the paper "Visual Explanations of Image–Text Representations via Multi-Modal Information Bottleneck Attribution"
☆60 Updated last year
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆57 Updated last year
aiming-lab / MMedPO
[ICML'25] MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
☆58 Updated 4 months ago
ivonajdenkoska / multimodal-meta-learn
[ICLR 2023] Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning"
☆59 Updated 2 years ago
BioMedIA-MBZUAI / MedPromptX
☆70 Updated 3 months ago
sergiotasconmorales / consistency_vqa
Repository for the paper "Consistency-preserving Visual Question Answering in Medical Imaging" (MICCAI 2022)
☆23 Updated 2 years ago
ExplainableML / cosmos
[CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
☆31 Updated 7 months ago
mzhaoshuai / RLCF
[ICLR 2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models.
☆93 Updated last year
lezhang7 / SAIL
[CVPR 2025 Highlight] Official PyTorch codebase for the paper "Assessing and Learning Alignment of Unimodal Vision and Language Models"
☆50 Updated 2 months ago
TIMMY-CHAN / MILE
[MICCAI 2024] Can LLMs' Tuning Methods Work in Medical Multimodal Domain?
☆17 Updated last year
KishoreP1 / DetailCLIP
Detail-Oriented CLIP for Fine-Grained Tasks (ICLR SSI-FM 2025)
☆55 Updated 7 months ago