jiaangli / VLCALinks

Do Vision and Language Models Share Concepts? A Vector Space Alignment Study

☆16

Alternatives and similar repositories for VLCA

Users that are interested in VLCA are comparing it to the libraries listed below

Sorting:

locuslab / llava-token-compression
☆44Updated last year
ytaek-oh / fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆19Updated last year
BatsResearch / ex2
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17Updated last year
ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆21Updated 2 years ago
ExplainableML / sae-for-vlm
[NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
☆42Updated 7 months ago
Han-Zongbo / Skip-n
This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.
☆14Updated last year
wangf3014 / Patch_Scaling
Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
☆23Updated 8 months ago
adobe-research / llava-score
☆11Updated last year
Optimization-AI / FastCLIP
Distributed Optimization Infra for learning CLIP models
☆27Updated last year
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆48Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11Updated 2 years ago
shiqichen17 / VLM_Merging
Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆81Updated last month
kigb / DropoutDecoding
[NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"
☆21Updated 11 months ago
nickjiang2378 / vlm-hallucinations
[ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
☆92Updated 5 months ago
d-ailin / CLIP-Guided-Decoding
☆17Updated last year
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20Updated last year
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated 5 months ago
ExplainableML / fomo_in_flux
Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]
☆60Updated 11 months ago
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30Updated last year
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆18Updated last year
lscpku / VITATECS
☆18Updated last year
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25Updated 11 months ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32Updated 8 months ago
yale-nlp / TOMATO
☆34Updated last year
OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆50Updated last year
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆80Updated 3 weeks ago
kdariina / CLIP-not-BoW-unimodally
Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"
☆16Updated 9 months ago
princetonvisualai / icons
☆21Updated 6 months ago
BIT-DA / BorLan
[ICCV2023] Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm
☆18Updated 2 years ago
SivanDoveh / DAC
Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models
☆27Updated last year