kdariina/CLIP-not-BoW-unimodally

Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"

☆29

Alternatives and similar repositories for CLIP-not-BoW-unimodally

Users who are interested in CLIP-not-BoW-unimodally are comparing it to the libraries listed below.

ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆23 · Nov 8, 2023 · Updated 2 years ago
mayu-ot / tti-human-eval
☆12 · Oct 4, 2023 · Updated 2 years ago
dongjunhwang / ConOVS
Official Implementation of "OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation" (NeurIPS 2025).
☆16 · Feb 27, 2026 · Updated 5 months ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
maseval / MASEval
Multi-Agent LLM Evaluation. Docs: https://maseval.readthedocs.io/
☆37 · Jul 5, 2026 · Updated 3 weeks ago
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
arubique / OCCAM
This is an implementation of the paper "Are We Done with Object-Centric Learning?"
☆14 · Jun 21, 2026 · Updated last month
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆49 · Dec 1, 2024 · Updated last year
jylei16 / Imagine-e
☆14 · Jan 22, 2025 · Updated last year
naver-ai / muco
Official Pytorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15 · Apr 16, 2026 · Updated 3 months ago
aktsonthalia / starlight
Source code for the paper "Do Deep Neural Network Solutions form a Star Domain?"
☆12 · May 26, 2024 · Updated 2 years ago
Vinoground / Vinoground
☆13 · Apr 13, 2026 · Updated 3 months ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25 · May 14, 2026 · Updated 2 months ago
bmucsanyi / bud
☆18 · Sep 3, 2024 · Updated last year
lscpku / VITATECS
☆18 · Jul 10, 2024 · Updated 2 years ago
navervision / CompoDiff
Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)
☆88 · Feb 2, 2025 · Updated last year
Raphoo / linear-mech-vlms
Code for "Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models"
☆15 · Feb 16, 2026 · Updated 5 months ago
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆71 · Feb 28, 2024 · Updated 2 years ago
naver-ai / lut
[ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners"
☆14 · Dec 1, 2024 · Updated last year
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆56 · Apr 7, 2025 · Updated last year
naver-ai / imagenet-annotation-tool
☆17 · Jul 24, 2023 · Updated 3 years ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 · Apr 27, 2023 · Updated 3 years ago
allenai / lerobot
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
☆15 · Jun 2, 2026 · Updated last month
BatsResearch / ex2
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17 · Apr 4, 2024 · Updated 2 years ago
oshapio / necessary-compositionality
Official code for the paper "Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models"
☆23 · Mar 7, 2026 · Updated 4 months ago
MYMY-young / DelimScaling
[ICLR 2026] Official implementation of "Enhancing Multi-Image Understanding Through Delimiter Token Scaling"
☆16 · Jul 10, 2026 · Updated 2 weeks ago
RaptorMai / MLLM-CompBench
[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relati…
☆46 · Apr 21, 2025 · Updated last year
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight
☆38 · May 23, 2023 · Updated 3 years ago
UCSB-AI / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37 · Aug 18, 2024 · Updated last year
mlfoundations / clip_quality_not_quantity
☆28 · Oct 18, 2022 · Updated 3 years ago
TAU-VAILab / isbertblind
This repository is for the paper "Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding…
☆21 · Nov 2, 2023 · Updated 2 years ago
salesforce / adversarial-polyglots
Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)
☆10 · May 1, 2025 · Updated last year
naver-ai / w-ood
☆80 · Nov 28, 2022 · Updated 3 years ago
Raphoo / DCSM_Ideal_CLIP
Code for "Is CLIP ideal? No. Can we fix it? Yes!"
☆56 · Dec 12, 2025 · Updated 7 months ago
parameterlab / leaky_thoughts
Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" EMNLP 2025
☆17 · Jan 12, 2026 · Updated 6 months ago
alibaba-mmai-research / HiCo
CVPR 2022: Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
☆18 · Aug 10, 2022 · Updated 3 years ago
noelshin / zutis
[CVPRW'23 Best Paper Award] Zero-shot Unsupervised Transfer Instance Segmentation
☆24 · Aug 22, 2023 · Updated 2 years ago
shiqichen17 / AdaptVis
Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆76 · May 2, 2025 · Updated last year
princeton-pli / VLM_S2H
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
☆19 · Jun 3, 2025 · Updated last year