yuhui-zh15/VLMClassifier

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/yuhui-zh15/VLMClassifier)

yuhui-zh15 / VLMClassifier

Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)

☆98

Alternatives and similar repositories for VLMClassifier

Users that are interested in VLMClassifier are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kaist-ami / BEAF
View on GitHub
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
☆22Mar 26, 2025Updated last year
yuhui-zh15 / C3
View on GitHub
Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)
☆36Oct 16, 2024Updated last year
yuhui-zh15 / AutoConverter
View on GitHub
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆40May 26, 2025Updated last year
orrzohar / Video-STaR
View on GitHub
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆72Jul 10, 2024Updated 2 years ago
WisconsinAIVision / ViP-LLaVA
View on GitHub
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆338Jul 17, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
Understanding-Visual-Datasets / VisDiff
View on GitHub
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
☆134Nov 5, 2025Updated 8 months ago
orrzohar / LOVM
View on GitHub
[NeurIPS 2023] Official Pytorch code for LOVM: Language-Only Vision Model Selection
☆21Feb 3, 2024Updated 2 years ago
ys-zong / VL-ICL
View on GitHub
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆69Sep 20, 2025Updated 10 months ago
ForJadeForest / Lever-LM
View on GitHub
The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
☆18Oct 4, 2024Updated last year
paulgavrikov / vlm_shapebias
View on GitHub
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
☆31Jan 26, 2025Updated last year
wjpoom / SPEC
View on GitHub
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆52Jun 16, 2025Updated last year
uvavision / SyViC
View on GitHub
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13Sep 30, 2023Updated 2 years ago
lezhang7 / Enhance-FineGrained
View on GitHub
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆56Apr 7, 2025Updated last year
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
amitakamath / vl_text_encoders_are_bottlenecks
View on GitHub
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11May 24, 2023Updated 3 years ago
d-ailin / CLIP-Guided-Decoding
View on GitHub
☆18Aug 1, 2024Updated last year
llyx97 / TempCompass
View on GitHub
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133Apr 4, 2025Updated last year
ExplainableML / WaffleCLIP
View on GitHub
Official repository for the ICCV 2023 paper: "Waffling around for Performance: Visual Classification with Random Words and Broad Concepts…
☆61Jul 8, 2023Updated 3 years ago
CVMI-Lab / clip-beyond-tail
View on GitHub
(NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
☆27Oct 28, 2024Updated last year
YiyangZhou / LURE
View on GitHub
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆158Apr 30, 2024Updated 2 years ago
YiyangZhou / CSR
View on GitHub
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆87Oct 26, 2025Updated 8 months ago
ant-research / DreamLIP
View on GitHub
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138May 8, 2025Updated last year
FreedomIntelligence / TRIM
View on GitHub
We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22Jan 11, 2026Updated 6 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
bronyayang / HallE_Control
View on GitHub
HallE-Control: Controlling Object Hallucination in LMMs
☆32Apr 10, 2024Updated 2 years ago
yeung-lab / Micro-Bench
View on GitHub
A Vision-Language Benchmark for Microscopy Understanding
☆31Mar 13, 2025Updated last year
mrwu-mac / ControlMLLM
View on GitHub
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆210Jul 17, 2025Updated last year
zzxslp / SoM-LLaVA
View on GitHub
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
☆145Aug 23, 2024Updated last year
FactoDeepLearning / MultitaskVLFM
View on GitHub
☆25Aug 1, 2023Updated 2 years ago
Kamichanw / ICLTestbed
View on GitHub
An in-context learning research testbed
☆19Mar 16, 2025Updated last year
yuezih / less-is-more
View on GitHub
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58Oct 28, 2024Updated last year
yfzhang114 / LLaVA-Align
View on GitHub
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆83Feb 22, 2025Updated last year
ytaek-oh / vl_compo
View on GitHub
☆10Jul 5, 2024Updated 2 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
zhengli97 / PromptKD
View on GitHub
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
☆346Dec 14, 2025Updated 7 months ago
YBZh / DMN
View on GitHub
CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
☆96Jul 4, 2024Updated 2 years ago
jmhb0 / viddiff
View on GitHub
[ICLR 2025] Video Action Differencing
☆53Jul 3, 2025Updated last year
tsb0601 / MMVP
View on GitHub
☆364Jan 27, 2024Updated 2 years ago
zeyofu / BLINK_Benchmark
View on GitHub
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆171Sep 27, 2025Updated 9 months ago
zycheiheihei / Transferable-Visual-Prompting
View on GitHub
[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…
☆45Dec 20, 2024Updated last year
konpanousis / ConceptDiscoveryModels
View on GitHub
This is the official implementation of the Concept Discovery Models paper.
☆15Aug 27, 2023Updated 2 years ago