TRI-ML/vlm-evaluation

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/TRI-ML/vlm-evaluation)

TRI-ML / vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

☆139

Alternatives and similar repositories for vlm-evaluation

Users that are interested in vlm-evaluation are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

TRI-ML / prismatic-vlms
View on GitHub
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
☆1,004Jul 4, 2024Updated 2 years ago
zhjohnchan / bert-clip-synesthesia
View on GitHub
[Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders.
☆14Jun 7, 2023Updated 3 years ago
peterant330 / KUEA
View on GitHub
[ICML'25] Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
☆23Sep 7, 2025Updated 10 months ago
MLAI-Yonsei / CaRot
View on GitHub
source code for NeurIPS'24 paper "Towards Calibrated Robust Fine-Tuning of Vision-Language Models"
☆15Oct 31, 2025Updated 8 months ago
WildVision-AI / WildVision-Bench
View on GitHub
☆17Oct 21, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
kaist-ami / BEAF
View on GitHub
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
☆22Mar 26, 2025Updated last year
HanSolo9682 / CounterCurate
View on GitHub
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆19Jun 27, 2024Updated 2 years ago
ajd12342 / why-winoground-hard
View on GitHub
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆31May 29, 2023Updated 3 years ago
ExplainableML / fomo_in_flux
View on GitHub
Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]
☆62Dec 10, 2024Updated last year
FudanNLPLAB / MouSi
View on GitHub
☆75Mar 7, 2024Updated 2 years ago
yuhui-zh15 / AutoConverter
View on GitHub
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆40May 26, 2025Updated last year
Victorwz / VaLM
View on GitHub
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56Mar 6, 2023Updated 3 years ago
ytaek-oh / vl_compo
View on GitHub
☆10Jul 5, 2024Updated 2 years ago
VincentDENGP / 3D-LR
View on GitHub
Can 3D Vision-Language Models Truly Understand Natural Language?
☆20Mar 28, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
oripress / CCC
View on GitHub
Code for Continuously Changing Corruptions (CCC) benchmark + evaluation
☆43Aug 21, 2024Updated last year
arijitray1993 / COLA
View on GitHub
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25May 14, 2026Updated 2 months ago
EvolvingLMMs-Lab / lmms-eval
View on GitHub
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,338Updated this week
RAIVNLab / sugar-crepe
View on GitHub
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆95Feb 13, 2024Updated 2 years ago
Wang-ML-Lab / multimodal-needle-in-a-haystack
View on GitHub
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆55Apr 22, 2026Updated 3 months ago
shizhediao / DaVinci
View on GitHub
Source code for the paper "Prefix Language Models are Unified Modal Learners"
☆45Apr 30, 2023Updated 3 years ago
UCSC-VLAA / vllm-safety-benchmark
View on GitHub
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆90Nov 28, 2023Updated 2 years ago
mlfoundations / VisIT-Bench
View on GitHub
☆51Oct 29, 2023Updated 2 years ago
penghao-wu / vstar
View on GitHub
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆707Jan 7, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
OFA-Sys / TouchStone
View on GitHub
Touchstone: Evaluating Vision-Language Models by Language Models
☆84Jan 18, 2024Updated 2 years ago
AILab-CVC / SEED-Bench
View on GitHub
(CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆366Jan 14, 2025Updated last year
zeyofu / Commonsense-T2I
View on GitHub
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆24Aug 13, 2024Updated last year
yuweihao / MM-Vet
View on GitHub
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆331Jan 20, 2025Updated last year
sgvaze / SSB
View on GitHub
Python package to download and use the SSB datasets
☆11Aug 3, 2023Updated 2 years ago
tripletclip / TripletCLIP
View on GitHub
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆49Dec 1, 2024Updated last year
umd-huang-lab / Mementos
View on GitHub
☆32Feb 8, 2024Updated 2 years ago
jiyounglee-0523 / VisAlign
View on GitHub
☆20Apr 23, 2024Updated 2 years ago
google-deepmind / svo_probes
View on GitHub
The SVO-Probes Dataset for Verb Understanding
☆29Jan 28, 2022Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
holarissun / embedding-based-llm-alignment
View on GitHub
Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs
☆22Apr 24, 2025Updated last year
ExplainableML / Vision_by_Language
View on GitHub
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
☆89Jul 4, 2024Updated 2 years ago
linzhiqiu / CLIP-FlanT5
View on GitHub
Training code for CLIP-FlanT5
☆31Jul 29, 2024Updated 2 years ago
boyazeng / understand_bias
View on GitHub
Code release for "Understanding Bias in Large-Scale Visual Datasets"
☆25Dec 4, 2024Updated last year
YiyangZhou / CSR
View on GitHub
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆87Oct 26, 2025Updated 9 months ago
BatsResearch / ex2
View on GitHub
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17Apr 4, 2024Updated 2 years ago
open-compass / VLMEvalKit
View on GitHub
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,309Jul 22, 2026Updated last week