TRI-ML / vlm-evaluation
VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning
☆135Updated last year
Alternatives and similar repositories for vlm-evaluation
Users who are interested in vlm-evaluation are comparing it to the libraries listed below.
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization☆100Updated 2 years ago
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …☆291Updated 2 years ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆220Updated 3 months ago
- An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation☆153Updated 2 years ago
- Matryoshka Multimodal Models☆122Updated last year
- An RLHF Infrastructure for Vision-Language Models☆196Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆91Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs☆176Updated 4 months ago
- ☆155Updated last year
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…☆325Updated 3 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆159Updated 4 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆72Updated last year
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆149Updated 4 months ago
- ☆80Updated last year
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.☆139Updated 2 years ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality☆89Updated 2 years ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs☆145Updated last year
- ☆360Updated 2 years ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"☆245Updated 5 months ago
- [ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…☆82Updated 11 months ago
- [ICML 2024] | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI☆116Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆70Updated last year
- M-HalDetect Dataset Release☆27Updated 2 years ago
- ☆111Updated last year
- Densely Captioned Images (DCI) dataset repository.☆196Updated last year
- ☆71Updated last year
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆203Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆321Updated last year
- ☆101Updated 2 years ago
- ☆79Updated last year