TRI-ML / vlm-evaluation
VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning
☆135 · Updated last year
Alternatives and similar repositories for vlm-evaluation
Users interested in vlm-evaluation are comparing it to the libraries listed below.
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs ☆176 · Updated 3 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆100 · Updated 2 years ago
- A RLHF Infrastructure for Vision-Language Models ☆193 · Updated last year
- Matryoshka Multimodal Models ☆121 · Updated last year
- ☆155 · Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆90 · Updated last year
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… ☆322 · Updated 3 months ago
- Python library to evaluate VLMs' robustness across diverse benchmarks ☆220 · Updated 3 months ago
- ☆80 · Updated last year
- An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation ☆153 · Updated 2 years ago
- [TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆147 · Updated 3 months ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆145 · Updated last year
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR … ☆291 · Updated 2 years ago
- ☆79 · Updated last year
- ☆102 · Updated 2 years ago
- ☆359 · Updated 2 years ago
- ☆110 · Updated last year
- [ICML 2024] | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ☆116 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆280 · Updated last year
- M-HalDetect Dataset Release ☆26 · Updated 2 years ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆159 · Updated 4 months ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality ☆89 · Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆320 · Updated last year
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆203 · Updated last year
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper] ☆237 · Updated 3 weeks ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models'' ☆104 · Updated 5 months ago
- The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models'' ☆244 · Updated 5 months ago
- [ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual… ☆82 · Updated 11 months ago
- ☆50 · Updated 2 years ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆72 · Updated last year