pkunlp-icler / PCA-EVAL

[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

☆104

Alternatives and similar repositories for PCA-EVAL

Users who are interested in PCA-EVAL are comparing it to the repositories listed below.

patrick-tssn / Awesome-Colorful-LLM
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…
☆123 Updated 4 months ago
mlfoundations / VisIT-Bench
☆50 Updated last year
thunlp / Muffin
☆65 Updated last year
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆115 Updated 10 months ago
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆86 Updated 8 months ago
vlf-silkie / VLFeedback
☆100 Updated last year
RifleZhang / LLaVA-Reasoner-DPO
☆91 Updated 8 months ago
TideDra / VL-RLHF
An RLHF Infrastructure for Vision-Language Models
☆184 Updated 10 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆57 Updated 9 months ago
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆84 Updated 4 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆90 Updated last year
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109 Updated 4 months ago
szxiangjn / world-model-for-language-model
☆131 Updated last year
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77 Updated 3 months ago
Xiuyuan-Chen / AutoEval-Video
☆36 Updated last year
pipilurj / G-LLaVA
Official GitHub repo of G-LLaVA
☆147 Updated 7 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆59 Updated 11 months ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆116 Updated 4 months ago
RifleZhang / LLaVA-Hound-DPO
☆153 Updated 11 months ago
FudanDISC / ReForm-Eval
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
☆45 Updated last year
CMMMU-Benchmark / CMMMU
☆48 Updated last year
MAmmoTH-VL / MAmmoTH-VL
(ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
☆48 Updated 4 months ago
princeton-nlp / CharXiv
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
☆126 Updated 5 months ago
dvlab-research / Mr-Ben
This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"
☆50 Updated 11 months ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆83 Updated 8 months ago
AGI-Edgerunners / IIL
Code for our paper "All in an Aggregated Image for In-Image Learning"
☆29 Updated last year
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆93 Updated last year
ChenYi99 / EgoPlan
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆74 Updated 10 months ago
RUCBM / GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
☆129 Updated last year