MMMU-Benchmark / mmmu-benchmark.github.io

☆15

Alternatives and similar repositories for mmmu-benchmark.github.io

Users who are interested in mmmu-benchmark.github.io are comparing it to the repositories listed below.

ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆124 Updated 10 months ago
vlf-silkie / VLFeedback
☆101 Updated 2 years ago
kiaia / GIRAFFE
Extending context length of visual language models
☆12 Updated last year
ShadeCloak / ADORA
☆47 Updated 9 months ago
ruixin31 / Spurious_Rewards
☆352 Updated 6 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆168 Updated 10 months ago
declare-lab / LLM-PuzzleTest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
☆113 Updated 11 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆70 Updated 6 months ago
junkangwu / beta-DPO
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
☆50 Updated last year
RenShuhuai-Andy / my-tools
my commonly-used tools
☆64 Updated last year
sail-sg / ActivePRM
☆20 Updated 9 months ago
DAMO-NLP-SG / Auto-Arena-LLMs
☆43 Updated last year
holarissun / RewardModelingBeyondBradleyTerry
official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆70 Updated 10 months ago
hkust-nlp / Laser
[ICLR2026] Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
☆62 Updated 8 months ago
bigai-nlco / LatentSeek
Official Repository of LatentSeek
☆76 Updated 8 months ago
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆96 Updated 8 months ago
Essential-AI / reflection
☆48 Updated 9 months ago
HKUNLP / diffusion-of-thoughts
[NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
☆201 Updated 11 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆160 Updated 3 months ago
zzzhr97 / SpecBench
☆23 Updated 3 months ago
zeyofu / ReFocus_Code
Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]
☆45 Updated 6 months ago
NUS-TRAIL / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆104 Updated 4 months ago
MileBench / MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
☆36 Updated last year
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆419 Updated 6 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆64 Updated last year
VTool-R1 / VTool-R1
Code for the paper "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use" [ICLR 2026]
☆152 Updated last week
mat-agent / MAT-Agent
MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight)
☆84 Updated last month
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆92 Updated 2 months ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆88 Updated 11 months ago
TideDra / VL-RLHF
An RLHF Infrastructure for Vision-Language Models
☆195 Updated last year