OpenMOSS / GAOKAO-MM
[ACL'2024 Findings] GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
☆69 · Updated last year
Alternatives and similar repositories for GAOKAO-MM
Users interested in GAOKAO-MM are comparing it to the repositories listed below.
- Paper collections of multi-modal LLM for Math/STEM/Code. ☆129 · Updated last week
- Extrapolating RLVR to General Domains without Verifiers ☆176 · Updated 2 months ago
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆205 · Updated last month
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation ☆60 · Updated 7 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆136 · Updated 6 months ago
- ☆26 · Updated last year
- ☆58 · Updated last year
- [ICLR 2025] ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation ☆124 · Updated 4 months ago
- Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning ☆103 · Updated 2 weeks ago
- ☆84 · Updated last year
- The official repository of the Omni-MATH benchmark. ☆88 · Updated 10 months ago
- (ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆87 · Updated 9 months ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆37 · Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆101 · Updated last month
- ☆169 · Updated 6 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆91 · Updated last year
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆158 · Updated last month
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆340 · Updated 2 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆84 · Updated 9 months ago
- ☆74 · Updated 9 months ago
- Test-time preference optimization (ICML 2025). ☆168 · Updated 5 months ago
- [ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ☆83 · Updated 8 months ago
- An Arena-style Automated Evaluation Benchmark for Detailed Captioning ☆56 · Updated 5 months ago
- The demo, code and data of FollowRAG ☆75 · Updated 4 months ago
- WritingBench: A Comprehensive Benchmark for Generative Writing ☆125 · Updated last month
- ☆39 · Updated 3 months ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models ☆43 · Updated last year
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆294 · Updated last year
- Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓 ☆34 · Updated 7 months ago
- a-m-team's exploration in large language modeling ☆190 · Updated 5 months ago