[ACL'2024 Findings] GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
☆78Mar 13, 2024Updated 2 years ago
Alternatives and similar repositories for GAOKAO-MM
Users that are interested in GAOKAO-MM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- GAOGAO-Bench-Updates is a supplement to the GAOKAO-Bench, a dataset to evaluate large language models.☆39Jan 7, 2025Updated last year
- GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.☆728Jan 7, 2025Updated last year
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆54Oct 20, 2024Updated last year
- a survey of long-context LLMs from four perspectives, architecture, infrastructure, training, and evaluation☆61Mar 31, 2025Updated 11 months ago
- ☆19Nov 6, 2023Updated 2 years ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det☆15Apr 3, 2024Updated last year
- Modified LLaVA framework for MOSS2, and makes MOSS2 a multimodal model.☆13Sep 19, 2024Updated last year
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models☆32Jan 22, 2025Updated last year
- ☆31Sep 15, 2025Updated 6 months ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.☆25Oct 7, 2025Updated 5 months ago
- ☆115Oct 7, 2025Updated 5 months ago
- GPT-4o-level, real-time spoken dialogue system.☆370Jan 27, 2025Updated last year
- ☆38Oct 2, 2024Updated last year
- Gaokao Benchmark for AI☆108Jul 8, 2022Updated 3 years ago
- [EMNLP 2022] Differentiable Data Augmentation for Contrastive Sentence Representation Learning. https://arxiv.org/abs/2210.16536☆40Nov 1, 2022Updated 3 years ago
- The official repository of the Omni-MATH benchmark.☆93Dec 22, 2024Updated last year
- ☆337May 24, 2025Updated 9 months ago
- Some example codes for drawing figures in research paper☆35Mar 3, 2022Updated 4 years ago
- ☆18Mar 20, 2022Updated 4 years ago
- FamilyTool benchmark☆13Sep 10, 2025Updated 6 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH☆27Dec 23, 2024Updated last year
- ☆67Aug 14, 2025Updated 7 months ago
- [AAAI 2024] DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning☆15Apr 29, 2024Updated last year
- 集拍照切题、搜题、学情分析、归纳整理为一体的线上教育平台☆40Feb 28, 2020Updated 6 years ago
- The code for paper: Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search☆62Jul 4, 2025Updated 8 months ago
- ☆18Jul 24, 2025Updated 7 months ago
- The code for “PromptNER: A Prompting Method for Few-shot Named Entity Recognition via k Nearest Neighbor Search”☆19Mar 13, 2024Updated 2 years ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆52Feb 27, 2025Updated last year
- Math24o: 高中奥林匹克数学竞赛测评集 High School Olympiad Mathematics Chinese Benchmark☆11Mar 27, 2025Updated 11 months ago
- [ACL 2025] RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation☆117Jan 23, 2025Updated last year
- 武大信图抢座程序 支持后台持续监测,抢靠窗、有电脑的座位 以及抢座成功后自动关机☆15Dec 8, 2022Updated 3 years ago
- [ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model☆18Feb 24, 2025Updated last year
- ☆30Dec 27, 2024Updated last year
- MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts☆355Sep 29, 2025Updated 5 months ago
- ☆15Feb 28, 2023Updated 3 years ago
- Collections of RLxLM experiments using minimal codes☆14Feb 17, 2025Updated last year
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches☆37Oct 9, 2025Updated 5 months ago
- ☆20Jul 5, 2024Updated last year
- ☆58Feb 27, 2025Updated last year