ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆258 Updated last year
Alternatives and similar repositories for math-evaluation-harness
Users who are interested in math-evaluation-harness are comparing it to the repositories listed below.
- ☆211 Updated 8 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆143 Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆174 Updated 5 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆115 Updated 10 months ago
- ☆323 Updated 4 months ago
- ☆210 Updated 6 months ago
- ☆67 Updated 6 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning