sarahmart / HARDMath
A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low accuracy in solving these problems.
☆25 Updated 10 months ago
Alternatives and similar repositories for HARDMath
Users who are interested in HARDMath are comparing it to the repositories listed below:
- GenRM-CoT: Data release for verification rationales ☆67 Updated last year
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆119 Updated last year
- ☆77 Updated last year
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 Updated 9 months ago
- ☆85 Updated 11 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆136 Updated last week
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆148 Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆125 Updated last year
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆76 Updated 2 months ago
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ☆176 Updated 6 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆70 Updated 10 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆31 Updated last year
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆79 Updated 11 months ago
- Resources for the Enigmata Project. ☆74 Updated 4 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆26 Updated last year
- The official repository of the Omni-MATH benchmark. ☆90 Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆114 Updated 2 months ago
- A repo for open research on building large reasoning models ☆125 Updated this week
- ☆79 Updated 9 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆146 Updated last year
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆114 Updated 4 months ago
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆54 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆118 Updated 7 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆183 Updated 7 months ago
- ☆218 Updated 9 months ago
- ☆70 Updated last year
- ☆25 Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆181 Updated 7 months ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆22 Updated 2 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 Updated 11 months ago