sarahmart / HARDMath
A new dataset of difficult graduate-level applied mathematics problems; evaluations demonstrate that leading LLMs currently exhibit low accuracy in solving these problems.
⭐ 26 · Updated 11 months ago
Alternatives and similar repositories for HARDMath
Users who are interested in HARDMath are comparing it to the repositories listed below.
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ⭐ 120 · Updated last year
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ⭐ 84 · Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH ⭐ 26 · Updated last year
- Code for "Reasoning to Learn from Latent Thoughts" ⭐ 124 · Updated 10 months ago
- ⭐ 85 · Updated last year
- ⭐ 78 · Updated last year
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ⭐ 150 · Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ⭐ 124 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ⭐ 71 · Updated 11 months ago
- GenRM-CoT: Data release for verification rationales ⭐ 67 · Updated last year
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ⭐ 138 · Updated last month
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ⭐ 77 · Updated 3 months ago
- The official repository of the Omni-MATH benchmark. ⭐ 93 · Updated last year
- [NeurIPS'24 Spotlight] Observational Scaling Laws ⭐ 58 · Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ⭐ 114 · Updated this week
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) ⭐ 37 · Updated last year
- ⭐ 224 · Updated 10 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ⭐ 124 · Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ⭐ 120 · Updated 9 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". ⭐ 64 · Updated 5 months ago
- ⭐ 80 · Updated 10 months ago
- Resources for the Enigmata Project. ⭐ 76 · Updated 5 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ⭐ 116 · Updated 6 months ago
- Repo of paper "Free Process Rewards without Process Labels" ⭐ 168 · Updated 10 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ⭐ 87 · Updated 10 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ⭐ 147 · Updated last year
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ⭐ 63 · Updated last year
- [ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie… ⭐ 180 · Updated 7 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ⭐ 182 · Updated 6 months ago
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ⭐ 82 · Updated 2 years ago