hiyouga / MathRuler
A light-weight tool for evaluating LLMs in rule-based ways.
☆54 Updated last week
Alternatives and similar repositories for MathRuler
Users who are interested in MathRuler are comparing it to the libraries listed below.
- ☆45 Updated last month
- The official repository of the Omni-MATH benchmark. ☆83 Updated 5 months ago
- ☆36 Updated last month
- ☆60 Updated 2 weeks ago
- A version of verl to support tool use ☆172 Updated this week
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆106 Updated last month
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆97 Updated this week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆65 Updated last month
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆103 Updated last week
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆106 Updated 5 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆61 Updated 5 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆212 Updated this week
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆61 Updated 5 months ago
- ☆231 Updated last week
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆73 Updated this week
- The code and data for the paper JiuZhang3.0 ☆45 Updated last year
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆103 Updated this week
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 Updated last month
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆37 Updated 3 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆104 Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆123 Updated 2 months ago
- The official code of paper “Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning” ☆117 Updated this week
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆95 Updated 2 months ago
- [ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆63 Updated 7 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆94 Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆180 Updated 2 months ago
- ☆63 Updated 6 months ago
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆59 Updated 4 months ago
- Large Language Models Can Self-Improve in Long-context Reasoning ☆70 Updated 6 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆60 Updated 5 months ago