hiyouga / MathRuler
A lightweight tool for evaluating LLMs in rule-based ways.
☆65 Updated 3 weeks ago
Alternatives and similar repositories for MathRuler
Users who are interested in MathRuler are comparing it to the repositories listed below.
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆121 Updated 3 months ago
- ☆318 Updated last month
- ☆46 Updated 3 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆251 Updated last week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆187 Updated 3 months ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆108 Updated 2 months ago
- The official repository of the Omni-MATH benchmark. ☆85 Updated 6 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆251 Updated last month
- ☆205 Updated 4 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆79 Updated last month
- "what, how, where, and how well? a survey on test-time scaling in large language models" repository ☆52 Updated last week
- A version of verl to support tool use ☆292 Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆105 Updated last month
- ☆241 Updated last month
- ☆64 Updated last month
- Extrapolating RLVR to General Domains without Verifiers ☆112 Updated 3 weeks ago
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆56 Updated this week
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆127 Updated this week
- A Self-Training Framework for Vision-Language Reasoning ☆80 Updated 5 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆120 Updated 2 months ago
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆201 Updated last week
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆62 Updated 7 months ago
- The code and data for the paper JiuZhang3.0 ☆47 Updated last year
- ☆174 Updated last month
- ☆38 Updated 2 weeks ago
- Repo of paper "Free Process Rewards without Process Labels" ☆154 Updated 4 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ☆73 Updated last month
- ☆136 Updated last month
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆61 Updated this week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆67 Updated 2 months ago