zwhe99 / DeepMath
A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
☆180Updated this week
Alternatives and similar repositories for DeepMath:
Users who are interested in DeepMath are comparing it to the repositories listed below
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆141Updated 2 weeks ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆186Updated 3 weeks ago
- A Comprehensive Survey on Long Context Language Modeling☆138Updated last month
- ☆287Updated last month
- ☆192Updated 2 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆195Updated last month
- ☆151Updated 4 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆220Updated last month
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆101Updated 3 months ago
- ☆153Updated last month
- ☆92Updated 3 months ago
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆118Updated this week
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆244Updated 3 weeks ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆175Updated last month
- ☆149Updated last week
- ☆279Updated 9 months ago
- ☆121Updated this week
- Test-time preference optimization.☆114Updated last week
- Repo of paper "Free Process Rewards without Process Labels"☆145Updated last month
- An Open Math Pre-training Dataset with 370B Tokens.☆78Updated last month
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…☆103Updated last month
- Code and example data for the paper: Rule Based Rewards for Language Model Safety☆186Updated 9 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆89Updated 2 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆179Updated 2 months ago
- ☆138Updated last week
- ☆199Updated 2 months ago
- ☆115Updated 2 weeks ago
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models☆105Updated this week
- The official repository of the Omni-MATH benchmark.☆83Updated 4 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆234Updated 3 weeks ago