Jiahao004 / DeepTheorem
☆24 Updated 6 months ago
Alternatives and similar repositories for DeepTheorem
Users who are interested in DeepTheorem are comparing it to the repositories listed below.
- Codebase for Instruction Following without Instruction Tuning ☆36 Updated last year
- ☆70 Updated last year
- ☆51 Updated 10 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆118 Updated 7 months ago
- ☆26 Updated last year
- Exploration of automated dataset selection approaches at large scales. ☆50 Updated 9 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆61 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 Updated 6 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆48 Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆68 Updated 9 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆51 Updated 6 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 Updated 10 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆45 Updated 8 months ago
- Replicating O1 inference-time scaling laws ☆90 Updated last year
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" ☆48 Updated 5 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆68 Updated last year
- ☆30 Updated 11 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆95 Updated 8 months ago
- The code and data for the paper JiuZhang3.0 ☆49 Updated last year
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆139 Updated 2 months ago
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆104 Updated 2 months ago
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings. ☆29 Updated 2 years ago
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆67 Updated 8 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆31 Updated 4 months ago
- ☆18 Updated last year
- ☆28 Updated last month
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated last year
- A comprehensive benchmark for evaluating deep research agents on academic survey tasks ☆38 Updated 3 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year
- Evaluate the Quality of Critique ☆36 Updated last year