Jiahao004 / DeepTheorem
☆25Updated 7 months ago
Alternatives and similar repositories for DeepTheorem
Users who are interested in DeepTheorem are comparing it to the libraries listed below
- Codebase for Instruction Following without Instruction Tuning☆36Updated last year
- ☆25Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆120Updated 9 months ago
- ☆18Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆47Updated 9 months ago
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"☆49Updated 7 months ago
- ☆24Updated 10 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆63Updated last year
- ☆108Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆85Updated 8 months ago
- Official Repo of Paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"☆19Updated 7 months ago
- ☆23Updated last year
- Replicating O1 inference-time scaling laws☆92Updated last year
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.☆41Updated 3 months ago
- ☆72Updated 8 months ago
- ☆14Updated 2 years ago
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?☆19Updated 11 months ago
- RL Scaling and Test-Time Scaling (ICML'25)☆113Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆32Updated 6 months ago
- ☆42Updated last year
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling☆108Updated 8 months ago
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"☆21Updated 11 months ago
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"☆68Updated 9 months ago
- Exploration of automated dataset selection approaches at large scales.☆52Updated 11 months ago
- Directional Preference Alignment☆58Updated last year
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Updated last year
- ☆52Updated 11 months ago
- ☆30Updated last year
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]☆180Updated 7 months ago
- ☆14Updated last year