hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆83 Updated 3 months ago
Alternatives and similar repositories for o1_inference_scaling_laws:
Users who are interested in o1_inference_scaling_laws are comparing it to the repositories listed below.
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 Updated last year
- ☆60 Updated 10 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆76 Updated 2 weeks ago
- Language models scale reliably with over-training and on downstream tasks ☆96 Updated 11 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆52 Updated 11 months ago
- ☆156 Updated 2 weeks ago
- ☆95 Updated 8 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆103 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆75 Updated 2 months ago
- ☆87 Updated 5 months ago
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆142 Updated last month
- ☆64 Updated 4 months ago
- CodeUltraFeedback: aligning large language models to coding preferences ☆70 Updated 8 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 Updated 6 months ago
- ☆36 Updated 4 months ago
- ☆111 Updated last month
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆142 Updated 4 months ago
- Code and configs for "Asynchronous RLHF: Faster and More Efficient RL for Language Models" ☆34 Updated 3 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆119 Updated 6 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆51 Updated last month
- ☆73 Updated 7 months ago
- ☆83 Updated last month
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆53 Updated 5 months ago
- Can Language Models Solve Olympiad Programming? ☆111 Updated 2 months ago