bradhilton / o1-chain-of-thought
o1 Chain of Thought Examples
☆33 · Updated 5 months ago
Alternatives and similar repositories for o1-chain-of-thought:
Users interested in o1-chain-of-thought are comparing it to the repositories listed below.
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆51 · Updated last month
- Exploration of automated dataset selection approaches at large scales. ☆34 · Updated 3 weeks ago
- Toy implementation of Strawberry ☆31 · Updated 6 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 · Updated 2 months ago
- ☆52 · Updated 2 weeks ago
- NeurIPS 2024 tutorial on LLM Inference ☆39 · Updated 3 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆52 · Updated last year
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆44 · Updated last month
- CodeUltraFeedback: aligning large language models to coding preferences ☆71 · Updated 9 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 · Updated last month
- ☆102 · Updated 3 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- ☆12 · Updated 4 months ago
- Critique-out-Loud Reward Models ☆56 · Updated 5 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆75 · Updated 2 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆84 · Updated last month
- ☆96 · Updated 9 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆80 · Updated this week
- ☆39 · Updated this week
- Replicating O1 inference-time scaling laws ☆83 · Updated 3 months ago
- Reformatted Alignment ☆115 · Updated 6 months ago
- ☆24 · Updated 6 months ago
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆53 · Updated 7 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- ☆48 · Updated last month
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆45 · Updated 3 months ago
- ☆103 · Updated 2 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆54 · Updated 5 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆29 · Updated 9 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated last year