open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆217 · Updated 3 months ago
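The core idea behind tiny-grpo (and GRPO generally) is to sample a group of completions per prompt, score each with a reward, and normalize every reward against its own group's mean and standard deviation, so no learned critic is needed. Below is a minimal sketch of that group-relative advantage computation; the function and variable names are illustrative and not tiny-grpo's actual API.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """GRPO-style advantages.

    rewards: (num_prompts, group_size) reward for each sampled completion.
    Each reward is normalized against the mean/std of its own group,
    which replaces the value-function baseline used in PPO.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.5, 0.2, 0.9, 0.1]])
advantages = group_relative_advantages(rewards)
# The policy-gradient loss then weights token log-prob ratios by these
# advantages, typically with a PPO-style clip and a KL penalty to a
# reference model.
print(advantages)
```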
Alternatives and similar repositories for tiny-grpo:
Users interested in tiny-grpo are comparing it to the libraries listed below.
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆619 · Updated 3 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆103 · Updated 3 weeks ago
- RLHF implementation details of OAI's 2019 codebase ☆186 · Updated last year
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training ☆266 · Updated 11 months ago
- Large Reasoning Models ☆804 · Updated 5 months ago
- ☆671 · Updated last week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆338 · Updated this week
- ☆287 · Updated last month
- Source code for Self-Evaluation Guided MCTS for online DPO ☆306 · Updated 9 months ago
- ☆122 · Updated 10 months ago
- Official repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning ☆347 · Updated 4 months ago
- TTRL: Test-Time Reinforcement Learning ☆407 · Updated last week
- ☆153 · Updated last month
- Notes and commented code for RLHF (PPO) ☆90 · Updated last year
- A highly capable, lightweight 2.4B LLM trained on only 1T tokens of pre-training data, with all details released ☆176 · Updated 3 weeks ago
- ☆138 · Updated 5 months ago
- ☆328 · Updated 3 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training ☆188 · Updated 2 months ago
- ☆192 · Updated 2 months ago
- Official repo for the paper "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆220 · Updated last month
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆155 · Updated 5 months ago
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆167 · Updated 3 weeks ago
- An O1 replication for coding ☆333 · Updated 4 months ago
- ☆57 · Updated 9 months ago
- A tiny reproduction of DeepSeek R1-Zero on two A100s ☆65 · Updated 3 months ago
- An extension of the nanoGPT repository for training small MoE models ☆138 · Updated 2 months ago
- ☆526 · Updated 4 months ago
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆94 · Updated last month
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆75 · Updated last month
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling" ☆253 · Updated 2 months ago