open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆232 · Updated 4 months ago
Alternatives and similar repositories for tiny-grpo
Users who are interested in tiny-grpo are comparing it to the libraries listed below.
- ☆731 · Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆367 · Updated last week
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training ☆272 · Updated last year
- ☆293 · Updated this week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆105 · Updated 3 weeks ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆314 · Updated 9 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆631 · Updated 4 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆239 · Updated last month
- TTRL: Test-Time Reinforcement Learning ☆570 · Updated last week
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning ☆375 · Updated this week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆299 · Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆141 · Updated 5 months ago
- Large Reasoning Models ☆803 · Updated 6 months ago
- RLHF implementation details of OAI's 2019 codebase ☆187 · Updated last year
- Tina: Tiny Reasoning Models via LoRA ☆245 · Updated last week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆231 · Updated 3 weeks ago
- ☆141 · Updated 6 months ago
- ☆201 · Updated 3 months ago
- Notes and commented code for RLHF (PPO) ☆94 · Updated last year
- Understanding R1-Zero-Like Training: A Critical Perspective ☆956 · Updated last week
- ☆210 · Updated last week
- ☆330 · Updated 3 months ago
- ☆151 · Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆212 · Updated this week
- A tiny-scale reproduction of DeepSeek-R1-Zero on two A100s. ☆67 · Updated 4 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ☆95 · Updated 3 months ago
- A series of technical reports on Slow Thinking with LLMs ☆685 · Updated this week
- ☆554 · Updated last month
- ☆540 · Updated 5 months ago
- Scalable toolkit for efficient model reinforcement ☆385 · Updated this week