PRIME-RL / TTRL
TTRL: Test-Time Reinforcement Learning
☆488 · Updated 2 weeks ago
Alternatives and similar repositories for TTRL
Users who are interested in TTRL are comparing it to the libraries listed below.
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆256 · Updated 2 months ago
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆384 · Updated this week
- ☆291 · Updated 2 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆925 · Updated last month
- A series of technical reports on Slow Thinking with LLMs ☆667 · Updated last month
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,234 · Updated this week
- ☆181 · Updated last month
- Official repo for the paper "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆224 · Updated this week
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates ☆383 · Updated last week
- Large Reasoning Models ☆805 · Updated 5 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆586 · Updated last month
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆193 · Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆202 · Updated this week
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆621 · Updated 3 months ago
- ☆527 · Updated last month
- ☆691 · Updated 2 weeks ago
- Awesome RL Reasoning Recipes ("Triple R") ☆530 · Updated last week
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆475 · Updated this week
- Paper list for Efficient Reasoning. ☆432 · Updated this week
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆363 · Updated last month
- Minimal hackable GRPO implementation ☆225 · Updated 3 months ago
- Awesome RL-based LLM Reasoning ☆489 · Updated last week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆352 · Updated last week
- ☆131 · Updated this week
- Dream 7B, a large diffusion language model ☆630 · Updated 2 weeks ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆521 · Updated 3 weeks ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆131 · Updated last week
- AN O1 REPLICATION FOR CODING ☆334 · Updated 5 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆176 · Updated last month
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning ☆261 · Updated this week