PRIME-RL / PRIME
Scalable RL solution for advanced reasoning of language models
☆779 Updated this week
Alternatives and similar repositories for PRIME:
Users interested in PRIME are comparing it to the libraries listed below.
- Large Reasoning Models ☆770 Updated last month
- ☆687 Updated this week
- ☆1,121 Updated last month
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆277 Updated 2 weeks ago
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models ☆1,415 Updated 2 weeks ago
- OLMoE: Open Mixture-of-Experts Language Models ☆522 Updated 3 weeks ago
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆792 Updated 2 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆490 Updated last week
- O1 Replication Journey: A Strategic Progress Report – Part I ☆1,813 Updated last month
- An Open Large Reasoning Model for Real-World Solutions ☆1,360 Updated last month
- ☆990 Updated 3 weeks ago
- AN O1 REPLICATION FOR CODING ☆296 Updated last month
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… ☆558 Updated this week
- ☆297 Updated 3 months ago
- ☆423 Updated last week
- Recipes to scale inference-time compute of open models ☆899 Updated this week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆682 Updated 3 months ago
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… ☆377 Updated 3 weeks ago
- Code for Quiet-STaR ☆690 Updated 4 months ago
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models ☆1,076 Updated 11 months ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem… ☆317 Updated 8 months ago
- ☆286 Updated 3 weeks ago
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ☆970 Updated 3 months ago
- RewardBench: the first evaluation tool for reward models. ☆481 Updated this week
- Scalable toolkit for efficient model alignment ☆665 Updated this week
- veRL: Volcano Engine Reinforcement Learning for LLM ☆617 Updated this week
- ☆178 Updated last month
- A reading list on LLM based Synthetic Data Generation 🔥 ☆940 Updated 2 months ago
- [NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models ☆573 Updated last week
- ☆900 Updated 6 months ago