sunblaze-ucb / rl-grok-recipe
Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"
☆28 Updated 3 months ago
Alternatives and similar repositories for rl-grok-recipe
Users who are interested in rl-grok-recipe are comparing it to the repositories listed below
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 Updated 10 months ago
- Reinforcing General Reasoning without Verifiers ☆93 Updated 7 months ago
- A Sober Look at Language Model Reasoning ☆92 Updated 2 months ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆18 Updated last year
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆56 Updated 3 weeks ago
- Test-time-training on nearest neighbors for large language models ☆49 Updated last year
- ☆50 Updated 11 months ago
- ☆19 Updated 5 months ago
- ☆51 Updated 2 years ago
- ☆17 Updated 6 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆84 Updated last year
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning” ☆17 Updated last year
- ☆36 Updated 7 months ago
- ☆51 Updated 2 years ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆116 Updated this week
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆63 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆85 Updated 8 months ago
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models) ☆51 Updated 8 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆39 Updated last year
- Code for "Variational Reasoning for Language Models" ☆55 Updated 4 months ago
- ☆33 Updated last year
- ☆130 Updated this week
- ☆18 Updated last year
- Long Context Extension and Generalization in LLMs ☆62 Updated last year
- ☆72 Updated 7 months ago
- ☆53 Updated 9 months ago
- ☆22 Updated 4 months ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆41 Updated 3 months ago
- ☆37 Updated 2 years ago