aadityasingh / HARP
☆21, updated 7 months ago
Alternatives and similar repositories for HARP
Users who are interested in HARP are comparing it to the repositories listed below.
- Reinforcing General Reasoning without Verifiers ☆80, updated 2 months ago
- ☆47, updated 6 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆64, updated last year
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆53, updated 9 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆33, updated last week
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆35, updated 2 weeks ago
- Bayes-Adaptive RL for LLM Reasoning ☆37, updated 3 months ago
- Natural Language Reinforcement Learning ☆95, updated last month
- ☆85, updated last year
- Official repo of paper LM2 ☆42, updated 6 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆40, updated 3 weeks ago
- Exploration of automated dataset selection approaches at large scales. ☆47, updated 6 months ago
- ☆20, updated last month
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆66, updated 5 months ago
- ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆111, updated last week
- ☆39, updated 2 months ago
- ☆93, updated 3 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆116, updated 5 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆61, updated 6 months ago
- Efficient Scaling laws and collaborative pretraining. ☆17, updated 7 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆30, updated last year
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025] ☆171, updated last month
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41, updated last year
- A repo for open research on building large reasoning models ☆94, updated this week
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆58, updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82, updated 3 months ago
- ☆51, updated 2 months ago
- ☆34, updated 7 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆38, updated 2 weeks ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆59, updated last year