aadityasingh / HARP
☆22 Updated 9 months ago
Alternatives and similar repositories for HARP
Users who are interested in HARP are comparing it to the libraries listed below
- Reinforcing General Reasoning without Verifiers ☆92 Updated 5 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆40 Updated last month
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆57 Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated 7 months ago
- ☆20 Updated 3 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆44 Updated 3 months ago
- ☆103 Updated 6 months ago
- ☆27 Updated 2 weeks ago
- Exploration of automated dataset selection approaches at large scales. ☆50 Updated 8 months ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆45 Updated 3 months ago
- ☆33 Updated 10 months ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆43 Updated last year
- ☆51 Updated 9 months ago
- ☆41 Updated 5 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆54 Updated last month
- ☆64 Updated 5 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 Updated 6 months ago
- ☆17 Updated 3 months ago
- Code and training scripts for FlexOlmo ☆113 Updated this week
- Common tools for data processing ☆21 Updated 3 weeks ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 Updated 10 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆52 Updated 2 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆47 Updated 4 months ago
- ☆88 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆116 Updated 6 months ago
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆84 Updated 3 months ago
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆69 Updated 7 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆109 Updated last month
- The official implementation of Self-Exploring Language Models (SELM) ☆63 Updated last year
- Natural Language Reinforcement Learning ☆100 Updated 3 months ago