aadityasingh / HARP
☆23 Updated 10 months ago
Alternatives and similar repositories for HARP
Users who are interested in HARP are comparing it to the libraries listed below
- Reinforcing General Reasoning without Verifiers ☆92 Updated 5 months ago
- ☆24 Updated 8 months ago
- ☆20 Updated 4 months ago
- Code and training scripts for FlexOlmo ☆118 Updated this week
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆58 Updated 2 months ago
- Common tools for data processing ☆22 Updated last week
- ☆51 Updated 10 months ago
- ☆107 Updated last year
- Efficient Scaling laws and collaborative pretraining. ☆19 Updated 3 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆40 Updated 2 months ago
- ☆44 Updated 5 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆63 Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆46 Updated 8 months ago
- ☆28 Updated last month
- Official repo of paper LM2 ☆46 Updated 10 months ago
- Natural Language Reinforcement Learning ☆100 Updated 4 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 Updated 6 months ago
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆67 Updated 8 months ago
- ☆53 Updated 2 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆45 Updated 4 months ago
- ☆66 Updated 6 months ago
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆143 Updated 3 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆55 Updated 2 months ago
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆57 Updated last year
- ☆70 Updated 5 months ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆43 Updated last year
- ☆29 Updated 2 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 Updated 10 months ago
- SSRL: Self-Search Reinforcement Learning ☆158 Updated 4 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 Updated last year