aadityasingh / HARP
☆22 Updated 8 months ago
Alternatives and similar repositories for HARP
Users who are interested in HARP are comparing it to the libraries listed below
- Reinforcing General Reasoning without Verifiers ☆90 Updated 3 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆35 Updated last month
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated last year
- ☆85 Updated last year
- ☆50 Updated 8 months ago
- ☆106 Updated last year
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆55 Updated 10 months ago
- Long Context Extension and Generalization in LLMs ☆61 Updated last year
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆46 Updated 2 weeks ago
- Exploration of automated dataset selection approaches at large scales. ☆47 Updated 7 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 Updated last year
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆51 Updated last week
- ☆62 Updated 4 months ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆101 Updated last month
- SSRL: Self-Search Reinforcement Learning ☆147 Updated last month
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆57 Updated last year
- ☆55 Updated 4 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆26 Updated last week
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆87 Updated last year
- ☆33 Updated 9 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆65 Updated 7 months ago
- Efficient Scaling laws and collaborative pretraining. ☆18 Updated last month
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆114 Updated 5 months ago
- ☆74 Updated last month
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆85 Updated 4 months ago
- ☆122 Updated 7 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated 6 months ago
- ☆85 Updated 9 months ago
- ☆28 Updated last year
- ☆80 Updated 3 weeks ago