hkgc-1 / GHPO
☆25 · Updated 3 weeks ago
Alternatives and similar repositories for GHPO
Users that are interested in GHPO are comparing it to the libraries listed below
- ☆28 · Updated 3 weeks ago
- [Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. ☆59 · Updated this week
- ☆48 · Updated 3 months ago
- ☆91 · Updated this week
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆63 · Updated 4 months ago
- ☆46 · Updated 2 months ago
- ☆78 · Updated 4 months ago
- ☆96 · Updated this week
- The official implementation of Self-Exploring Language Models (SELM) ☆64 · Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆99 · Updated 2 months ago
- Bayes-Adaptive RL for LLM Reasoning ☆36 · Updated 2 months ago
- ☆83 · Updated 2 weeks ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆80 · Updated 2 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin… ☆36 · Updated last month
- PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing ☆19 · Updated 4 months ago
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆35 · Updated last week
- Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆103 · Updated 3 weeks ago
- ☆67 · Updated 2 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 · Updated 5 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025) ☆28 · Updated 7 months ago
- Resa: Transparent Reasoning Models via SAEs ☆41 · Updated 2 months ago
- Natural Language Reinforcement Learning ☆92 · Updated last week
- ☆51 · Updated 2 months ago
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking ☆38 · Updated 6 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆113 · Updated 2 months ago
- Esoteric Language Models ☆91 · Updated 2 weeks ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆32 · Updated this week
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. ☆170 · Updated last week
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆38 · Updated 4 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆32 · Updated 3 months ago