ZJU-REAL / HBPO

☆31

Alternatives and similar repositories for HBPO

Users who are interested in HBPO are comparing it to the repositories listed below.

ZJU-REAL / LAPO
☆36 Updated last month
ZJU-REAL / Mind-the-Gap
[NeurIPS 2025] Mind the Gap: Bridging Thought Leap for Improved CoT Tuning https://arxiv.org/abs/2505.14684
☆45 Updated last month
ZJU-REAL / EasySteer
A Unified Framework for High-Performance and Extensible LLM Steering
☆133 Updated this week
LightChen233 / reasoning-boundary
☆69 Updated 5 months ago
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆26 Updated 3 months ago
TEAM-ARM / arm
[NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model
☆60 Updated last month
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆91 Updated last month
lichengliu03 / unary-feedback
☆38 Updated 3 months ago
StarDewXXX / O1-Pruner
Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
☆98 Updated 9 months ago
RUCAIBox / R1-Searcher-plus
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
☆65 Updated 6 months ago
multimodal-art-projection / REER_DeepWriter
REverse-Engineered Reasoning for Open-Ended Generation
☆83 Updated 2 months ago
ZJU-REAL / Self-Braking-Tuning
[NeurIPS 2025] Let LRMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604
☆51 Updated last month
StarDewXXX / AdaR1
The official repository of the NeurIPS'25 paper "Ada-R1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆20 Updated 3 weeks ago
KANABOON1 / MemGen
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
☆230 Updated last week
RUC-NLPIR / Tool-Star
🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning
☆293 Updated last month
zjunlp / LightThinker
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
☆123 Updated 7 months ago
MingyuJ666 / Disentangling-Memory-and-Reasoning
[ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
☆79 Updated last month
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆140 Updated 3 weeks ago
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆122 Updated 2 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆180 Updated 3 months ago
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆72 Updated 7 months ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆87 Updated 9 months ago
AIFrameResearch / SPO
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
☆42 Updated 2 months ago
test-time-interaction / TTI
☆65 Updated 5 months ago
THU-KEG / AdaptThink
☆169 Updated last month
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆332 Updated 2 months ago
bigai-nlco / LatentSeek
Official Repository of LatentSeek
☆69 Updated 6 months ago
THUDM / TreeRL
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search (ACL'25)
☆82 Updated 5 months ago
zhangxy-2019 / critique-GRPO
☆48 Updated 2 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56 Updated last year