OpenMOSS / BandPO
Official implementation of BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning. BandPO replaces canonical clipping (PPO/GRPO) with dynamic bounds to resolve exploration bottlenecks and prevent entropy collapse.
41 · Mar 9, 2026 · Updated last week

Alternatives and similar repositories for BandPO

Users who are interested in BandPO are comparing it to the libraries listed below.
