WooooDyy / BAPO

Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.
84 · Updated last month

Alternatives and similar repositories for BAPO

Users who are interested in BAPO are comparing it to the libraries listed below.
