XueruiSu / Trust-Region-Preference-Approximation

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
12 stars · Updated 2 months ago

Alternatives and similar repositories for Trust-Region-Preference-Approximation

Users who are interested in Trust-Region-Preference-Approximation are comparing it to the libraries listed below.
