XueruiSu / Trust-Region-Preference-Approximation
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
14 | Jun 28, 2025 | Updated 8 months ago

Alternatives and similar repositories for Trust-Region-Preference-Approximation

Users who are interested in Trust-Region-Preference-Approximation are comparing it to the libraries listed below.
