liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆199 Updated 2 years ago
Alternatives and similar repositories for ReMax
Users who are interested in ReMax are comparing it to the repositories listed below.
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆151 Updated 11 months ago
- ☆213 Updated 10 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…