ChenmienTan / RL2
☆847 · Updated this week
Alternatives and similar repositories for RL2
Users who are interested in RL2 are comparing it to the libraries listed below.
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,229 · Updated 7 months ago
- Codebase for Iterative DPO Using Rule-based Rewards ☆257 · Updated 4 months ago
- adds Sequence Parallelism into LLaMA-Factory ☆551 · Updated last week
- The official implementation of Self-Play Preference Optimization (SPPO) ☆578 · Updated 7 months ago
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS