zhourunlong / Reflect-RL
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
☆13Updated 3 months ago
Alternatives and similar repositories for Reflect-RL
Users who are interested in Reflect-RL are comparing it to the libraries listed below.
- ☆33Updated 11 months ago
- Natural Language Reinforcement Learning☆98Updated 2 months ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆66Updated 6 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…☆32Updated last year
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆57Updated last year
- Code for "Reasoning to Learn from Latent Thoughts"☆121Updated 6 months ago
- Directional Preference Alignment☆57Updated last year
- Bayes-Adaptive RL for LLM Reasoning☆40Updated 4 months ago
- Dataset Reset Policy Optimization☆31Updated last year
- ☆116Updated 9 months ago
- ☆49Updated 5 months ago
- ☆101Updated last year
- ☆76Updated last month
- ☆29Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"☆175Updated 4 months ago
- GenRM-CoT: Data release for verification rationales☆66Updated last year
- ☆49Updated 8 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback☆120Updated 6 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆151Updated 11 months ago
- ☆68Updated 3 weeks ago
- ☆63Updated 7 months ago
- ☆25Updated 7 months ago
- Repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box"☆16Updated 10 months ago
- ☆50Updated 8 months ago
- ☆152Updated 10 months ago
- ☆34Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆107Updated 2 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…☆78Updated 9 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆44Updated 6 months ago
- ☆103Updated last year