McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆178 · Updated 5 months ago
Alternatives and similar repositories for VinePPO
Users interested in VinePPO are comparing it to the repositories listed below.
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆124 · Updated last year
- ☆104 · Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆196 · Updated 6 months ago
- Repository for the paper "Free Process Rewards without Process Labels" ☆164 · Updated 7 months ago