jinpz / q_sharp
The official code release for Q#: Provably Optimal Distributional RL for LLM Post-Training
☆16 Updated 7 months ago
Alternatives and similar repositories for q_sharp
Users who are interested in q_sharp are comparing it to the libraries listed below.
- ☆104 Updated last year
- Rewarded soups official implementation ☆60 Updated 2 years ago
- Learn online intrinsic rewards from LLM feedback ☆44 Updated 10 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆178 Updated 5 months ago
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)