Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,818 · Jun 17, 2025 · Updated 8 months ago
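The hh-rlhf data consists of human preference pairs: each record contains a "chosen" and a "rejected" Human/Assistant transcript sharing the same prompt. Below is a minimal sketch of reading one split, assuming a local checkout with the repo's gzipped JSONL layout (the `helpful-base/train.jsonl.gz` path is an assumption and may differ from the actual file layout):

```python
import gzip
import json

# Hypothetical local path; the actual file layout in a checkout of hh-rlhf may differ.
PATH = "helpful-base/train.jsonl.gz"

def iter_pairs(path):
    """Yield (chosen, rejected) transcript pairs from a gzipped JSONL split."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # "chosen" was preferred by the human annotator over "rejected";
            # both are full "\n\nHuman: ... \n\nAssistant: ..." transcripts.
            yield record["chosen"], record["rejected"]

if __name__ == "__main__":
    chosen, rejected = next(iter_pairs(PATH))
    print(chosen[:200])
    print(rejected[:200])
```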
Alternatives and similar repositories for hh-rlhf
Users interested in hh-rlhf are comparing it to the libraries listed below.
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) · ☆4,738 · Jan 8, 2024 · Updated 2 years ago
- A modular RL library to fine-tune language models to human preferences · ☆2,380 · Mar 1, 2024 · Updated 2 years ago
- ☆251 · Dec 21, 2022 · Updated 3 years ago
- Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback · ☆1,589 · Nov 24, 2025 · Updated 3 months ago
- Code for the paper Fine-Tuning Language Models from Human Preferences · ☆1,378 · Jul 25, 2023 · Updated 2 years ago
- Train transformer language models with reinforcement learning · ☆17,523 · Updated this week
- 800,000 step-level correctness labels on LLM solutions to MATH problems · ☆2,094 · Jun 1, 2023 · Updated 2 years ago
- [NIPS2023] RRHF & Wombat