PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,573 · Updated last month
Alternatives and similar repositories for safe-rlhf
Users interested in safe-rlhf are comparing it to the libraries listed below.
- Secrets of RLHF in Large Language Models Part I: PPO ☆1,413 · Updated last year
- ☆916 · Updated last year
- [NIPS2023] RRHF & Wombat ☆808 · Updated 2 years ago
- ☆552 · Updated last year
- Code for the paper Fine-Tuning Language Models from Human Preferences ☆1,375 · Updated 2 years ago
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆941 · Updated 11 months ago
- Reference implementation for DPO (Direct Preference Optimization) ☆2,832 · Updated last year
- A plug-and-play library for parameter-efficient tuning (Delta Tuning) ☆1,039 · Updated last year
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models ☆1,832 · Updated last year
- Open Academic Research on Improving LLaMA to SOTA LLM ☆1,611 · Updated 2 years ago
- Aligning Large Language Models with Human: A Survey ☆742 · Updated 2 years ago
- A modular RL library to fine-tune language models to human preferences ☆2,374 · Updated last year
- A very simple GRPO implementation for reproducing R1-like LLM thinking ☆1,544 · Updated 2 months ago
- ☆922 · Updated last year
- Recipes to train reward models for RLHF.
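Several of the listed alternatives (the DPO reference implementation, SimPO) center on preference-optimization objectives rather than full PPO-style RLHF. As a point of reference, below is a minimal sketch of the published DPO loss computed from per-sequence log-probabilities; the function and argument names are illustrative assumptions and the code is not taken from any repository listed above.

```python
# Illustrative sketch of the DPO objective; not code from any listed repository.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs.

    Each tensor holds the summed log-probability of the chosen or rejected
    response under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: scaled log-ratio of policy to reference probabilities.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response outranks the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The reference-free variants such as SimPO modify this objective (e.g., length-normalized log-probabilities and no reference model); consult each repository for its exact formulation.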