openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆186 Updated 9 months ago
Alternatives and similar repositories for safety-rbr-code-and-data:
Users who are interested in safety-rbr-code-and-data are comparing it to the libraries listed below.
- ☆165 Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆144 Updated 2 weeks ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 Updated 3 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆221 Updated 6 months ago
- ☆151 Updated 4 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆146 Updated last month
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆134 Updated 7 months ago
- ☆287 Updated last month
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 8 months ago
- ☆97 Updated 10 months ago
- Self-Alignment with Principle-Following Reward Models ☆161 Updated this week
- Reproducible, flexible LLM evaluations ☆198 Updated last month
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆188 Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆198 Updated this week
- ☆72 Updated 6 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆179 Updated 2 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆124 Updated 10 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆104 Updated last year
- ☆194 Updated 2 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆119 Updated last month
- A brief and partial summary of RLHF algorithms. ☆128 Updated 2 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆191 Updated this week
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆128 Updated 2 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆192 Updated 2 weeks ago
- ☆170 Updated 3 weeks ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆306 Updated 9 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆105 Updated 2 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆85 Updated last month
- ☆109 Updated 3 months ago
- RewardBench: the first evaluation tool for reward models. ☆562 Updated this week