openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆178 · Updated 7 months ago
Alternatives and similar repositories for safety-rbr-code-and-data:
Users interested in safety-rbr-code-and-data are comparing it to the repositories listed below.
- Benchmarking LLMs with Challenging Tasks from Real Users · ☆215 · Updated 3 months ago
- Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL'24 Best Resource Pap… · ☆145 · Updated 2 months ago
- Repo of the paper "Free Process Rewards without Process Labels" · ☆123 · Updated last month
- OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. · ☆194 · Updated last week
- ☆130 · Updated 2 months ago
- Reproducible, flexible LLM evaluations · ☆162 · Updated 2 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" · ☆154 · Updated 2 months ago
- ☆95 · Updated 7 months ago
- Self-Alignment with Principle-Following Reward Models · ☆154 · Updated 11 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] · ☆130 · Updated 5 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" · ☆173 · Updated 6 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision · ☆115 · Updated 5 months ago
- ☆149 · Updated 2 weeks ago
- Code for the paper "Autonomous Evaluation and Refinement of Digital Agents" [COLM 2024] · ☆125 · Updated 2 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… · ☆114 · Updated 7 months ago
- RewardBench: the first evaluation tool for reward models · ☆508 · Updated this week
- [NeurIPS 2024] Official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" · ☆95 · Updated 4 months ago
- Code and data for "Long-context LLMs Struggle with Long In-context Learning" · ☆100 · Updated 7 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following · ☆119 · Updated 7 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks · ☆173 · Updated 9 months ago
- ☆92 · Updated last month
- Replicating o1 inference-time scaling laws · ☆82 · Updated 2 months ago
- Reformatted Alignment · ☆114 · Updated 4 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers · ☆126 · Updated this week
- Repository containing the source code for Self-Evaluation Guided MCTS for online DPO · ☆289 · Updated 6 months ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct · ☆154 · Updated last month
- A simple unified framework for evaluating LLMs · ☆197 · Updated 2 weeks ago