openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆183 · Updated 8 months ago
Alternatives and similar repositories for safety-rbr-code-and-data:
Users interested in safety-rbr-code-and-data are comparing it to the repositories listed below.
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 · Updated last month
- ☆96 · Updated 8 months ago
- Self-Alignment with Principle-Following Reward Models ☆156 · Updated last year
- ☆166 · Updated last month
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 6 months ago