angie-chen55 / pref-learning-ranking-acc
☆13 · Updated last year
Alternatives and similar repositories for pref-learning-ranking-acc
Users who are interested in pref-learning-ranking-acc are comparing it to the libraries listed below.
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆57 · Updated 2 months ago
- A library for efficient patching and automatic circuit discovery. ☆84 · Updated last week
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers". ☆123 · Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models". ☆13 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆85 · Updated 10 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".