dit7ya / awesome-ai-alignment
A curated list of awesome resources for Artificial Intelligence Alignment research
★69, updated last year
Alternatives and similar repositories for awesome-ai-alignment:
Users interested in awesome-ai-alignment compare it to the repositories listed below.
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ★197, updated last week
- Starter templates for doing interpretability research. ★67, updated last year
- Tools for studying developmental interpretability in neural networks. ★87, updated 2 months ago
- ★53, updated 6 months ago
- we got you bro. ★35, updated 8 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models". ★73, updated last year
- Sparse and discrete interpretability tool for neural networks. ★60, updated last year
- Mechanistic Interpretability Visualizations using React. ★238, updated 3 months ago
- ★26, updated 11 months ago
- Machine Learning for Alignment Bootcamp (MLAB). ★28, updated 3 years ago
- Experiments with representation engineering