ai-safety-graph / AISafetyGraph
AI Safety Graph
☆18 Updated 10 months ago
Alternatives and similar repositories for AISafetyGraph
Users who are interested in AISafetyGraph compare it to the repositories listed below.
- ControlArena is a collection of settings, model organisms and protocols for running control experiments. ☆150 Updated last week
- Benchmarks for the Evaluation of LLM Supervision ☆32 Updated last week
- (Model-written) LLM evals library ☆18 Updated last year
- Mechanistic Interpretability Visualizations using React ☆315 Updated last year
- ☆328 Updated last year
- Machine Learning for Alignment Bootcamp ☆81 Updated 3 years ago
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆791 Updated this week
- METR Task Standard ☆171 Updated 11 months ago
- ☆20 Updated 2 years ago
- ☆429 Updated this week
- ☆117 Updated last week
- Machine Learning for Alignment Bootcamp (MLAB). ☆31 Updated 4 years ago
- Unified access to Large Language Model modules using NNsight ☆81 Updated 2 weeks ago
- ☆86 Updated last year
- Aligning AI With Shared Human Values (ICLR 2021) ☆314 Updated 2 years ago
- ☆909 Updated last week
- Tools for studying developmental interpretability in neural networks. ☆124 Updated last month
- Interactive Composition Explorer: a debugger for compositional language model programs ☆563 Updated 3 weeks ago
- ☆265 Updated last year
- open source interpretability platform 🧠 ☆675 Updated this week
- Helper scripts for Pattern Recognition NTUA Course ☆11 Updated last year
- Interpretability for sequence generation models 🐛 🔍 ☆454 Updated 3 weeks ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆216 Updated last week
- Inspect: A framework for large language model evaluations ☆1,686 Updated last week
- A library for generative social simulation ☆1,155 Updated last week
- Sparse Autoencoder for Mechanistic Interpretability ☆289 Updated last year
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively. ☆54 Updated this week
- ☆39 Updated last week
- Fairness toolkit for pytorch, scikit learn and autogluon ☆33 Updated 2 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆132 Updated 3 weeks ago