Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆134 · Feb 15, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for vivaria
Users who are interested in vivaria are comparing it to the libraries listed below.
- ☆33 · Jun 4, 2025 · Updated 9 months ago
- ☆120 · Jan 19, 2026 · Updated last month
- METR Task Standard ☆177 · Feb 3, 2025 · Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆224 · Feb 13, 2026 · Updated 3 weeks ago
- Inspect: A framework for large language model evaluations ☆1,800 · Updated this week
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments. ☆158 · Feb 27, 2026 · Updated last week
- Inference API for many LLMs and other useful tools for empirical research ☆107 · Feb 27, 2026 · Updated last week
- ☆21 · Jun 22, 2025 · Updated 8 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆116 · Jun 13, 2024 · Updated last year
- Collection of evals for Inspect AI ☆393 · Updated this week
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆16 · Sep 15, 2023 · Updated 2 years ago
- Tools for studying developmental interpretability in neural networks. ☆127 · Dec 30, 2025 · Updated 2 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Apr 4, 2025 · Updated 11 months ago
- ☆45 · Feb 13, 2026 · Updated 3 weeks ago
- ☆65 · Feb 20, 2026 · Updated 2 weeks ago
- ☆20 · Feb 17, 2023 · Updated 3 years ago
- ☆22 · Sep 9, 2021 · Updated 4 years ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆20 · Oct 18, 2024 · Updated last year
- ☆25 · Sep 3, 2025 · Updated 6 months ago
- The AI that helps you achieve your goals ☆11 · Feb 4, 2024 · Updated 2 years ago
- ☆13 · May 7, 2023 · Updated 2 years ago
- Mechanistic Interpretability Visualizations using React ☆328 · Dec 18, 2024 · Updated last year
- ☆27 · Oct 6, 2024 · Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆28 · May 23, 2024 · Updated last year
- A text-based game where language models learn to lie and to detect lies. ☆12 · Oct 4, 2023 · Updated 2 years ago
- Make inso available in your GitHub Actions workflows ☆11 · Jul 16, 2025 · Updated 7 months ago
- An Inspect extension for agentic cyber evaluations ☆22 · Feb 24, 2026 · Updated last week
- Experiments in applying interpretability techniques to learned reward functions. ☆10 · Dec 11, 2020 · Updated 5 years ago
- ☆12 · Jul 12, 2024 · Updated last year
- A toolkit for describing model features and intervening on those features to steer behavior. ☆230 · Dec 12, 2025 · Updated 2 months ago
- A dataset of alignment research and code to reproduce it ☆78 · Jun 22, 2023 · Updated 2 years ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆247 · Feb 27, 2026 · Updated last week
- ☆960 · Updated this week
- ☆28 · May 4, 2023 · Updated 2 years ago
- ☆23 · Jan 27, 2026 · Updated last month
- Unofficial Experiments with AlgebraNets ☆17 · Jun 17, 2020 · Updated 5 years ago
- Stampy's copy of Alignment Research Dataset scraper ☆13 · Dec 26, 2025 · Updated 2 months ago
- This is a repository for code, data, and models associated with the paper LLM-RUBRIC: A Multidimensional, Calibrated Approach to Automate… ☆25 · Feb 18, 2025 · Updated last year
- Machine Learning for Alignment Bootcamp ☆82 · Apr 27, 2022 · Updated 3 years ago