StampyAI / alignment-research-dataset
Stampy's copy of the Alignment Research Dataset scraper
☆12 · Updated this week
Alternatives and similar repositories for alignment-research-dataset:
Users who are interested in alignment-research-dataset are comparing it to the libraries listed below.
- ControlArena is a suite of realistic settings, mimicking complex deployment environments, for running control evaluations. This is an alp… · ☆50 · Updated this week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. · ☆89 · Updated last week
- A dataset of alignment research and code to reproduce it · ☆77 · Updated last year
- ☆10 · Updated 9 months ago
- Measuring the situational awareness of language models · ☆34 · Updated last year
- METR Task Standard · ☆146 · Updated 2 months ago
- Repository with sample code using Apollo's suggested engineering practices · ☆8 · Updated 4 months ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs · ☆15 · Updated 6 months ago
- ☆89 · Updated last month
- ☆26 · Updated last year
- ☆21 · Updated 3 weeks ago
- Machine Learning for Alignment Bootcamp · ☆72 · Updated 2 years ago
- ☆72 · Updated 2 months ago
- we got you bro · ☆35 · Updated 8 months ago
- ☆31 · Updated 11 months ago
- Redwood Research's transformer interpretability tools · ☆14 · Updated 3 years ago
- ☆13 · Updated 2 weeks ago
- Mechanistic Interpretability Visualizations using React · ☆241 · Updated 4 months ago
- Inference API for many LLMs and other useful tools for empirical research · ☆33 · Updated this week
- (Model-written) LLM evals library · ☆18 · Updated 4 months ago
- Mechanistic Interpretability for Transformer Models · ☆50 · Updated 2 years ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. · ☆28 · Updated this week
- ☆18 · Updated 9 months ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features · ☆17 · Updated 5 months ago
- Tools for studying developmental interpretability in neural networks. · ☆88 · Updated 3 months ago
- ☆54 · Updated 6 months ago
- ☆266 · Updated 9 months ago
- Open source interpretability artefacts for R1. · ☆82 · Updated this week
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State · ☆18 · Updated last year
- Sphynx Hallucination Induction · ☆53 · Updated 2 months ago