adamjermyn / toy_model_interpretability
☆11, updated 2 years ago
Related projects
Alternatives and complementary repositories for toy_model_interpretability
- Mechanistic Interpretability for Transformer Models (☆49, updated 2 years ago)
- Redwood Research's transformer interpretability tools (☆12, updated 2 years ago)
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. (☆24, updated 2 months ago)
- PyTorch and NNsight implementation of AtP* (Kramár et al. 2024, DeepMind) (☆14, updated 7 months ago)
- Implementation of influence function approximations for differently sized ML models, using PyTorch (☆15, updated last year)
- A library for bridging Python and HTML/JavaScript (via Svelte) for creating interactive visualizations (☆13, updated 7 months ago)
- Code for reproducing the results from the paper Avoiding Side Effects in Complex Environments (☆12, updated 3 years ago)
- A neurosymbolic T5 agent for playing text games, from the EACL 2023 paper "Behavior Cloned Transformers are Neurosymbolic Reasoners" (☆19, updated last year)
- Documentation for dynamic machine learning systems. (☆27, updated 2 months ago)
- General-sum variant of the game Diplomacy for evaluating AIs. (☆23, updated 7 months ago)
- Sparse autoencoder training library (☆28, updated 3 weeks ago)
- A formalisation of Cartesian Frames, a perspective on embedded agency, in the HOL theorem prover. (☆19, updated 2 years ago)
- Tools for studying developmental interpretability in neural networks. (☆77, updated last week)
- A Mechanistic Interpretability Analysis of Grokking (☆17, updated 2 years ago)
- Measuring the situational awareness of language models (☆33, updated 9 months ago)
- Repository for "Toward Artificial Open-Ended Evolution within Lenia using Quality-Diversity" (ALIFE 2024). (☆15, updated 4 months ago)
- Interpreting how transformers simulate agents performing RL tasks (☆73, updated last year)
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… (☆34, updated last year)
- we got you bro (☆33, updated 3 months ago)
- A programming language for formal/informal computation. (☆41, updated 5 months ago)