callummcdougall / sae-exercises-mats
☆25 · Dec 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for sae-exercises-mats
Users who are interested in sae-exercises-mats are comparing it to the libraries listed below.
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- ☆22 · Updated this week
- Sparse Autoencoder Training Library ☆56 · May 1, 2025 · Updated 9 months ago
- Applying SAEs for fine-grained control ☆25 · Dec 15, 2024 · Updated last year
- Algebraic value editing in pretrained language models ☆68 · Nov 1, 2023 · Updated 2 years ago
- Ἀνατομή is a PyTorch library to analyze representation of neural networks ☆13 · Jan 31, 2024 · Updated 2 years ago
- Benchmarking LLM Inference Speeds ☆13 · Feb 4, 2026 · Updated last week
- ☆132 · Oct 28, 2023 · Updated 2 years ago
- ☆33 · Jul 9, 2025 · Updated 7 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆296 · Nov 10, 2023 · Updated 2 years ago
- ☆88 · Dec 18, 2025 · Updated last month
- Sparse Autoencoder for Mechanistic Interpretability ☆291 · Jul 20, 2024 · Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆83 · Nov 27, 2024 · Updated last year
- u-MPS implementation and experimentation code used in the paper Tensor Networks for Probabilistic Sequence Modeling (https://arxiv.org/ab… ☆19 · Jul 2, 2020 · Updated 5 years ago
- ☆207 · Oct 14, 2025 · Updated 4 months ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆55 · Dec 9, 2024 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆85 · Mar 7, 2025 · Updated 11 months ago
- The simplest repository for training medium-sized BackpackLM for cs224n ☆25 · Aug 13, 2023 · Updated 2 years ago
- Mechanistic Interpretability Visualizations using React ☆320 · Dec 18, 2024 · Updated last year
- ☆58 · Nov 19, 2024 · Updated last year
- ☆99 · Aug 8, 2024 · Updated last year
- Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model's internal units of computation, or "feat… ☆42 · Jul 14, 2025 · Updated 7 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆240 · Dec 16, 2024 · Updated last year
- Training Sparse Autoencoders on Language Models ☆1,201 · Updated this week
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Jul 12, 2023 · Updated 2 years ago
- ☆198 · Nov 17, 2024 · Updated last year
- Universal Neurons in GPT2 Language Models ☆30 · May 28, 2024 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆165 · Jun 25, 2025 · Updated 7 months ago
- ☆146 · Dec 30, 2025 · Updated last month
- Auditing agents for fine-tuning safety ☆18 · Oct 21, 2025 · Updated 3 months ago
- open source interpretability platform 🧠 ☆704 · Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆811 · Updated this week
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 · Apr 15, 2024 · Updated last year
- ☆38 · Oct 3, 2023 · Updated 2 years ago
- Interpreting the latent space representations of attention head outputs for LLMs ☆36 · Aug 13, 2024 · Updated last year
- Recycling diverse models ☆46 · Jan 18, 2023 · Updated 3 years ago
- ☆394 · Aug 21, 2025 · Updated 5 months ago
- Sparsify transformers with SAEs and transcoders ☆692 · Updated this week
- FocusLLM: Scaling LLM’s Context by Parallel Decoding ☆44 · Dec 8, 2024 · Updated last year