This was designed for interp researchers who want to do research on or with interp agents. It provides quality-of-life improvements and fixes some of the annoyances of using Claude Code out of the box.
☆130Feb 8, 2026Updated 3 weeks ago
Alternatives and similar repositories for seer
Users that are interested in seer are comparing it to the libraries listed below
- ☆24Feb 18, 2026Updated 2 weeks ago
- Attribution-based Parameter Decomposition☆34Jun 11, 2025Updated 8 months ago
- ☆17Aug 30, 2025Updated 6 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)☆20Jan 19, 2025Updated last year
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University☆305Feb 8, 2026Updated 3 weeks ago
- ☆153Dec 30, 2025Updated 2 months ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"☆13Jul 18, 2024Updated last year
- ☆29Jan 12, 2026Updated last month
- A TinyStories LM with SAEs and transcoders☆14Apr 3, 2025Updated 11 months ago
- Engine for collecting, uploading, and downloading model activations☆26Apr 2, 2025Updated 11 months ago
- ☆399Aug 21, 2025Updated 6 months ago
- ☆75Feb 18, 2026Updated 2 weeks ago
- A library for efficient patching and automatic circuit discovery.☆90Dec 31, 2025Updated 2 months ago
- ☆18Feb 25, 2026Updated last week
- Minimum Description Length probing for neural network representations☆20Jan 28, 2025Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear"☆84Nov 27, 2024Updated last year
- Training Sparse Autoencoders on Language Models☆1,233Feb 27, 2026Updated last week
- ☆273Oct 1, 2024Updated last year
- ☆25Nov 11, 2025Updated 3 months ago
- Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"☆27Jun 4, 2024Updated last year
- Landing page for MIB: A Mechanistic Interpretability Benchmark☆24Aug 15, 2025Updated 6 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).☆247Feb 27, 2026Updated last week
- Code to enable layer-level steering in LLMs using sparse auto encoders☆31Sep 18, 2025Updated 5 months ago
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.☆117Oct 27, 2025Updated 4 months ago
- Efficiently computing & storing token n-grams from large corpora☆27Oct 6, 2024Updated last year
- Function Vectors in Large Language Models (ICLR 2024)☆192Apr 17, 2025Updated 10 months ago
- A library for mechanistic interpretability of GPT-style language models☆3,133Updated this week
- Inference API for many LLMs and other useful tools for empirical research☆107Feb 27, 2026Updated last week
- Auditing agents for fine-tuning safety☆20Oct 21, 2025Updated 4 months ago
- Decoder only transformer, built from scratch with PyTorch☆33Oct 22, 2023Updated 2 years ago
- ☆35Sep 13, 2023Updated 2 years ago
- Situational Awareness Dataset☆46Dec 14, 2024Updated last year
- Open Source Replication of Anthropic's Alignment Faking Paper☆54Apr 4, 2025Updated 11 months ago
- Project exploring 3D volumetric rendering of NEXRAD radar data.☆11Oct 23, 2023Updated 2 years ago
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models☆12May 30, 2025Updated 9 months ago
- The official starter-kit for NeurIPS 2025 mind games competition☆21Jul 27, 2025Updated 7 months ago
- Scrape firmware metadata from 18 vendors and download the corresponding firmware images. Save to a MySQL database for InfoSec research purposes.☆12Feb 17, 2023Updated 3 years ago
- Interpreting how transformers simulate agents performing RL tasks☆90Oct 23, 2023Updated 2 years ago
- ☆36Apr 30, 2024Updated last year