☆117 Feb 11, 2025 Updated last year
Alternatives and similar repositories for linear_rep_geometry
Users that are interested in linear_rep_geometry are comparing it to the libraries listed below
- ☆113 Feb 11, 2025 Updated last year
- ☆33 Jul 9, 2025 Updated 7 months ago
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition" ☆14 Nov 22, 2024 Updated last year
- ☆396 Aug 21, 2025 Updated 6 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆57 Oct 30, 2025 Updated 4 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆84 Nov 27, 2024 Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆192 Apr 17, 2025 Updated 10 months ago
- ☆15 Mar 13, 2025 Updated 11 months ago
- Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024) ☆37 Jan 23, 2024 Updated 2 years ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆247 Updated this week
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆105 Nov 10, 2023 Updated 2 years ago
- ☆46 Oct 11, 2023 Updated 2 years ago
- Mechanistic Interpretability Visualizations using React ☆326 Dec 18, 2024 Updated last year
- ☆100 Aug 8, 2024 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆90 Dec 31, 2025 Updated 2 months ago
- ☆21 Jun 22, 2025 Updated 8 months ago
- Training Sparse Autoencoders on Language Models ☆1,219 Feb 23, 2026 Updated last week
- Steering Llama 2 with Contrastive Activation Addition ☆212 May 23, 2024 Updated last year
- ☆273 Oct 1, 2024 Updated last year
- ☆209 Oct 14, 2025 Updated 4 months ago
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ☆863 Jan 29, 2026 Updated last month
- ☆284 Mar 2, 2024 Updated 2 years ago
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆825 Feb 23, 2026 Updated last week
- ☆70 Mar 6, 2025 Updated 11 months ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing ☆14 Feb 10, 2023 Updated 3 years ago
- ☆15 Jul 9, 2025 Updated 7 months ago
- Code for T-MARS data filtering ☆35 Aug 23, 2023 Updated 2 years ago
- Code repo for the model organisms and convergent directions of EM papers. ☆51 Sep 22, 2025 Updated 5 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆351 Jun 13, 2025 Updated 8 months ago
- ☆571 Jul 19, 2024 Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆243 Feb 23, 2026 Updated last week
- ☆35 May 9, 2025 Updated 9 months ago
- Sparse and discrete interpretability tool for neural networks ☆64 Feb 12, 2024 Updated 2 years ago
- TopoLM: brain-like spatio-functional organization in a topographic language model ☆27 May 23, 2025 Updated 9 months ago
- Kim, J., Evans, J., & Schein, A. (2025). Linear Representations of Political Perspective Emerge in Large Language Models. ICLR. ☆24 Mar 27, 2025 Updated 11 months ago
- A fast, effective data attribution method for neural networks in PyTorch ☆232 Nov 18, 2024 Updated last year
- "How to Trust Your Diffusion Models: A Convex Optimization Approach to Conformal Risk Control" ☆18 Jan 6, 2026 Updated last month
- Improving Alignment and Robustness with Circuit Breakers ☆258 Sep 24, 2024 Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆953 Aug 14, 2024 Updated last year