EleutherAI / sae
Sparse autoencoders
☆333 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for sae
- Training Sparse Autoencoders on Language Models ☆449 · Updated this week
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆157 · Updated last month
- Using sparse coding to find distributed representations used by neural networks. ☆181 · Updated 11 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆187 · Updated 3 months ago
- ☆108 · Updated last year
- Mechanistic Interpretability Visualizations using React ☆195 · Updated 3 months ago
- ☆320 · Updated 3 months ago
- ☆99 · Updated this week
- ☆141 · Updated 2 weeks ago
- ☆102 · Updated last month
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆399 · Updated this week
- ☆186 · Updated last month
- Steering vectors for transformer language models in Pytorch / Huggingface ☆64 · Updated last month
- This repository collects all relevant resources about interpretability in LLMs ☆282 · Updated last week
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆117 · Updated last month
- Extract full next-token probabilities via language model APIs ☆228 · Updated 8 months ago
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs). ☆173 · Updated this week
- A toolkit for describing model features and intervening on those features to steer behavior. ☆69 · Updated this week
- ☆43 · Updated 4 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆94 · Updated 5 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆429 · Updated 5 months ago
- Erasing concepts from neural representations with provable guarantees ☆208 · Updated 3 weeks ago
- ☆252 · Updated 8 months ago
- Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions ☆633 · Updated this week
- ☆96 · Updated 3 months ago
- ☆99 · Updated 3 months ago
- ☆24 · Updated 7 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆176 · Updated 5 months ago
- ☆75 · Updated 9 months ago