soniajoseph / ViT-Prisma
ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).
☆214 · Updated last week
Alternatives and similar repositories for ViT-Prisma:
Users who are interested in ViT-Prisma are comparing it to the libraries listed below.
- ☆255 · Updated last month
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆189 · Updated 3 months ago
- ☆121 · Updated last year
- Sparsify transformers with SAEs and transcoders ☆494 · Updated this week
- Sparse Autoencoder for Mechanistic Interpretability ☆233 · Updated 8 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆224 · Updated last year
- ☆211 · Updated 5 months ago
- Mechanistic Interpretability Visualizations using React ☆233 · Updated 3 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆163 · Updated this week
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆76 · Updated 2 weeks ago
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆522 · Updated this week
- ☆89 · Updated last month
- Tools for understanding how transformer predictions are built layer-by-layer ☆480 · Updated 9 months ago
- ☆441 · Updated 8 months ago
- Erasing concepts from neural representations with provable guarantees ☆226 · Updated last month
- Training Sparse Autoencoders on Language Models ☆669 · Updated this week
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆186 · Updated 9 months ago
- ☆263 · Updated last year
- ☆71 · Updated this week
- ☆61 · Updated 4 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆327 · Updated 4 months ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆206 · Updated last year
- ☆150 · Updated 2 weeks ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆117 · Updated 2 years ago
- ☆113 · Updated 7 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆97 · Updated last year
- 🧠 Starter templates for doing interpretability research ☆67 · Updated last year
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆90 · Updated last month
- ☆507 · Updated 7 months ago
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆177 · Updated last year