kxcloud / gradient-routing
☆9 Updated 3 months ago
Alternatives and similar repositories for gradient-routing:
Users who are interested in gradient-routing are comparing it to the libraries listed below.
- ☆34 Updated 3 weeks ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs) ☆31 Updated 6 months ago
- A library for efficient patching and automatic circuit discovery. ☆59 Updated last month
- Sparse Autoencoder Training Library ☆43 Updated 4 months ago
- A TinyStories LM with SAEs and transcoders ☆11 Updated 2 months ago
- ☆61 Updated 4 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆18 Updated 2 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆46 Updated 4 months ago
- Experiments with representation engineering ☆11 Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆163 Updated this week
- ☆26 Updated 11 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 10 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆117 Updated 2 years ago
- ☆26 Updated 7 months ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆16 Updated 4 months ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆14 Updated 5 months ago
- ☆26 Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆131 Updated 10 months ago
- Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons ☆12 Updated 2 years ago
- ☆71 Updated this week
- ☆211 Updated 5 months ago
- ☆90 Updated last month
- ☆82 Updated 7 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆42 Updated 5 months ago
- ☆32 Updated 4 months ago
- ☆26 Updated 2 months ago
- ☆23 Updated 3 weeks ago
- ☆17 Updated 11 months ago
- ☆34 Updated last year
- Code to enable layer-level steering in LLMs using sparse auto encoders ☆13 Updated 6 months ago