jkminder / dictionary_learning
Modified to support crosscoder training.
☆17 Updated 2 weeks ago
Alternatives and similar repositories for dictionary_learning
Users who are interested in dictionary_learning are comparing it to the libraries listed below.
- ☆43 Updated 6 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆55 Updated 7 months ago
- ☆97 Updated last month
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆181 Updated this week
- ☆93 Updated 3 months ago
- ☆121 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆65 Updated last month
- ☆124 Updated 6 months ago
- ☆171 Updated last month
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆200 Updated 5 months ago
- ☆12 Updated last month
- Applying SAEs for fine-grained control ☆18 Updated 5 months ago
- Sparse Autoencoder Training Library ☆52 Updated last month
- ☆223 Updated 8 months ago
- Engine for collecting, uploading, and downloading model activations ☆18 Updated 2 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆75 Updated 6 months ago
- Attribution-based Parameter Decomposition ☆23 Updated this week
- ☆50 Updated last month
- ☆28 Updated last year
- ☆75 Updated 3 months ago
- A small package implementing some useful wrapping around nnsight ☆13 Updated last month
- Using sparse coding to find distributed representations used by neural networks. ☆247 Updated last year
- Mechanistic Interpretability Visualizations using React ☆253 Updated 5 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆103 Updated 3 months ago
- ☆302 Updated 2 weeks ago
- Sparse Autoencoder for Mechanistic Interpretability ☆248 Updated 10 months ago
- ☆31 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆128 Updated 3 weeks ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆34 Updated last year
- ☆83 Updated 9 months ago