jkminder / dictionary_learning
☆13 Updated this week
Alternatives and similar repositories for dictionary_learning:
Users who are interested in dictionary_learning are comparing it to the libraries listed below:
- ☆40 Updated 5 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆55 Updated 6 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆172 Updated this week
- ☆121 Updated last year
- Sparse Autoencoder Training Library ☆49 Updated last week
- ☆93 Updated 3 weeks ago
- ☆92 Updated 2 months ago
- Applying SAEs for fine-grained control ☆17 Updated 4 months ago
- A library for efficient patching and automatic circuit discovery. ☆64 Updated 2 weeks ago
- ☆165 Updated last month
- Engine for collecting, uploading, and downloading model activations ☆15 Updated last month
- ☆111 Updated 5 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆99 Updated 2 months ago
- ☆38 Updated this week
- ☆280 Updated 2 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆199 Updated 4 months ago
- ☆12 Updated 3 weeks ago
- ☆27 Updated last year
- ☆223 Updated 7 months ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆73 Updated 5 months ago
- ☆70 Updated 2 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆114 Updated this week
- ☆17 Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆31 Updated 11 months ago
- ☆114 Updated 9 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆75 Updated last year
- Mechanistic Interpretability Visualizations using React ☆242 Updated 4 months ago
- Sparsify transformers with SAEs and transcoders ☆524 Updated this week
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆18 Updated 3 months ago