☆25 Dec 20, 2023 Updated 2 years ago
Alternatives and similar repositories for sae-exercises-mats
Users that are interested in sae-exercises-mats are comparing it to the libraries listed below
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 Jul 18, 2024 Updated last year
- ☆22 Feb 13, 2026 Updated 3 weeks ago
- ☆17 Feb 14, 2024 Updated 2 years ago
- Sparse Autoencoder Training Library ☆55 May 1, 2025 Updated 10 months ago
- Algebraic value editing in pretrained language models ☆69 Nov 1, 2023 Updated 2 years ago
- Benchmarking LLM Inference Speeds ☆13 Updated this week
- ☆134 Oct 28, 2023 Updated 2 years ago
- Codes for the paper The emergence of clusters in self-attention dynamics. ☆17 Dec 18, 2023 Updated 2 years ago
- ☆89 Dec 18, 2025 Updated 2 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆292 Jul 20, 2024 Updated last year
- ☆27 Nov 28, 2024 Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆84 Nov 27, 2024 Updated last year
- ☆209 Oct 14, 2025 Updated 4 months ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆56 Dec 9, 2024 Updated last year
- The simplest repository for training medium-sized BackpackLM for cs224n ☆25 Aug 13, 2023 Updated 2 years ago
- ☆58 Nov 19, 2024 Updated last year
- ☆102 Aug 8, 2024 Updated last year
- Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model's internal units of computation, or "feat… ☆43 Jul 14, 2025 Updated 7 months ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆20 Oct 18, 2024 Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆248 Feb 27, 2026 Updated last week
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Jul 12, 2023 Updated 2 years ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆29 Feb 6, 2026 Updated last month
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆171 Updated this week
- ☆153 Dec 30, 2025 Updated 2 months ago
- Auditing agents for fine-tuning safety ☆20 Oct 21, 2025 Updated 4 months ago
- Methods for using OpenFace in R ☆11 Feb 26, 2024 Updated 2 years ago
- open source interpretability platform 🧠 ☆739 Feb 28, 2026 Updated last week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆836 Updated this week
- ☆38 Oct 3, 2023 Updated 2 years ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 Apr 15, 2024 Updated last year
- [EMNLP 2020] Collective HumAn OpinionS on Natural Language Inference Data ☆40 Apr 7, 2022 Updated 3 years ago
- Recycling diverse models ☆46 Jan 18, 2023 Updated 3 years ago
- Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants. ☆199 Updated this week
- FocusLLM: Scaling LLM’s Context by Parallel Decoding ☆44 Dec 8, 2024 Updated last year
- Sparsify transformers with SAEs and transcoders ☆699 Updated this week
- Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning) ☆10 Sep 7, 2020 Updated 5 years ago
- LoFiT: Localized Fine-tuning on LLM Representations ☆44 Jan 15, 2025 Updated last year
- ☆93 Jul 5, 2024 Updated last year
- Create string diagrams with LaTeX! ☆14 Jan 3, 2025 Updated last year