awesome SAE papers
☆74 · May 24, 2025 · Updated 9 months ago
Alternatives and similar repositories for awesome-SAE
Users who are interested in awesome-SAE are comparing it to the libraries listed below.
- ☆36 · Jun 13, 2025 · Updated 8 months ago
- awesome papers in LLM interpretability · ☆609 · Aug 20, 2025 · Updated 6 months ago
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg… · ☆190 · Updated this week
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes. · ☆22 · Feb 19, 2026 · Updated 2 weeks ago
- code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis · ☆12 · Nov 17, 2024 · Updated last year
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" · ☆24 · Mar 4, 2025 · Updated last year
- Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning · ☆36 · Oct 26, 2025 · Updated 4 months ago
- Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants. · ☆199 · Updated this week
- ☆17 · Nov 7, 2023 · Updated 2 years ago
- ☆25 · Feb 20, 2026 · Updated 2 weeks ago
- Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our co… · ☆25 · Dec 20, 2024 · Updated last year
- ☆25 · Jun 29, 2025 · Updated 8 months ago
- ☆41 · Jul 6, 2025 · Updated 8 months ago
- Code to enable layer-level steering in LLMs using sparse autoencoders · ☆31 · Sep 18, 2025 · Updated 5 months ago
- Data preprocessing: filling missing values by interpolation and marking the filled positions · ☆10 · Apr 19, 2019 · Updated 6 years ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) · ☆29 · Feb 6, 2026 · Updated last month
- Exploring the Limitations of Large Language Models on Multi-Hop Queries · ☆32 · Mar 2, 2025 · Updated last year
- This repository collects all relevant resources about interpretability in LLMs · ☆390 · Nov 1, 2024 · Updated last year
- ☆89 · Dec 18, 2025 · Updated 2 months ago
- Official implementation for "Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning" · ☆12 · Jun 20, 2025 · Updated 8 months ago
- ☆13 · Oct 5, 2025 · Updated 5 months ago
- Welcome to the 'In Context Learning Theory' Reading Group · ☆30 · Nov 8, 2024 · Updated last year
- FeatureAlignment = Alignment + Mechanistic Interpretability · ☆34 · Mar 8, 2025 · Updated last year
- 👋 Overcomplete is a Vision-based SAE Toolbox · ☆126 · Dec 4, 2025 · Updated 3 months ago
- A Convolutional Neural Network (CNN) was trained on 48x48-pixel grayscale images to predict 5 different emotions from images. Ten different… · ☆11 · Sep 21, 2022 · Updated 3 years ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" · ☆11 · Apr 15, 2024 · Updated last year
- Build an AI bot in Discord to serve users personalized reports on what's up in tech · ☆28 · Sep 14, 2025 · Updated 5 months ago
- VLM2-Bench [ACL 2025 Main]: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues · ☆44 · May 20, 2025 · Updated 9 months ago
- A holistic benchmark for LLM abstention · ☆73 · Aug 27, 2025 · Updated 6 months ago
- Must-read Papers on Knowledge Editing for Large Language Models. · ☆1,220 · Jul 12, 2025 · Updated 7 months ago
- Open source interpretability artefacts for R1. · ☆172 · Apr 21, 2025 · Updated 10 months ago
- Exploring the minimal architecture required for coherent English language generation. · ☆12 · Mar 5, 2025 · Updated last year
- ☆11 · Oct 25, 2024 · Updated last year
- ☆12 · Jul 8, 2024 · Updated last year
- Reference implementation of Thin and Deep Gaussian Processes (NeurIPS 2023) · ☆14 · Nov 25, 2024 · Updated last year
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence · ☆10 · Mar 2, 2025 · Updated last year
- ☆10 · Jul 6, 2023 · Updated 2 years ago
- Redefining Video Management with the power of SQL · ☆11 · Oct 15, 2023 · Updated 2 years ago
- ☆14 · Mar 7, 2025 · Updated last year