zepingyu0512 / awesome-SAE
awesome SAE papers
☆72May 24, 2025Updated 8 months ago
Alternatives and similar repositories for awesome-SAE
Users who are interested in awesome-SAE are comparing it to the repositories listed below
- ☆36Jun 13, 2025Updated 8 months ago
- awesome papers in LLM interpretability☆609Aug 20, 2025Updated 5 months ago
- A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg…☆183Oct 20, 2025Updated 3 months ago
- code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis☆12Nov 17, 2024Updated last year
- A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc.☆292Jan 22, 2026Updated 3 weeks ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆24Mar 4, 2025Updated 11 months ago
- Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants.☆177Feb 9, 2026Updated last week
- Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning☆35Oct 26, 2025Updated 3 months ago
- ☆17Nov 7, 2023Updated 2 years ago
- ☆25Apr 18, 2025Updated 9 months ago
- ☆25Jun 29, 2025Updated 7 months ago
- Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our co…☆25Dec 20, 2024Updated last year
- Data preprocessing: fill missing values by interpolation and mark the filled positions☆10Apr 19, 2019Updated 6 years ago
- Exploring the Limitations of Large Language Models on Multi-Hop Queries☆32Mar 2, 2025Updated 11 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025)☆29Feb 6, 2026Updated last week
- This repository collects all relevant resources about interpretability in LLMs☆387Nov 1, 2024Updated last year
- ☆88Dec 18, 2025Updated last month
- Official implementation for "Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning"☆12Jun 20, 2025Updated 7 months ago
- Welcome to the 'In Context Learning Theory' Reading Group☆30Nov 8, 2024Updated last year
- FeatureAlignment = Alignment + Mechanistic Interpretability☆34Mar 8, 2025Updated 11 months ago
- My solution for the UC Berkeley AI Pacman projects☆11Jul 25, 2020Updated 5 years ago
- A simple AdaBoost implementation using decision stumps as weak classifiers☆11Nov 1, 2012Updated 13 years ago
- Build an AI bot in Discord that serves users personalized reports on what's up in tech☆28Sep 14, 2025Updated 5 months ago
- ☆16May 17, 2021Updated 4 years ago
- 👋 Overcomplete is a Vision-based SAE Toolbox☆118Dec 4, 2025Updated 2 months ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated last year
- Implementation of the paper "ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability"☆60Jun 3, 2025Updated 8 months ago
- VLM2-Bench [ACL 2025 Main]: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues☆44May 20, 2025Updated 8 months ago
- Must-read Papers on Knowledge Editing for Large Language Models.☆1,212Jul 12, 2025Updated 7 months ago
- ☆11Oct 25, 2024Updated last year
- Exploring the minimal architecture required for coherent English language generation.☆12Mar 5, 2025Updated 11 months ago
- ☆10Jul 6, 2023Updated 2 years ago
- The official code repository for the paper "CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments…☆27Dec 10, 2025Updated 2 months ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch☆10Aug 7, 2024Updated last year
- DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type.☆20Feb 9, 2026Updated last week
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated 11 months ago
- ☆14Mar 7, 2025Updated 11 months ago
- Python package to process videos as in Hu and Ma (2024)☆18Sep 29, 2024Updated last year
- ☆12Jul 8, 2024Updated last year