Collection of Reverse Engineering in Large Model
☆36 · Jan 8, 2025 · Updated last year
Alternatives and similar repositories for Awesome-Sparse-Autoencoder
Users that are interested in Awesome-Sparse-Autoencoder are comparing it to the libraries listed below.
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆13 · Jan 26, 2025 · Updated last year
- [ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models ☆22 · Jun 13, 2024 · Updated last year
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper. ☆17 · Nov 21, 2025 · Updated 4 months ago
- [NLPCC 2022] Kformer: Knowledge Injection in Transformer Feed-Forward Layers ☆38 · Oct 20, 2022 · Updated 3 years ago
- Materials for the paper https://arxiv.org/pdf/2007.15036.pdf ☆14 · Aug 3, 2020 · Updated 5 years ago
- How do transformer LMs encode relations? ☆57 · Feb 24, 2024 · Updated 2 years ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.. ☆300 · Jan 22, 2026 · Updated 2 months ago
- The Github repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large… ☆110 · Mar 28, 2026 · Updated 2 weeks ago
- Using sparse coding to find distributed representations used by neural networks. ☆298 · Nov 10, 2023 · Updated 2 years ago
- ☆92 · Dec 18, 2025 · Updated 3 months ago
- ☆28 · Feb 27, 2025 · Updated last year
- ☆18 · Dec 12, 2025 · Updated 4 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆236 · Aug 2, 2024 · Updated last year
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆35 · Mar 8, 2025 · Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆248 · Updated this week
- Reproduction Code for Paper "Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models" ☆14 · Jun 1, 2024 · Updated last year
- ☆76 · Mar 6, 2025 · Updated last year
- ☆238 · Nov 22, 2024 · Updated last year
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆52 · Nov 17, 2024 · Updated last year
- [EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners ☆19 · Nov 17, 2025 · Updated 4 months ago
- [FCS'24] LVLM Safety paper ☆19 · Jan 4, 2025 · Updated last year
- Stable Prediction with Model Misspecification and Agnostic Distribution Shift ☆26 · Apr 26, 2020 · Updated 5 years ago
- Code and dataset for the paper: "Can Editing LLMs Inject Harm?" ☆21 · Dec 26, 2025 · Updated 3 months ago
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals ☆12 · May 24, 2024 · Updated last year
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis" ☆20 · Jun 12, 2025 · Updated 10 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆79 · Jan 16, 2026 · Updated 2 months ago
- [EMNLP 2023] Knowledge Rumination for Pre-trained Language Models ☆17 · Jun 29, 2023 · Updated 2 years ago
- A curated list of Large Language Model (LLM) Interpretability resources. ☆1,488 · Feb 24, 2026 · Updated last month
- Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library ☆51 · Aug 20, 2025 · Updated 7 months ago
- Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668. ☆12 · Jun 19, 2024 · Updated last year
- ☆12 · Apr 19, 2022 · Updated 3 years ago
- Code for "Automatic Circuit Finding and Faithfulness" ☆17 · Jul 11, 2024 · Updated last year
- Sparsify transformers with SAEs and transcoders ☆704 · Updated this week
- [ICLR 2026] Official Implementation of ProxyThinker: Test-Time Guidance through Small Visual Reasoners. ☆21 · Sep 24, 2025 · Updated 6 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- Exploring Model Kinship for Merging Large Language Models ☆28 · Apr 16, 2025 · Updated 11 months ago
- A repository for the EMNLP 2021 paper "Is Information Density Uniform in Task-Oriented Dialogues?" and for the CoNLL 2021 paper "Analysin… ☆10 · Jun 17, 2024 · Updated last year
- Attribution-based Parameter Decomposition ☆34 · Jun 11, 2025 · Updated 10 months ago
- ☆15 · Oct 21, 2023 · Updated 2 years ago