Collection of Reverse Engineering in Large Model
☆36Jan 8, 2025Updated last year
Alternatives and similar repositories for Awesome-Sparse-Autoencoder
Users that are interested in Awesome-Sparse-Autoencoder are comparing it to the libraries listed below
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.☆16Nov 21, 2025Updated 3 months ago
- ☆89Dec 18, 2025Updated 2 months ago
- [FCS'24] LVLM Safety paper☆19Jan 4, 2025Updated last year
- How do transformer LMs encode relations?☆56Feb 24, 2024Updated 2 years ago
- ☆571Jul 19, 2024Updated last year
- Using sparse coding to find distributed representations used by neural networks.☆297Nov 10, 2023Updated 2 years ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.☆294Jan 22, 2026Updated last month
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …☆243Feb 23, 2026Updated last week
- Code and dataset for the paper: "Can Editing LLMs Inject Harm?"☆21Dec 26, 2025Updated 2 months ago
- [ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models☆22Jun 13, 2024Updated last year
- ☆70Mar 6, 2025Updated 11 months ago
- Highlight errors in a bib file: missing URLs, capitalization protection, etc☆27May 12, 2024Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering☆73Jan 16, 2026Updated last month
- ☆28Feb 27, 2025Updated last year
- Attribution-based Parameter Decomposition☆34Jun 11, 2025Updated 8 months ago
- ☆231Nov 22, 2024Updated last year
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality☆231Aug 2, 2024Updated last year
- The GitHub repo for our survey paper: "Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large…☆87Jan 30, 2026Updated last month
- ☆396Aug 21, 2025Updated 6 months ago
- [NLPCC 2022] Kformer: Knowledge Injection in Transformer Feed-Forward Layers☆38Oct 20, 2022Updated 3 years ago
- ☆15Oct 24, 2023Updated 2 years ago
- Training Sparse Autoencoders on Language Models☆1,219Feb 23, 2026Updated last week
- Code for text generation papers searches on ArXiv, with very manual jekyll site creation.☆39Jan 15, 2026Updated last month
- [ICLR 2026] Official Implementation of ProxyThinker: Test-Time Guidance through Small Visual Reasoners.☆20Sep 24, 2025Updated 5 months ago
- ☆14Aug 29, 2024Updated last year
- Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.☆12Jun 19, 2024Updated last year
- A friendly UI for arXiv hosting papers on fairness and ethics in Machine Learning & Data Science☆12Jul 4, 2019Updated 6 years ago
- 2D platformer game where you play as a non-flying kind of duck.☆10Apr 9, 2018Updated 7 years ago
- A reinforcement learning agent that learns to solve mazes using Group Relative Policy Optimization (GRPO).☆12Feb 9, 2025Updated last year
- Deep Dual Support Vector Data Description for Anomaly Detection on Attributed Networks☆12Oct 4, 2021Updated 4 years ago
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models☆51Nov 17, 2024Updated last year
- ☆11Apr 28, 2024Updated last year
- ☆17Nov 7, 2023Updated 2 years ago
- Adversarial learning by utilizing model interpretation☆10Oct 19, 2018Updated 7 years ago
- Notes and code for Programming Massively Parallel Processors☆13Mar 29, 2025Updated 11 months ago
- Python implementation of Gnutella for CS 114 P2P systems☆12Mar 23, 2012Updated 13 years ago
- Reconstruction ICA☆10Aug 25, 2017Updated 8 years ago
- ☆10Oct 17, 2021Updated 4 years ago
- LLM-based character segmentation agent for ComfyUI based on SAM 3 and the SAM 3 Agent notebook☆25Dec 22, 2025Updated 2 months ago