shuyhere / Awesome-Sparse-Autoencoder
A collection of resources on reverse engineering large models
☆26 Updated this week
Related projects
Alternatives and complementary repositories for Awesome-Sparse-Autoencoder
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆73 Updated this week
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆45 Updated 7 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆26 Updated last week
- ☆68 Updated 3 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆118 Updated last month
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis" ☆12 Updated last year
- ☆79 Updated last year
- ☆53 Updated 2 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆82 Updated 8 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆95 Updated 2 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆56 Updated last month
- AI Logging for Interpretability and Explainability 🔬 ☆87 Updated 5 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆68 Updated 3 weeks ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆69 Updated 8 months ago
- ☆96 Updated 3 months ago
- A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc. ☆168 Updated 3 weeks ago
- A resource repository for representation engineering in large language models ☆50 Updated 2 months ago
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆45 Updated this week
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆96 Updated 7 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆67 Updated 5 months ago
- A curated list of awesome resources dedicated to Scaling Laws for LLMs ☆63 Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆21 Updated 4 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆143 Updated 3 weeks ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆50 Updated 6 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆57 Updated 8 months ago
- ☆70 Updated 3 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆45 Updated last month
- ☆102 Updated last month
- ☆34 Updated 3 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆21 Updated 4 months ago