A curated list of resources for activation engineering
☆128 · Oct 2, 2025 · Updated 5 months ago
Alternatives and similar repositories for awesome-activation-engineering
Users who are interested in awesome-activation-engineering are comparing it to the repositories listed below
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆39 · Jul 18, 2025 · Updated 7 months ago
- [ICML 2024] "Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection" ☆15 · Feb 15, 2025 · Updated last year
- [ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond" ☆17 · Feb 27, 2025 · Updated last year
- [ICML 2025] "From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?" ☆49 · Oct 8, 2025 · Updated 4 months ago
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition" ☆14 · Nov 22, 2024 · Updated last year
- Implementation for the paper "Dynamic Language Binding in Relational Visual Reasoning" (Le et al., IJCAI 2020) ☆13 · Jul 25, 2024 · Updated last year
- ☆16 · Sep 1, 2025 · Updated 6 months ago
- A resource repository for representation engineering in large language models ☆148 · Nov 14, 2024 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆168 · Feb 22, 2026 · Updated last week
- [ICLR 2026] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆46 · Aug 16, 2025 · Updated 6 months ago
- [NeurIPS 2024] "Mind the Gap between Prototypes and Images in Cross-domain Finetuning" ☆10 · Nov 15, 2024 · Updated last year
- Code to enable layer-level steering in LLMs using sparse auto encoders ☆31 · Sep 18, 2025 · Updated 5 months ago
- This repo contains papers, books, tutorials and resources on Riemannian optimization. ☆56 · Feb 4, 2026 · Updated last month
- KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation, NAACL 2024 ☆16 · Jul 29, 2024 · Updated last year
- [arXiv 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought" ☆14 · Apr 3, 2025 · Updated 11 months ago
- [ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models" ☆17 · Feb 22, 2025 · Updated last year
- Code for Paper ACL'25: FiDELIS: Faithful Reasoning of Large Language Model on Knowledge Graph Question Answering ☆18 · May 8, 2025 · Updated 9 months ago
- Train toy models using multi-token prediction objective ☆14 · May 8, 2024 · Updated last year
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" ☆24 · Mar 4, 2025 · Updated 11 months ago
- [ICML 2025] Logits are All We Need to Adapt Closed Models ☆21 · May 2, 2025 · Updated 10 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆212 · May 23, 2024 · Updated last year
- ☆15 · May 1, 2025 · Updated 10 months ago
- awesome papers in LLM interpretability ☆609 · Aug 20, 2025 · Updated 6 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆62 · Mar 30, 2024 · Updated last year
- [ICLR 2025] General-purpose activation steering library ☆144 · Sep 18, 2025 · Updated 5 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". ☆66 · Aug 15, 2025 · Updated 6 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆73 · Jan 16, 2026 · Updated last month
- [NeurIPS 2023] Generalized Logit Adjustment ☆39 · Apr 21, 2024 · Updated last year
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.. ☆294 · Jan 22, 2026 · Updated last month
- [NeurIPS 2023] Code release for "Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity" ☆19 · Oct 19, 2023 · Updated 2 years ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆39 · Nov 1, 2024 · Updated last year
- A curated list of personalized alignment resources (continually updated). ☆62 · Feb 18, 2026 · Updated 2 weeks ago
- Tools for exploring Transformer neuron behaviour, including input pruning and diversification. ☆23 · Sep 28, 2023 · Updated 2 years ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆139 · Feb 21, 2025 · Updated last year
- This repository collects all relevant resources about interpretability in LLMs ☆390 · Nov 1, 2024 · Updated last year
- [ICCV 2023] Black Box Few-Shot Adaptation for Vision-Language models ☆26 · May 14, 2024 · Updated last year
- ☆231 · Nov 22, 2024 · Updated last year
- [arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker" ☆172 · Feb 20, 2024 · Updated 2 years ago
- Code for our EMNLP 2020 paper "Uncertainty-Aware Label Refinement for Sequence Labeling" ☆22 · Oct 4, 2020 · Updated 5 years ago