A curated list of resources for activation engineering
☆134Oct 2, 2025Updated 6 months ago
Alternatives and similar repositories for awesome-activation-engineering
Users that are interested in awesome-activation-engineering are comparing it to the libraries listed below.
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"☆40Jul 18, 2025Updated 8 months ago
- [ICML 2024] "Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection"☆15Feb 15, 2025Updated last year
- [ACL 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought"☆17Apr 3, 2025Updated last year
- [ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"☆17Feb 27, 2025Updated last year
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"☆14Nov 22, 2024Updated last year
- [ICML 2025] "From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?"☆49Oct 8, 2025Updated 6 months ago
- A resource repository for representation engineering in large language models☆150Nov 14, 2024Updated last year
- [NeurIPS 2023] "Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation"☆11Oct 6, 2023Updated 2 years ago
- ☆18Sep 1, 2025Updated 7 months ago
- [NeurIPS 2024] "Mind the Gap between Prototypes and Images in Cross-domain Finetuning"☆11Nov 15, 2024Updated last year
- [ICML 2024] "Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection"☆13Feb 15, 2025Updated last year
- Code to enable layer-level steering in LLMs using sparse auto encoders☆31Sep 18, 2025Updated 6 months ago
- An exploration of LLM steering☆26Jun 15, 2024Updated last year
- [ICLR 2026] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"☆49Aug 16, 2025Updated 7 months ago
- Steering Llama 2 with Contrastive Activation Addition☆222May 23, 2024Updated last year
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch☆10Aug 7, 2024Updated last year
- [ICLR 2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models"☆30Feb 4, 2026Updated 2 months ago
- KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation, NAACL 2024☆16Jul 29, 2024Updated last year
- This repo contains papers, books, tutorials and resources on Riemannian optimization.☆57Mar 18, 2026Updated 3 weeks ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆62Mar 30, 2024Updated 2 years ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features☆27Nov 20, 2024Updated last year
- [NeurIPS 2023] Generalized Logit Adjustment☆40Apr 21, 2024Updated last year
- Welcome to the 'In Context Learning Theory' Reading Group☆30Nov 8, 2024Updated last year
- [ICML 2025] Logits are All We Need to Adapt Closed Models☆22May 2, 2025Updated 11 months ago
- [ICLR 2025 Spotlight] Code release for "Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training"☆18Feb 20, 2025Updated last year
- [ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models"☆16Feb 22, 2025Updated last year
- awesome papers in LLM interpretability☆613Aug 20, 2025Updated 7 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering☆79Jan 16, 2026Updated 2 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs)☆40Aug 11, 2025Updated 8 months ago
- [ICML 2024] Code release for "On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm"☆11Feb 20, 2025Updated last year
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)☆39Nov 1, 2024Updated last year
- Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to Findings of EACL 2024.☆36Sep 26, 2024Updated last year
- The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"☆10Jun 23, 2024Updated last year
- This repository collects all relevant resources about interpretability in LLMs☆392Nov 1, 2024Updated last year
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.☆300Jan 22, 2026Updated 2 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces☆104Sep 21, 2023Updated 2 years ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".☆66Aug 15, 2025Updated 7 months ago
- ☆21Mar 17, 2025Updated last year
- A curated list of personalized alignment resources (continually updated).☆67Updated this week