ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆46 · Updated 2 weeks ago
Alternatives and similar repositories for awesome-activation-engineering:
Users interested in awesome-activation-engineering often compare it to the repositories listed below:
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 · Updated 2 months ago
- awesome SAE papers ☆23 · Updated last month
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆56 · Updated 11 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆82 · Updated 8 months ago
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆17 · Updated 3 months ago
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024. ☆72 · Updated 5 months ago
- ☆18 · Updated last month
- ☆45 · Updated 4 months ago
- ☆50 · Updated last year
- [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations" ☆60 · Updated last year
- Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic ☆23 · Updated 2 months ago
- ☆24 · Updated 2 years ago
- ☆16 · Updated last week
- Latest Advances on Modality Priors in Multimodal Large Language Models ☆10 · Updated 2 weeks ago
- Representation Surgery for Multi-Task Model Merging. ICML, 2024. ☆42 · Updated 5 months ago
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆46 · Updated 6 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆56 · Updated 5 months ago
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆42 · Updated 5 months ago
- An implementation for MLLM oversensitivity evaluation ☆10 · Updated 4 months ago
- Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering ☆36 · Updated 4 months ago
- ☆28 · Updated 9 months ago
- ☆38 · Updated last year
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 · Updated 9 months ago
- Accepted LLM Papers in NeurIPS 2024 ☆34 · Updated 5 months ago
- ☆52 · Updated 8 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆52 · Updated 3 weeks ago
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆37 · Updated 3 months ago
- ☆21 · Updated 2 weeks ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆23 · Updated 9 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆73 · Updated last month