cma1114 / activation_steering
An exploration of LLM steering
☆16 · Updated last year
Alternatives and similar repositories for activation_steering
Users who are interested in activation_steering compare it to the libraries listed below.
- Function Vectors in Large Language Models (ICLR 2024) ☆176 · Updated 3 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆112 · Updated last month
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆153 · Updated 5 months ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆50 · Updated 8 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆123 · Updated last month
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆39 · Updated 9 months ago
- ☆51 · Updated 4 months ago
- ☆89 · Updated last year
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆61 · Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆39 · Updated 6 months ago
- ☆50 · Updated last year
- PASTA: Post-hoc Attention Steering for LLMs ☆122 · Updated 8 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆58 · Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆62 · Updated 8 months ago
- ☆103 · Updated 6 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆32 · Updated last year
- ☆187 · Updated 3 months ago
- A library for efficient patching and automatic circuit discovery. ☆74 · Updated 3 weeks ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆120 · Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆123 · Updated 11 months ago
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning ☆47 · Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆77 · Updated 8 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆170 · Updated last year
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.… ☆26 · Updated 10 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆36 · Updated 6 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆113 · Updated last year
- Test-time-training on nearest neighbors for large language models ☆45 · Updated last year
- ☆157 · Updated 8 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆182 · Updated 6 months ago
- Replicating O1 inference-time scaling laws ☆89 · Updated 8 months ago