DLR-SC / style-vectors-for-steering-llms
Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to Findings of EACL 2024.
☆26 Updated 4 months ago
Alternatives and similar repositories for style-vectors-for-steering-llms:
Users who are interested in style-vectors-for-steering-llms compare it to the libraries listed below.
- General-purpose activation steering library ☆43 Updated last month
- Function Vectors in Large Language Models (ICLR 2024) ☆138 Updated 4 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆87 Updated 2 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) ☆84 Updated 4 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆184 Updated 4 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆122 Updated 8 months ago
- [NAACL'25] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆46 Updated 2 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆104 Updated 10 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆88 Updated last year
- A resource repository for representation engineering in large language models ☆101 Updated 3 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆125 Updated 2 months ago
- ☆190 Updated 11 months ago
- ☆76 Updated 6 months ago
- Inspecting and Editing Knowledge Representations in Language Models