ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
⭐ 119 · Updated last month
Related projects
Alternatives and complementary repositories for function_vectors
- ⭐ 81 · Updated last year
- AI Logging for Interpretability and Explainability · ⭐ 89 · Updated 5 months ago
- ⭐ 170 · Updated 8 months ago
- ⭐ 71 · Updated 3 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. · ⭐ 54 · Updated 2 weeks ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers · ⭐ 75 · Updated last month
- ⭐ 72 · Updated 4 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) · ⭐ 97 · Updated 7 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering · ⭐ 144 · Updated last month
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors · ⭐ 69 · Updated 8 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation. · ⭐ 75 · Updated this week
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) · ⭐ 45 · Updated 7 months ago
- A resource repository for representation engineering in large language models · ⭐ 54 · Updated this week
- Algebraic value editing in pretrained language models · ⭐ 57 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models · ⭐ 108 · Updated last year
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" · ⭐ 91 · Updated 4 months ago
- PASTA: Post-hoc Attention Steering for LLMs · ⭐ 108 · Updated 2 months ago
- ⭐ 105 · Updated last month
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering · ⭐ 26 · Updated 3 weeks ago
- ⭐ 76 · Updated 9 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". · ⭐ 58 · Updated 8 months ago
- Steering Llama 2 with Contrastive Activation Addition · ⭐ 97 · Updated 5 months ago
- ⭐ 39 · Updated last year
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) · ⭐ 62 · Updated last month
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" · ⭐ 62 · Updated last year
- ⭐ 151 · Updated 9 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… · ⭐ 83 · Updated 4 months ago
- Official repository for ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors" · ⭐ 35 · Updated 5 months ago
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. · ⭐ 48 · Updated this week
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… · ⭐ 85 · Updated 3 years ago