logix-project / logix
AI Logging for Interpretability and Explainability🔬
☆74 Updated 3 months ago
Related projects:
- ☆69 Updated 10 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆107 Updated last month
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆48 Updated 5 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆55 Updated 8 months ago
- ☆61 Updated 2 years ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆46 Updated last month
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆64 Updated 6 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆79 Updated last year
- ☆43 Updated 7 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆78 Updated last week
- ☆24 Updated 4 months ago
- ☆159 Updated 6 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆84 Updated 5 months ago
- ☆47 Updated last year
- ☆136 Updated 7 months ago
- A resource repository for representation engineering in large language models ☆36 Updated last week
- ☆44 Updated 2 weeks ago
- Improving Alignment and Robustness with Circuit Breakers ☆124 Updated 2 months ago
- A Survey on Data Selection for Language Models ☆148 Updated 3 months ago
- ☆56 Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆46 Updated this week
- Landing Page for TOFU ☆79 Updated 3 months ago
- ☆32 Updated 10 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆130 Updated 2 months ago
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models ☆38 Updated 10 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆76 Updated 3 weeks ago
- Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM) ☆80 Updated last year
- ☆74 Updated this week
- Language models scale reliably with over-training and on downstream tasks ☆91 Updated 5 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆72 Updated 4 months ago