socialfoundations / tttlm
Test-time training on nearest neighbors for large language models
☆22 Updated 5 months ago
Related projects:
- ☆47 Updated last year
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆12 Updated 4 months ago
- ☆61 Updated 2 years ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆46 Updated last month
- ☆38 Updated 8 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆48 Updated 5 months ago
- `dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms. ☆27 Updated this week
- ☆30 Updated 7 months ago
- ☆23 Updated 4 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆22 Updated 2 months ago
- AI Logging for Interpretability and Explainability🔬 ☆74 Updated 3 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆55 Updated 8 months ago
- Official Repository for Dataset Inference for LLMs ☆21 Updated last month
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆16 Updated 6 months ago
- [ICML 2023] "Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?" by Ruisi Cai, Zhenyu Zhang, Zhangyang Wang ☆16 Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆79 Updated last year
- ☆43 Updated 7 months ago
- ☆22 Updated 2 months ago
- Official Repository for ICML 2023 paper "Can Neural Network Memorization Be Localized?" ☆16 Updated 10 months ago
- Influence Analysis and Estimation - Survey, Papers, and Taxonomy ☆58 Updated 6 months ago
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆27 Updated 3 months ago
- ☆12 Updated 3 months ago
- Code for the paper "Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression" ☆17 Updated last year
- Landing Page for TOFU ☆79 Updated 3 months ago
- ☆37 Updated 10 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆64 Updated 6 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆93 Updated last month
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" ☆12 Updated last week
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NextGenAISafety @ ICML 2024) ☆37 Updated last month
- ☆69 Updated 10 months ago