koo-ec / Awesome-LLM-Explainability
A curated list of explainability-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the explainability implications, challenges, and advancements surrounding these powerful models.
☆9 Updated 5 months ago
Related projects:
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆82 Updated 2 months ago
- ☆94 Updated 8 months ago
- Weak-to-Strong Jailbreaking on Large Language Models ☆62 Updated 6 months ago
- ☆136 Updated 7 months ago
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`. ☆133 Updated 10 months ago
- This repo contains code for the paper "Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach". ☆11 Updated 3 months ago
- Parsimonious Concept Engineering (PaCE) uses sparse coding on a large-scale concept dictionary to effectively improve the trustworthiness… ☆25 Updated 3 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆84 Updated 5 months ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆107 Updated this week
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆64 Updated 6 months ago
- Code for paper: Are Large Language Models Post Hoc Explainers? ☆21 Updated last month
- PASTA: Post-hoc Attention Steering for LLMs ☆96 Updated last week
- LLM Unlearning ☆112 Updated 11 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆176 Updated 5 months ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆130 Updated 2 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆72 Updated 4 months ago
- The official repo of the paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller" ☆17 Updated last month
- Code repo for the ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" ☆59 Updated 6 months ago
- A Survey on Data Selection for Language Models ☆148 Updated 3 months ago
- ☆37 Updated 10 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆82 Updated 2 months ago
- Official repository of the MIRAGE benchmark ☆82 Updated last month
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆96 Updated 4 months ago
- A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc. ☆109 Updated 2 weeks ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. ☆36 Updated 2 months ago
- [ICML 2024 Oral] A framework for society simulation that supports complex simulation, for example: multi-scene. ☆39 Updated last month
- ☆87 Updated 2 months ago
- ☆42 Updated 5 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆48 Updated 6 months ago
- Code for Language-Interfaced FineTuning for Non-Language Machine Learning Tasks. ☆120 Updated 5 months ago