AI4LIFE-GROUP / LLM_Explainer
Code for paper: Are Large Language Models Post Hoc Explainers?
☆30 · Updated 8 months ago
Alternatives and similar repositories for LLM_Explainer:
Users interested in LLM_Explainer are comparing it to the libraries listed below:
- A repository for summaries of recent explainable AI/Interpretable ML approaches ☆73 · Updated 5 months ago
- Influence Analysis and Estimation - Survey, Papers, and Taxonomy ☆72 · Updated last year
- Using Explanations as a Tool for Advanced LLMs ☆60 · Updated 6 months ago
- Conformal Language Modeling ☆28 · Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆63 · Updated 5 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages (https://arxiv.org/abs/2310.19156) ☆30 · Updated last year
- Code for Language-Interfaced FineTuning for Non-Language Machine Learning Tasks ☆123 · Updated 4 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆93 · Updated last month
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 · Updated 2 months ago
- OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023) ☆96 · Updated last month
- A resource repository for representation engineering in large language models ☆115 · Updated 4 months ago
- A simple PyTorch implementation of influence functions ☆85 · Updated 9 months ago
- Interpretable and efficient predictors using pre-trained language models; scikit-learn compatible ☆41 · Updated 2 weeks ago
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆17 · Updated 3 months ago
- Source code and data for ADEPT: A DEbiasing PrompT Framework (AAAI-23) ☆14 · Updated 3 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆74 · Updated 3 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆52 · Updated 4 months ago
- Code for NeurIPS'23 paper "A Bayesian Approach To Analysing Training Data Attribution In Deep Learning" ☆15 · Updated last year
- AI Logging for Interpretability and Explainability 🔬 ☆108 · Updated 9 months ago