Ybakman / LLM_UncertaintyLinks
☆ 11 · Updated 11 months ago
Alternatives and similar repositories for LLM_Uncertainty
Users interested in LLM_Uncertainty are comparing it to the libraries listed below.
- ☆ 172 · Updated last year
- `dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms. ☆ 84 · Updated 2 months ago
- ☆ 99 · Updated last year
- ☆ 163 · Updated 9 months ago
- AI Logging for Interpretability and Explainability 🔬 ☆ 124 · Updated last year
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆ 160 · Updated 2 months ago
- A fast, effective data attribution method for neural networks in PyTorch ☆ 217 · Updated 9 months ago
- ☆ 55 · Updated 2 years ago
- ☆ 226 · Updated last year
- LoFiT: Localized Fine-tuning on LLM Representations ☆ 40 · Updated 7 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆ 125 · Updated 2 months ago
- Code for papers on Large Language Models Personalization (LaMP) ☆ 167 · Updated 6 months ago
- Conformal Language Modeling ☆ 32 · Updated last year
- ☆ 51 · Updated last year
- "Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper) ☆ 87 · Updated last year
- ☆ 96 · Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆ 53 · Updated 11 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆ 79 · Updated 8 months ago
- SafeArena is a benchmark for assessing the harmful capabilities of web agents ☆ 17 · Updated 4 months ago
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages (https://arxiv.org/abs/2310.19156) ☆ 37 · Updated last year
- Code repo for the ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" ☆ 131 · Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆ 63 · Updated 9 months ago
- ☆ 293 · Updated last year
- How do transformer LMs encode relations? ☆ 52 · Updated last year
- ☆ 47 · Updated last month
- This repository collects all relevant resources about interpretability in LLMs ☆ 370 · Updated 10 months ago
- ☆ 165 · Updated 9 months ago
- [ICLR 2025] General-purpose activation steering library ☆ 99 · Updated last week
- Steering Llama 2 with Contrastive Activation Addition ☆ 178 · Updated last year
- A Survey of Attributions for Large Language Models ☆ 211 · Updated last year