koo-ec / Awesome-LLM-Explainability
A curated list of explainability-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the explainability implications, challenges, and advancements surrounding these powerful models.
☆20 · Updated 2 weeks ago
Alternatives and similar repositories for Awesome-LLM-Explainability:
Users interested in Awesome-LLM-Explainability are comparing it to the libraries listed below.
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆92 · Updated 3 weeks ago
- Code for the paper "Are Large Language Models Post Hoc Explainers?" ☆30 · Updated 6 months ago
- [arXiv 2024] Dissecting Adversarial Robustness of Multimodal LM Agents ☆60 · Updated last month
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆98 · Updated 4 months ago
- ☆125 · Updated last year
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆44 · Updated 2 months ago
- ☆110 · Updated 3 weeks ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆126 · Updated this week
- [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156 ☆29 · Updated last year
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆32 · Updated 3 months ago
- Using Explanations as a Tool for Advanced LLMs ☆58 · Updated 5 months ago
- Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misin…" ☆96 · Updated 3 months ago
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents ☆34 · Updated 2 weeks ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆50 · Updated 10 months ago
- [ACL 2024] SALAD benchmark & MD-Judge ☆125 · Updated 2 months ago
- ☆17 · Updated 4 months ago
- ☆163 · Updated last year
- A resource repository for representation engineering in large language models ☆102 · Updated 3 months ago
- [ACL'24] Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correla… ☆41 · Updated last week
- Official implementation of AdvPrompter https://arxiv.org/abs/2404.16873 ☆140 · Updated 9 months ago
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety. ☆77 · Updated 9 months ago
- Code for "In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering" ☆162 · Updated last week
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆59 · Updated 3 months ago
- Weak-to-Strong Jailbreaking on Large Language Models ☆72 · Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation ☆91 · Updated last month
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆58 · Updated 3 weeks ago
- ☆112 · Updated 5 months ago
- Unofficial Implementation of "Chain-of-Thought Reasoning Without Prompting" ☆27 · Updated 11 months ago
- ☆41 · Updated 2 weeks ago