JacksonWuxs / UsableXAI_LLM
Using Explanations as a Tool for Advanced LLMs
☆60 Updated 7 months ago
Alternatives and similar repositories for UsableXAI_LLM:
Users who are interested in UsableXAI_LLM are comparing it to the repositories listed below:
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆47 Updated 7 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆35 Updated 2 months ago
- [FCS'24] LVLM Safety paper ☆17 Updated 3 months ago
- ☆42 Updated 2 months ago
- [ACL'24] Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correla… ☆45 Updated last month
- ☆28 Updated last month
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆57 Updated last year
- Models, data, and codes for the paper: MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models ☆18 Updated 6 months ago
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales ☆83 Updated 2 months ago
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆30 Updated 5 months ago
- ☆28 Updated 5 months ago
- ☆20 Updated last week
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 Updated 5 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆75 Updated 4 months ago
- Code for paper: Are Large Language Models Post Hoc Explainers? ☆31 Updated 9 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆108 Updated last year
- A curated list of resources for activation engineering ☆63 Updated 2 weeks ago
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 Updated 3 months ago
- source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆39 Updated last week
- Understanding Why and How Instruction Tuning Changes Pre-trained Models ☆22 Updated last year
- ☆34 Updated 6 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆53 Updated 4 months ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆49 Updated 5 months ago
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆33 Updated last year
- ☆25 Updated 11 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆141 Updated 2 months ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆112 Updated 7 months ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 Updated 9 months ago
- AbstainQA, ACL 2024 ☆25 Updated 6 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆43 Updated 6 months ago