JacksonWuxs / UsableXAI_LLM
Using Explanations as a Tool for Advanced LLMs
☆60 · Updated 8 months ago
Alternatives and similar repositories for UsableXAI_LLM
Users who are interested in UsableXAI_LLM are comparing it to the libraries listed below.
- Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆42 · Updated last month
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?" ☆35 · Updated 4 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆57 · Updated last year
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆35 · Updated 3 months ago
- ☆43 · Updated 3 months ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 · Updated 6 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆61 · Updated 4 months ago
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆49 · Updated 8 months ago
- Code for "Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models", ICLR 2024 Oral. ☆21 · Updated last year
- Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆32 · Updated 6 months ago
- ☆31 · Updated 2 months ago
- Models, data, and code for the paper: MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models ☆18 · Updated 7 months ago
- ☆137 · Updated last month
- ☆46 · Updated last week
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆27 · Updated 2 months ago
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents ☆41 · Updated 3 months ago
- [ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models ☆22 · Updated 11 months ago
- Code for the paper: Are Large Language Models Post Hoc Explainers? ☆31 · Updated 9 months ago
- A curated list of resources for activation engineering ☆74 · Updated last week
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales ☆93 · Updated 3 months ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆48 · Updated 5 months ago
- Code and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… ☆56 · Updated 2 months ago
- ☆36 · Updated 7 months ago
- Code for the paper "Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators" ☆12 · Updated 5 months ago
- Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization ☆14 · Updated 5 months ago
- AbstainQA, ACL 2024 ☆25 · Updated 7 months ago
- ☆23 · Updated 7 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 · Updated last year
- Mosaic IT: Enhancing Instruction Tuning with Data Mosaics ☆18 · Updated 3 months ago
- [FCS'24] LVLM Safety paper ☆17 · Updated 4 months ago