☆159Jan 15, 2024Updated 2 years ago
Alternatives and similar repositories for Explainability-for-Large-Language-Models
Users who are interested in Explainability-for-Large-Language-Models are comparing it to the libraries listed below
- Using Explanations as a Tool for Advanced LLMs☆69Sep 11, 2024Updated last year
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"☆39Jul 18, 2025Updated 7 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.☆294Jan 22, 2026Updated last month
- The official PyTorch code for AAAI'23 Paper "Sparse Coding in a Dual Memory System for Lifelong Learning"☆12Feb 15, 2023Updated 3 years ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"☆13Jul 18, 2024Updated last year
- source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"☆66Apr 11, 2025Updated 10 months ago
- Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"☆30Oct 12, 2025Updated 4 months ago
- ☆105Jun 30, 2024Updated last year
- A lightweight server for evaluating Texas Hold'em agents.☆14Mar 15, 2025Updated 11 months ago
- [ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"☆17Feb 27, 2025Updated last year
- ☆17Mar 3, 2025Updated last year
- Multilingual Neural Machine Translation using Transformers with Conditional Normalization.☆18Mar 24, 2023Updated 2 years ago
- Repo for "AlphaResearch: Accelerating New Algorithm Discovery with Language Models"☆54Nov 12, 2025Updated 3 months ago
- Group-conditional DRO to alleviate spurious correlations☆15Jul 15, 2021Updated 4 years ago
- Personal collection of materials for the Peking University ICS (Introduction to Computer Systems) course, 2023-2024 academic year☆22Mar 8, 2024Updated last year
- Uncertainty-aware classification.☆17Jun 28, 2022Updated 3 years ago
- Code repository for the paper "Invariant and Transportable Representations for Anti-Causal Domain Shifts"☆16Jul 4, 2022Updated 3 years ago
- This is a code example repo for the NLP course offered by the Institute of Chinese Information Processing of BNU.☆51May 2, 2025Updated 10 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs.☆57Oct 30, 2025Updated 4 months ago
- Repository of paper "How Likely Do LLMs with CoT Mimic Human Reasoning?"☆23Feb 19, 2025Updated last year
- Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs)☆25Dec 1, 2024Updated last year
- [ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Pr…☆24Jun 1, 2024Updated last year
- Code and data for the FACTOR paper☆53Nov 15, 2023Updated 2 years ago
- A curated list of awesome 3D generation resources, papers, and tools for AI-powered 3D content creation.☆56Updated this week
- ☆24Aug 23, 2025Updated 6 months ago
- This repository collects all relevant resources about interpretability in LLMs☆390Nov 1, 2024Updated last year
- A curated list of Large Language Model (LLM) Interpretability resources.☆1,475Feb 24, 2026Updated last week
- ☆20Jan 16, 2024Updated 2 years ago
- Uncertainty Estimation Using Deep Neural Network and Gradient Boosting Methods☆22Jun 1, 2021Updated 4 years ago
- Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants.☆195Updated this week
- A Dungeons & Dragons multiplayer game developed by ChatGPT☆25Aug 14, 2023Updated 2 years ago
- ☆23Nov 15, 2022Updated 3 years ago
- ☆23Jun 15, 2022Updated 3 years ago
- Align your LM to express calibrated verbal statements of confidence in its long-form generations.☆29Jun 4, 2024Updated last year
- Improving Steering Vectors by Targeting Sparse Autoencoder Features☆27Nov 20, 2024Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Jul 12, 2023Updated 2 years ago
- An awesome repository & A comprehensive survey on interpretability of LLM attention heads.☆398Mar 2, 2025Updated last year
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).☆176Oct 27, 2023Updated 2 years ago
- ☆28Nov 16, 2025Updated 3 months ago