cooperleong00 / Awesome-LLM-Interpretability

A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc.

☆262

Alternatives and similar repositories for Awesome-LLM-Interpretability

Users interested in Awesome-LLM-Interpretability are comparing it to the repositories listed below.


zhenyu-02 / LogitLens4LLMs
A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab…
☆95 · Updated 5 months ago
alon-albalak / data-selection-survey
A Survey on Data Selection for Language Models
☆245 · Updated 3 months ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆129 · Updated 8 months ago
zepingyu0512 / awesome-SAE
Awesome SAE (sparse autoencoder) papers
☆40 · Updated 2 months ago
OpenMOSS / Language-Model-SAEs
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
☆141 · Updated this week
LuckyyySTA / Awesome-LLM-hallucination
LLM hallucination paper list
☆320 · Updated last year
Furyton / awesome-language-model-analysis
This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers…
☆85 · Updated 8 months ago
EIT-NLP / Awesome-Latent-CoT
This repository contains a regularly updated paper list on LLM reasoning in latent space.
☆142 · Updated 2 weeks ago
wang2226 / Awesome-LLM-Decoding
📜 Paper list on decoding methods for LLMs and LVLMs
☆55 · Updated last month
lyy1994 / awesome-data-contamination
A paper list on data contamination in the evaluation of large language models.
☆98 · Updated 2 weeks ago
DAMO-NLP-SG / multilingual_analysis
[NeurIPS 2024] How do Large Language Models Handle Multilingualism?
☆37 · Updated 8 months ago
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆172 · Updated last year
Glaciohound / LM-Steer
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
☆123 · Updated 3 weeks ago
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆177 · Updated 8 months ago
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆29 · Updated 4 months ago
voidism / DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
☆504 · Updated 6 months ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆114 · Updated last year
zepingyu0512 / awesome-llm-understanding-mechanism
Awesome papers on LLM interpretability
☆527 · Updated 2 weeks ago
zepingyu0512 / neuron-attribution
Code for the EMNLP 2024 paper "Neuron-Level Knowledge Attribution in Large Language Models"
☆39 · Updated 8 months ago
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆239 · Updated last year
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆99 · Updated 2 months ago
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆474 · Updated 9 months ago
Guangxuan-Xiao / GSM8K-eval
☆44 · Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆175 · Updated 3 months ago
swj0419 / detect-pretrain-code
This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…
☆228 · Updated last year
Zanette-Labs / efficient-reasoning
☆65 · Updated 3 months ago
zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆151 · Updated 5 months ago
Hongcheng-Gao / Awesome-Long2short-on-LRMs
Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…
☆241 · Updated 2 months ago
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆129 · Updated 10 months ago
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆167 · Updated last year