cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc.
☆197 Updated 3 months ago
Alternatives and similar repositories for Awesome-LLM-Interpretability:
Users who are interested in Awesome-LLM-Interpretability compare it to the repositories listed below.
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆145 Updated 8 months ago
- A Survey on Data Selection for Language Models ☆201 Updated 3 months ago
- LLM hallucination paper list ☆299 Updated 10 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆102 Updated 9 months ago
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆26 Updated this week
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆82 Updated this week
- Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning ☆160 Updated 11 months ago
- ☆104 Updated last month
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆120 Updated last month
- LLM Unlearning ☆141 Updated last year
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆170 Updated 5 months ago
- A resource repository for representation engineering in large language models ☆90 Updated 2 months ago
- This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers… ☆69 Updated last month
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆452 Updated 8 months ago
- Collection of Reverse Engineering in Large Model ☆31 Updated last week
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆86 Updated last week
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆215 Updated last year
- Must-read Papers on Large Language Model (LLM) Continual Learning ☆141 Updated last year
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆400 Updated 2 months ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models ☆85 Updated 5 months ago
- ☆152 Updated 6 months ago
- ☆184 Updated 10 months ago
- A survey on harmful fine-tuning attack for large language model ☆124 Updated this week
- ☆164 Updated last week
- The repo for In-context Autoencoder ☆101 Updated 8 months ago
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆65 Updated 3 months ago
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety. ☆75 Updated 8 months ago
- RewardBench: the first evaluation tool for reward models. ☆491 Updated last week
- ☆37 Updated 7 months ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆286 Updated this week