Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
☆305 · Feb 8, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for llm-interp-tau
Users who are interested in llm-interp-tau are comparing it to the libraries listed below.
- Implementations of several self-supervised pretext tasks for language and vision modalities in PyTorch. ☆13 · Jan 19, 2021 · Updated 5 years ago
- Exploring the Limitations of Large Language Models on Multi-Hop Queries ☆32 · Mar 2, 2025 · Updated last year
- Engine for collecting, uploading, and downloading model activations ☆26 · Apr 2, 2025 · Updated 11 months ago
- How do transformer LMs encode relations? ☆56 · Feb 24, 2024 · Updated 2 years ago
- Attribution-based Parameter Decomposition ☆34 · Jun 11, 2025 · Updated 8 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 · Jan 19, 2025 · Updated last year
- ☆83 · Feb 25, 2025 · Updated last year
- Code for reproducing our paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing" ☆32 · Mar 31, 2025 · Updated 11 months ago
- Training Sparse Autoencoders on Language Models ☆1,233 · Feb 27, 2026 · Updated last week
- ☆28 · Nov 16, 2025 · Updated 3 months ago
- ⚓️ Interactive playground for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆18 · Dec 20, 2025 · Updated 2 months ago
- ☆16 · Feb 24, 2026 · Updated last week
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper. ☆16 · Nov 21, 2025 · Updated 3 months ago
- Tools for optimizing steering vectors in LLMs. ☆20 · Apr 10, 2025 · Updated 10 months ago
- ☆17 · Aug 30, 2025 · Updated 6 months ago
- Notes and slides for a course on social and scientific aspects of machine learning ☆10 · Jan 26, 2026 · Updated last month
- ☆17 · Jul 9, 2025 · Updated 7 months ago
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆117 · Oct 27, 2025 · Updated 4 months ago
- ☆10 · Oct 22, 2024 · Updated last year
- ☆14 · Oct 30, 2024 · Updated last year
- ☆70 · Mar 6, 2025 · Updated last year
- RAG based chatbot for Global AI Hub ☆26 · Oct 4, 2025 · Updated 5 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆230 · Dec 12, 2025 · Updated 2 months ago
- Code for steering and monitoring with concepts vectors in LLMs. https://arxiv.org/abs/2502.03708 ☆21 · Aug 10, 2025 · Updated 6 months ago
- A Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation, Levy et al., Findings of EMNLP 2021 ☆14 · Apr 3, 2022 · Updated 3 years ago
- PAIR.withgoogle.com and friend's work on interpretability methods ☆222 · Feb 4, 2026 · Updated last month
- Sparsify transformers with SAEs and transcoders ☆699 · Updated this week
- ☆19 · Sep 4, 2024 · Updated last year
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · Apr 25, 2021 · Updated 4 years ago
- Minimum Description Length probing for neural network representations ☆20 · Jan 28, 2025 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆168 · Feb 22, 2026 · Updated last week
- [ICML 2025] Unlearning in Diffusion Models using Sparse Autoencoders ☆52 · Oct 16, 2025 · Updated 4 months ago
- graphpatch is a library for activation patching on PyTorch neural network models. ☆21 · Feb 11, 2025 · Updated last year
- ☆17 · Sep 3, 2024 · Updated last year
- A library for mechanistic interpretability of GPT-style language models ☆3,133 · Updated this week
- ☆25 · Feb 20, 2026 · Updated 2 weeks ago
- Measuring if attention is explanation with ROAR ☆22 · Mar 3, 2023 · Updated 3 years ago
- Landing page for MIB: A Mechanistic Interpretability Benchmark ☆24 · Aug 15, 2025 · Updated 6 months ago
- Collaborative IoT framework in Swift targeted at iOS, iPadOS, and MacOS applications. ☆18 · Apr 12, 2023 · Updated 2 years ago