mega002 / llm-interp-tau

Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University

☆110

Alternatives and similar repositories for llm-interp-tau

Users interested in llm-interp-tau are comparing it to the repositories listed below.

goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163 Updated 6 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆89 Updated last year
allenai / infinigram-api
☆82 Updated this week
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆31 Updated 6 months ago
jonhue / activeft
PyTorch library for Active Fine-Tuning
☆95 Updated last month
apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
☆147 Updated last month
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆59 Updated last month
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆60 Updated last year
aryamanarora / causalgym
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆50 Updated 11 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆225 Updated last week
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 Updated 6 months ago
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71 Updated last year
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆109 Updated last week
google-deepmind / mishax
☆143 Updated 2 months ago
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆64 Updated last year
KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆20 Updated 3 weeks ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆141 Updated 4 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆247 Updated last year
tommasomncttn / mergenetic
Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo).
☆91 Updated 3 months ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆92 Updated last year
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆84 Updated 11 months ago
KaiNylund / lm-weights-encode-time
☆69 Updated last year
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆31 Updated last year
METR / RE-Bench
☆117 Updated last month
thestephencasper / everything-you-need
we got you bro
☆36 Updated last year
mcleish7 / gemstone-scaling-laws
Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)
☆29 Updated last month
adamkarvonen / SAE_BoardGameEval
☆23 Updated 9 months ago
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75 Updated last year
jxmorris12 / cde
code for training & evaluating Contextual Document Embedding models
☆200 Updated 6 months ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆39 Updated last year