mega002 / llm-interp-tau
Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
☆110 · Updated this week
Alternatives and similar repositories for llm-interp-tau
Users interested in llm-interp-tau are comparing it to the libraries listed below.
- Open source interpretability artefacts for R1. ☆163 · Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 · Updated last year
- ☆82 · Updated this week
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆31 · Updated 6 months ago
- PyTorch library for Active Fine-Tuning ☆95 · Updated last month
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆147 · Updated last month
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…" ☆59 · Updated last month
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆60 · Updated last year
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆50 · Updated 11 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆225 · Updated last week
- Sparse Autoencoder Training Library ☆55 · Updated 6 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆71 · Updated last year
- nanoGPT-like codebase for LLM training ☆109 · Updated last week
- ☆143 · Updated 2 months ago
- Sparse and discrete interpretability tool for neural networks ☆64 · Updated last year
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State ☆20 · Updated 3 weeks ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆141 · Updated 4 months ago
- Extract full next-token probabilities via language model APIs ☆247 · Updated last year
- Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo). ☆91 · Updated 3 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆92 · Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆84 · Updated 11 months ago
- ☆69 · Updated last year
- Universal Neurons in GPT2 Language Models ☆31 · Updated last year
- ☆117 · Updated last month
- we got you bro ☆36 · Updated last year
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆29 · Updated last month
- ☆23 · Updated 9 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 · Updated last year
- code for training & evaluating Contextual Document Embedding models ☆200 · Updated 6 months ago
- Understanding how features learned by neural networks evolve throughout training ☆39 · Updated last year