https://transformer-circuits.pub/2025/attribution-graphs/methods.html
☆93 Mar 27, 2025 Updated 11 months ago
Alternatives and similar repositories for attribution-graphs-frontend
Users who are interested in attribution-graphs-frontend are comparing it to the libraries listed below.
- ☆200 Nov 17, 2024 Updated last year
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆14 Jan 12, 2026 Updated last month
- A library for training crosscoders ☆16 May 28, 2025 Updated 9 months ago
- ☆21 Mar 2, 2026 Updated last week
- A tiny easily hackable implementation of a feature dashboard. ☆15 Oct 21, 2025 Updated 4 months ago
- Experiments with representation engineering ☆14 Feb 28, 2024 Updated 2 years ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 Jan 19, 2025 Updated last year
- Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy ☆40 Feb 8, 2026 Updated last month
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆208 Dec 22, 2021 Updated 4 years ago
- ☆26 Sep 5, 2024 Updated last year
- Training Sparse Autoencoders on Language Models ☆1,245 Feb 27, 2026 Updated last week
- Code to enable layer-level steering in LLMs using sparse auto encoders ☆31 Sep 18, 2025 Updated 5 months ago
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆117 Oct 27, 2025 Updated 4 months ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆27 Nov 20, 2024 Updated last year
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆25 Feb 16, 2026 Updated 3 weeks ago
- ☆2,628 Feb 28, 2026 Updated last week
- Codes for "Efficient Offline Policy Optimization with a Learned Model", ICLR2023 ☆30 Jul 18, 2023 Updated 2 years ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆134 Mar 9, 2024 Updated 2 years ago
- Auditing agents for fine-tuning safety ☆20 Oct 21, 2025 Updated 4 months ago
- Benchmarks for the Evaluation of LLM Supervision ☆33 Jan 19, 2026 Updated last month
- ☆29 Apr 4, 2024 Updated last year
- A dataset of alignment research and code to reproduce it ☆78 Jun 22, 2023 Updated 2 years ago
- MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the Minimax M2 model are co… ☆29 Feb 18, 2026 Updated 2 weeks ago
- [ICRA 2026] StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes ☆20 Feb 17, 2026 Updated 2 weeks ago
- Archive of questions from the Cambridge Mathematics Tripos ☆10 Jun 6, 2022 Updated 3 years ago
- Official implementation for the paper "Can Large Reasoning Models Self-Train?" ☆72 Oct 10, 2025 Updated 5 months ago
- Project exploring 3D volumetric rendering of NEXRAD radar data. ☆11 Oct 23, 2023 Updated 2 years ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆47 May 31, 2024 Updated last year
- ☆399 Aug 21, 2025 Updated 6 months ago
- Sparsify transformers with SAEs and transcoders ☆699 Mar 2, 2026 Updated last week
- PaddleAPEX: Paddle Accuracy and Performance EXpansion pack ☆9 Dec 12, 2024 Updated last year
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 Aug 7, 2024 Updated last year
- (NeurIPS 2025) LaRes: Evolutionary Reinforcement Learning with LLM-based Adaptive Reward Search ☆21 Feb 3, 2026 Updated last month
- Stop creating folders, start creating structures! ☆10 Jul 8, 2021 Updated 4 years ago
- Community maintained hardware plugin for vLLM on AWS Neuron ☆24 Feb 26, 2026 Updated last week
- Trains small LMs. Designed for training on SimpleStories ☆12 Sep 15, 2025 Updated 5 months ago
- DragMesh: Interactive 3D Generation Made Easy ☆20 Dec 28, 2025 Updated 2 months ago
- ☆17 Aug 5, 2025 Updated 7 months ago
- Tusk Drift Demo - Node.js Service ☆58 Jan 20, 2026 Updated last month