hijohnnylin / neuronpedia
open source interpretability platform 🧠
☆276 Updated this week
Alternatives and similar repositories for neuronpedia
Users who are interested in neuronpedia are comparing it to the repositories listed below.
- Open source interpretability artefacts for R1. ☆149 Updated 2 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆233 Updated 2 weeks ago
- Sparsify transformers with SAEs and transcoders ☆576 Updated this week
- ☆134 Updated 2 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆190 Updated 7 months ago
- ☆156 Updated 3 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆214 Updated 9 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆187 Updated this week
- A simple unified framework for evaluating LLMs ☆220 Updated 2 months ago
- Collection of evals for Inspect AI ☆167 Updated this week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆96 Updated this week
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆202 Updated 6 months ago
- Automatic evals for LLMs ☆437 Updated 3 weeks ago
- Mechanistic Interpretability Visualizations using React ☆258 Updated 6 months ago
- https://transformer-circuits.pub/2025/attribution-graphs/methods.html ☆72 Updated 3 months ago
- Code for the paper "Fishing for Magikarp" ☆157 Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 5 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆184 Updated this week
- ⚖️ Awesome LLM Judges ⚖️ ☆105 Updated 2 months ago
- Reproducible, flexible LLM evaluations ☆214 Updated last month
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆489 Updated last month
- Training Sparse Autoencoders on Language Models ☆846 Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆599 Updated this week
- ☆211 Updated last week
- ☆136 Updated 7 months ago
- PyTorch building blocks for the OLMo ecosystem ☆238 Updated this week
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆448 Updated 9 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆219 Updated 6 months ago
- ☆127 Updated 3 months ago
- Scale your LLM-as-a-judge. ☆240 Updated 3 weeks ago