hijohnnylin / neuronpedia
open source interpretability platform 🧠
☆293 Updated this week
Alternatives and similar repositories for neuronpedia
Users who are interested in neuronpedia are comparing it to the libraries listed below.
- Open source interpretability artefacts for R1. ☆154 Updated 2 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆193 Updated 8 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆243 Updated last month
- ☆134 Updated 3 months ago
- ☆171 Updated 4 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆193 Updated this week
- Reproducible, flexible LLM evaluations ☆222 Updated last week
- https://transformer-circuits.pub/2025/attribution-graphs/methods.html ☆75 Updated 3 months ago
- Automatic evals for LLMs ☆467 Updated 3 weeks ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆224 Updated this week
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆206 Updated 7 months ago
- A simple unified framework for evaluating LLMs ☆225 Updated 3 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆210 Updated last week
- Scaling Data for SWE-agents ☆293 Updated last week
- Sparsify transformers with SAEs and transcoders ☆584 Updated last week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 6 months ago
- Collection of evals for Inspect AI ☆178 Updated this week
- Improving Alignment and Robustness with Circuit Breakers ☆220 Updated 9 months ago
- PyTorch building blocks for the OLMo ecosystem ☆261 Updated this week
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆502 Updated 2 months ago
- A benchmark for LLMs on complicated tasks in the terminal ☆240 Updated this week
- Mechanistic Interpretability Visualizations using React ☆262 Updated 7 months ago
- code for training & evaluating Contextual Document Embedding models ☆194 Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 4 months ago
- ☆129 Updated 4 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆115 Updated 4 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆242 Updated 8 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆250 Updated 9 months ago
- Code for the paper "Fishing for Magikarp" ☆159 Updated 2 months ago
- The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024] ☆259 Updated 4 months ago