hijohnnylin / neuronpedia
open source interpretability platform 🧠
★442 · Updated this week
Alternatives and similar repositories for neuronpedia
Users that are interested in neuronpedia are comparing it to the libraries listed below
- A toolkit for describing model features and intervening on those features to steer behavior. ★204 · Updated 11 months ago
- Open source interpretability artefacts for R1. ★161 · Updated 5 months ago
- ★216 · Updated 7 months ago
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models ★247 · Updated 2 months ago
- Sparsify transformers with SAEs and transcoders ★631 · Updated this week
- https://transformer-circuits.pub/2025/attribution-graphs/methods.html ★85 · Updated 6 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ★278 · Updated 3 months ago
- Training Sparse Autoencoders on Language Models ★985 · Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ★674 · Updated last week
- ★142 · Updated last month
- Mechanistic Interpretability Visualizations using React ★291 · Updated 9 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ★221 · Updated 9 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ★215 · Updated last week
- Training-Ready RL Environments + Evals ★116 · Updated last week
- Collection of evals for Inspect AI ★241 · Updated last week
- Testing baseline LLMs performance across various models ★311 · Updated last week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ★283 · Updated this week
- ⚖️ Awesome LLM Judges ⚖️ ★130 · Updated 5 months ago
- [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards ★1,168 · Updated this week
- ★54 · Updated 10 months ago
- Automatic evals for LLMs ★539 · Updated 3 months ago
- ★525 · Updated last year
- ★174 · Updated 10 months ago
- Post-training with Tinker ★912 · Updated this week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ★230 · Updated 2 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ★702 · Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ★172 · Updated 8 months ago
- Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat… ★282 · Updated 2 weeks ago
- MLGym A New Framework and Benchmark for Advancing AI Research Agents ★556 · Updated 2 months ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ★414 · Updated last week