safety-research / circuit-tracer
☆2,595 · Feb 6, 2026 · Updated last week
Alternatives and similar repositories for circuit-tracer
Users who are interested in circuit-tracer are comparing it to the libraries listed below.
- open source interpretability platform 🧠 ☆704 · Updated this week
- A library for mechanistic interpretability of GPT-style language models ☆3,073 · Updated this week
- Training Sparse Autoencoders on Language Models ☆1,201 · Updated this week
- Sparsify transformers with SAEs and transcoders ☆692 · Feb 9, 2026 · Updated last week
- A tiny easily hackable implementation of a feature dashboard. ☆15 · Oct 21, 2025 · Updated 3 months ago
- ☆198 · Nov 17, 2024 · Updated last year
- DSPy: The framework for programming—not prompting—language models ☆32,156 · Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆811 · Updated this week
- Open source interpretability artefacts for R1. ☆170 · Apr 21, 2025 · Updated 9 months ago
- Attribution-based Parameter Decomposition ☆33 · Jun 11, 2025 · Updated 8 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆241 · Feb 9, 2026 · Updated last week
- ☆146 · Dec 30, 2025 · Updated last month
- Tools for merging pretrained large language models. ☆6,783 · Jan 26, 2026 · Updated 3 weeks ago
- https://transformer-circuits.pub/2025/attribution-graphs/methods.html ☆91 · Mar 27, 2025 · Updated 10 months ago
- ☆394 · Aug 21, 2025 · Updated 5 months ago
- ☆570 · Jul 19, 2024 · Updated last year
- ☆4,109 · Jun 4, 2024 · Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆342 · Jun 13, 2025 · Updated 8 months ago
- Minimal reproduction of DeepSeek R1-Zero ☆12,748 · Apr 24, 2025 · Updated 9 months ago
- Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team. ☆20,950 · Mar 11, 2025 · Updated 11 months ago
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆23,547 · Updated this week
- Our library for RL environments + evals ☆3,833 · Updated this week
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆243 · Dec 16, 2024 · Updated last year
- ☆207 · Oct 14, 2025 · Updated 4 months ago
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,555 · Jan 14, 2026 · Updated last month
- ☆88 · Dec 18, 2025 · Updated last month
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM. ☆51,922 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆70,205 · Updated this week
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- verl: Volcano Engine Reinforcement Learning for LLMs ☆19,132 · Updated this week
- Mechanistic Interpretability Visualizations using React ☆320 · Dec 18, 2024 · Updated last year
- The LLM Evaluation Framework ☆13,613 · Feb 10, 2026 · Updated last week
- ☆51 · Jan 28, 2026 · Updated 2 weeks ago
- Universal memory layer for AI Agents ☆47,230 · Feb 3, 2026 · Updated 2 weeks ago
- LlamaIndex is the leading framework for building LLM-powered agents over your data. ☆46,977 · Updated this week
- Train transformer language models with reinforcement learning. ☆17,360 · Updated this week
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ☆4,754 · Jul 18, 2025 · Updated 6 months ago
- A library for efficient patching and automatic circuit discovery. ☆90 · Dec 31, 2025 · Updated last month
- Fully open reproduction of DeepSeek-R1 ☆25,879 · Nov 24, 2025 · Updated 2 months ago