redwoodresearch/rust_circuit_public

☆67

Alternatives and similar repositories for rust_circuit_public

Users who are interested in rust_circuit_public are comparing it to the libraries listed below.

ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated 2 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆293 · Oct 1, 2024 · Updated last year
LRudL / evalugator
(Model-written) LLM evals library
☆19 · Dec 13, 2024 · Updated last year
Mech-Interp / PySvelte
A library for bridging Python and HTML/JavaScript (via Svelte) for creating interactive visualizations
☆17 · Apr 15, 2024 · Updated 2 years ago
redwoodresearch / remix_public
☆20 · Feb 17, 2023 · Updated 3 years ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆246 · Aug 11, 2025 · Updated 11 months ago
redwoodresearch / Easy-Transformer
☆148 · Aug 4, 2024 · Updated last year
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆84 · Apr 27, 2022 · Updated 4 years ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆156 · Sep 14, 2022 · Updated 3 years ago
quantified-uncertainty / ai-safety-papers
☆22 · Sep 9, 2021 · Updated 4 years ago
JasonGross / guarantees-based-mechanistic-interpretability
☆18 · Updated this week
mishajw / repeng
Experiments with representation engineering
☆14 · Feb 28, 2024 · Updated 2 years ago
danielway / nexrad-volumetric-renderer
Project exploring 3D volumetric rendering of NEXRAD radar data.
☆13 · Oct 23, 2023 · Updated 2 years ago
alignedai / HappyFaces
The Happy Faces Benchmark
☆15 · Jul 20, 2023 · Updated 3 years ago
quantified-uncertainty / squiggle
This monorepo covers multiple QURI projects, including the Squiggle language, Squiggle Hub, and Metaforecast
☆216 · Updated this week
dolphingarlic / st0nks
Real News Headlines + Fake Financial Predictions = St0nks
☆24 · May 22, 2023 · Updated 3 years ago
bilal-chughtai / rep-theory-mech-interp
☆31 · May 4, 2023 · Updated 3 years ago
vedantpalit / Towards-Vision-Language-Mechanistic-Interpretability
This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce…
☆25 · Feb 16, 2026 · Updated 5 months ago
davidbau / baukit
☆256 · Feb 22, 2024 · Updated 2 years ago
neelnanda-io / Neuroscope
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆14 · Feb 13, 2023 · Updated 3 years ago
hannamw / gpt2-greater-than
Code release for the 2023 NeurIPS paper How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained langua…
☆17 · Dec 6, 2024 · Updated last year
UlisseMini / ana
The AI that helps you achieve your goals
☆11 · Feb 4, 2024 · Updated 2 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆358 · Apr 30, 2026 · Updated 2 months ago
Chillee / lit-llama
Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code
☆10 · Aug 29, 2023 · Updated 2 years ago
ApolloResearch / apd
Attribution-based Parameter Decomposition
☆35 · Jun 11, 2025 · Updated last year
TransluceAI / docent
☆114 · Updated this week
Jiaxin-Wen / MisleadLM
Official code for our paper: "Language Models Learn to Mislead Humans via RLHF"
☆20 · Oct 11, 2024 · Updated last year
tripos-education / maths-tripos-questions
Archive of questions from the Cambridge Mathematics Tripos
☆10 · Jun 6, 2022 · Updated 4 years ago
eburghar / l3charts
Customizable charts made with TikZ and LaTeX3
☆14 · Feb 11, 2023 · Updated 3 years ago
understanding-search / structured-representations-maze-transformers
See github.com/understanding-search/maze-transformer
☆10 · Dec 8, 2023 · Updated 2 years ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆99 · Dec 31, 2025 · Updated 6 months ago
shauli-ravfogel / adv-kernel-removal
☆12 · Oct 23, 2022 · Updated 3 years ago
neelnanda-io / 1L-Sparse-Autoencoder
☆141 · Oct 28, 2023 · Updated 2 years ago
qrdl / flightrec
Flight Recorder lets you record a client program's execution and examine it later
☆11 · Sep 18, 2020 · Updated 5 years ago
TransformerLensOrg / TransformerLens
A library for mechanistic interpretability of GPT-style language models
☆3,695 · Updated this week
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆265 · Feb 27, 2026 · Updated 4 months ago
MANGA-UOFA / PTfer
☆11 · Nov 13, 2024 · Updated last year
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆63 · Mar 30, 2024 · Updated 2 years ago