ejnnr/cupbearer



A library for mechanistic anomaly detection

☆22

Alternatives and similar repositories for cupbearer

Users interested in cupbearer are comparing it to the libraries listed below.


EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆33 · May 23, 2024 · Updated 2 years ago
TeunvdWeij / sandbagging
☆21 · Nov 15, 2024 · Updated last year
redwoodresearch / rust_circuit_public
☆67 · Feb 16, 2023 · Updated 3 years ago
FlyingPumba / InterpBench
A benchmark for mechanistic discovery of circuits in Transformers
☆17 · Dec 15, 2024 · Updated last year
aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
JasonGross / guarantees-based-mechanistic-interpretability
☆18 · Jul 21, 2026 · Updated last week
neulab / ToM-Language-Acquisition
Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind".
☆15 · Apr 27, 2023 · Updated 3 years ago
kxcloud / gradient-routing
☆11 · Dec 4, 2024 · Updated last year
thestephencasper / latent_adversarial_training
☆24 · Jul 25, 2024 · Updated 2 years ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆99 · Dec 31, 2025 · Updated 6 months ago
alan-cooney / transformer-from-scratch
Decoder-only transformer, built from scratch with PyTorch
☆33 · Oct 22, 2023 · Updated 2 years ago
simple-stories / simple_stories_train
Trains small LMs. Designed for training on SimpleStories
☆14 · Sep 15, 2025 · Updated 10 months ago
rgreenblatt / model_organism_public
☆15 · Jun 17, 2025 · Updated last year
jcmgray / einsum_bmm
einsum via batch matrix multiply
☆15 · Nov 29, 2023 · Updated 2 years ago
muchanem / hierarchical-sparse-autoencoders
☆16 · Jun 3, 2025 · Updated last year
lingo-mit / lm-truthfulness
☆17 · Dec 21, 2023 · Updated 2 years ago
alan-cooney / transformer-lens-starter-template
A quick way to get started with Transformer Lens
☆14 · Dec 13, 2023 · Updated 2 years ago
Blkalkin / Optimal-TestTime
☆10 · Mar 24, 2025 · Updated last year
AHartNtkn / Dependent-Binary-Lambda-Calculus
A Dependently Typed Esolang
☆10 · Aug 4, 2017 · Updated 8 years ago
OliverEvans96 / maturin-nix-example
☆15 · Jul 21, 2023 · Updated 3 years ago
TigerHix / AlterEcho
☆12 · Aug 29, 2021 · Updated 4 years ago
avivga / style-image-prior
Official Implementation of "Style Generator Inversion for Image Enhancement and Animation".
☆13 · Dec 2, 2021 · Updated 4 years ago
thestephencasper / benchmarking_interpretability
☆35 · Sep 13, 2023 · Updated 2 years ago
DavisPL / PCCC
Proof-carrying code completions in Dafny
☆11 · Apr 4, 2025 · Updated last year
jannik-brinkmann / hugginglens
TransformerLens + HuggingFace
☆11 · Nov 4, 2023 · Updated 2 years ago
chasenorman / Formalized-Voting
☆13 · Jul 24, 2021 · Updated 5 years ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆20 · Jan 19, 2025 · Updated last year
bloominstituteoftechnology / ML-day_2
Day 2 of Lambda School's Machine Learning Mini Bootcamp
☆10 · Aug 31, 2018 · Updated 7 years ago
betaboon / mobile-nixos-flake
☆12 · Mar 23, 2024 · Updated 2 years ago
harvey-fin / absence-bench
Code implementation for the paper "AbsenceBench: Language Models Can't Tell What's Missing"
☆19 · Oct 23, 2025 · Updated 9 months ago
ApolloResearch / sample
Repository with sample code using Apollo's suggested engineering practices
☆15 · Dec 16, 2024 · Updated last year
monasticacademy / logical-induction
Code to support the guide to logical induction for software engineers
☆11 · Jul 12, 2026 · Updated 2 weeks ago
RobertCsordas / onion_representations
☆13 · Aug 19, 2024 · Updated last year
Kha / nale
Nix + Lean = Nale
☆12 · Jul 16, 2023 · Updated 3 years ago
EleutherAI / training-jacobian
☆24 · Dec 11, 2024 · Updated last year
Jiaxin-Wen / MisleadLM
Official code for our paper: "Language Models Learn to Mislead Humans via RLHF"
☆20 · Oct 11, 2024 · Updated last year
EleutherAI / steering-llama3
☆30 · Aug 2, 2024 · Updated last year
annahdo / implementing_activation_steering
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆24 · Oct 18, 2024 · Updated last year
uemurax / morg
Organize mathematical thoughts
☆20 · Oct 6, 2023 · Updated 2 years ago