alignedai / HappyFaces

The Happy Faces Benchmark

☆15

Alternatives and similar repositories for HappyFaces

Users who are interested in HappyFaces are comparing it to the libraries listed below.

TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 · Updated 3 years ago
anthropics / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆198 · Updated 3 years ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆116 · Updated 5 months ago
redwoodresearch / rust_circuit_public
☆65 · Updated 2 years ago
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆81 · Updated 3 years ago
LRudL / evalugator
(Model-written) LLM evals library
☆18 · Updated 11 months ago
google-deepmind / neural_networks_chomsky_hierarchy
Neural Networks and the Chomsky Hierarchy
☆211 · Updated last year
redwoodresearch / remix_public
☆20 · Updated 2 years ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆302 · Updated 11 months ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14 · Updated 3 years ago
thestephencasper / everything-you-need
we got you bro
☆36 · Updated last year
tech-srl / RASP
An interpreter for RASP as described in the ICML 2021 paper "Thinking Like Transformers"
☆322 · Updated last year
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆214 · Updated last week
quantified-uncertainty / ai-safety-papers
☆22 · Updated 4 years ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆239 · Updated 10 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130 · Updated 3 years ago
callummcdougall / sae_visualizer
☆29 · Updated last year
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆75 · Updated 2 years ago
neelnanda-io / 1L-Sparse-Autoencoder
☆132 · Updated 2 years ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆232 · Updated 3 months ago
EleutherAI / project-menu
See the issue board for the current status of active and prospective projects!
☆65 · Updated 3 years ago
Mech-Interp / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆14 · Updated last year
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆129 · Updated this week
Kiv / fancy_einsum
Einsum with einops style variable names
☆18 · Updated last year
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆72 · Updated 2 years ago
ArthurConmy / Automatic-Circuit-Discovery
☆255 · Updated last year
goodfire-ai / spd
Stochastic Parameter Decomposition
☆52 · Updated this week
probcomp / LLaMPPL
A domain-specific probabilistic programming language for modeling and inference with language models
☆137 · Updated 7 months ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramár et al. 2024, DeepMind)
☆20 · Updated 10 months ago
mechanistic-interpretability-grokking / progress-measures-paper
☆70 · Updated 3 years ago