HumanCompatibleAI / leela-interp

Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"

☆24

Alternatives and similar repositories for leela-interp

Users who are interested in leela-interp are comparing it to the libraries listed below.

andyljones / boardlaw
Scaling scaling laws with board games.
☆49 · Updated 2 years ago
ssokota / mec
Code for minimum-entropy coupling.
☆32 · Updated last year
google-deepmind / dangerous-capability-evaluations
☆55 · Updated 9 months ago
bilal-chughtai / rep-theory-mech-interp
☆26 · Updated 2 years ago
understanding-search / maze-transformer
This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.
☆31 · Updated 10 months ago
keyonvafa / world-model-evaluation
☆59 · Updated 8 months ago
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35 · Updated 2 weeks ago
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆63 · Updated last year
adamkarvonen / SAE_BoardGameEval
☆23 · Updated 5 months ago
lucaslingle / mu_transformer
Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods.
☆31 · Updated last month
Xmaster6y / lczerolens
🔬 Interpretability for Leela Chess Zero networks.
☆15 · Updated 2 months ago
AllanYangZhou / universal_neural_functional
☆51 · Updated last year
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 · Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆127 · Updated 2 years ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆20 · Updated last month
davisyoshida / qax
If it quacks like a tensor...
☆58 · Updated 8 months ago
FLAIROx / cultural-accumulation
☆12 · Updated last year
smearle / autoverse
Generative cellular automaton-like learning environments for RL.
☆19 · Updated 5 months ago
callummcdougall / sae-exercises-mats
☆23 · Updated last year
KhoomeiK / complexity-scaling
gzip Predicts Data-dependent Scaling Laws
☆35 · Updated last year
facebookresearch / oni
Learn online intrinsic rewards from LLM feedback
☆41 · Updated 7 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆53 · Updated 2 months ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14 · Updated 3 years ago
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al., 2024, DeepMind)
☆18 · Updated 5 months ago
nacloos / baba-is-ai
☆37 · Updated 10 months ago
Sea-Snell / grokking
Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
☆77 · Updated 3 years ago
jbloomAus / DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
☆87 · Updated last year
Ping-C / optimizer
This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…
☆37 · Updated 2 years ago
shikaiqiu / compute-better-spent
☆53 · Updated 9 months ago
jbloomAus / SAEDashboard
☆60 · Updated last week