JacobPfau / procgenAISC

☆19

Alternatives and similar repositories for procgenAISC

Users who are interested in procgenAISC are comparing it to the repositories listed below

redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14Updated 3 years ago
jbkjr / train-procgen-pytorch
PyTorch implementation of OpenAI's Procgen ppo-baseline, built from scratch.
☆14Updated last year
jbloomAus / DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
☆87Updated last year
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆51Updated 3 years ago
Sea-Snell / grokking
unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
☆78Updated 3 years ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆127Updated 2 years ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆54Updated 3 months ago
adamkarvonen / SAE_BoardGameEval
☆23Updated 6 months ago
victorvikram / ConceptARC
Materials for ConceptARC paper
☆98Updated 9 months ago
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆186Updated 2 years ago
callummcdougall / sae_visualizer
☆28Updated last year
koayon / atp_star
PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)
☆18Updated 6 months ago
AllanYangZhou / universal_neural_functional
☆51Updated last year
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆63Updated last year
noanabeshima / matryoshka-saes
☆21Updated 8 months ago
andyljones / boardlaw
Scaling scaling laws with board games.
☆51Updated 2 years ago
understanding-search / maze-transformer
This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.
☆31Updated 11 months ago
aypan17 / machiavelli
☆137Updated 2 weeks ago
ethancaballero / broken_neural_scaling_laws
Code Release for "Broken Neural Scaling Laws" (BNSL) paper
☆59Updated last year
ssokota / mec
Code for minimum-entropy coupling.
☆32Updated last year
bilal-chughtai / rep-theory-mech-interp
☆26Updated 2 years ago
AsaCooperStickland / situational-awareness-evals
Measuring the situational awareness of language models
☆37Updated last year
FLAIROx / cultural-accumulation
☆13Updated last year
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30Updated last year
upiterbarg / lintseq
[ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus)
☆19Updated 5 months ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28Updated last year
facebookresearch / motif
Intrinsic Motivation from Artificial Intelligence Feedback
☆130Updated last year
KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆18Updated last year
moirage / alignment-research-dataset
A dataset of alignment research and code to reproduce it
☆77Updated 2 years ago
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated 2 weeks ago