wattenberg / superposition
Code associated with papers on superposition (in ML interpretability)
☆27 · Updated 2 years ago
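For orientation: superposition here refers to a network representing more features than it has dimensions, so that features share directions in activation space. Below is a minimal sketch of the standard toy setup from this literature (sparse features, a tied linear map, ReLU readout); it is illustrative only, and the shapes and hyperparameters are assumptions, not code from this repository.

```python
# Toy superposition sketch (illustrative, not from this repository):
# n sparse features are squeezed through m < n hidden dimensions via a
# tied linear map, with reconstruction x_hat = ReLU(W^T W x + b).
import torch

n_features, n_hidden = 20, 5                 # more features than dimensions
W = torch.nn.Parameter(0.1 * torch.randn(n_hidden, n_features))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(2000):
    # Sparse inputs: each feature is active (uniform in [0, 1]) with prob. 0.05.
    x = torch.rand(256, n_features) * (torch.rand(256, n_features) < 0.05)
    x_hat = torch.relu(x @ W.T @ W + b)      # encode to m dims, decode back
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Off-diagonal structure in W^T W indicates features sharing hidden
# directions, i.e. stored "in superposition" in the 5-dim hidden space.
print(torch.round(W.T @ W, decimals=2).detach())
```

After training, inspecting `W.T @ W` typically shows groups of features sharing directions in the hidden space, which is the phenomenon the superposition papers study.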
Alternatives and similar repositories for superposition:
Users interested in superposition are comparing it with the repositories listed below.
- ☆26 · Updated 2 years ago
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 · Updated 10 months ago
- The Energy Transformer block, in JAX ☆57 · Updated last year
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆59 · Updated 3 years ago
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆58 · Updated last year
- ☆28 · Updated last month
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆62 · Updated 3 years ago
- ☆53 · Updated 7 months ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 · Updated 2 years ago
- gzip Predicts Data-dependent Scaling Laws ☆35 · Updated 11 months ago
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆78 · Updated 2 years ago
- we got you bro ☆35 · Updated 9 months ago
- ☆27 · Updated last year
- ☆53 · Updated last year
- ☆52 · Updated 11 months ago
- Sparse and discrete interpretability tool for neural networks ☆61 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆114 · Updated 4 months ago
- Implementation of OpenAI's "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" paper. ☆36 · Updated last year
- Resources from the EleutherAI Math Reading Group ☆53 · Updated 2 months ago
- Understand and test language model architectures on synthetic tasks. ☆195 · Updated 2 months ago
- ☆49 · Updated last year
- Evaluation of neuro-symbolic engines ☆35 · Updated 9 months ago
- Experiments on the impact of depth in transformers and SSMs. ☆25 · Updated 6 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 · Updated this week
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- Code for the paper "Function-Space Learning Rates" ☆19 · Updated 3 weeks ago
- Implementation of the RASP transformer programming language (https://arxiv.org/pdf/2106.06981.pdf). ☆53 · Updated 3 years ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 · Updated 2 years ago
- ☆78 · Updated 10 months ago
- Understanding how features learned by neural networks evolve throughout training ☆34 · Updated 6 months ago