borjanG / 2023-transformers
Code for the paper "The emergence of clusters in self-attention dynamics".
☆16 Updated last year
Alternatives and similar repositories for 2023-transformers
Users who are interested in 2023-transformers are comparing it to the libraries listed below.
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 Updated last year
- ☆15 Updated 3 years ago
- ☆53 Updated 9 months ago
- PyTorch implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆37 Updated 3 years ago
- ☆13 Updated 3 years ago
- General Invertible Transformations for Flow-based Generative Models ☆18 Updated 4 years ago
- Investigate the speed of adaptation of structural causal models ☆15 Updated 4 years ago
- Understanding how features learned by neural networks evolve throughout training ☆36 Updated 8 months ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated 2 years ago
- Replicating and dissecting the git-re-basin project in one-click-replication Colabs ☆36 Updated 2 years ago
- ☆26 Updated 2 years ago
- ☆18 Updated 2 years ago
- The Energy Transformer block, in JAX ☆59 Updated last year
- ☆23 Updated 2 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆84 Updated last year
- 📰 Computing the information content of trained neural networks ☆21 Updated 3 years ago
- ☆32 Updated 9 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆124 Updated last year
- Official repository for the paper "Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules" (… ☆23 Updated last month
- Sparse and discrete interpretability tool for neural networks ☆63 Updated last year
- Evaluation of neuro-symbolic engines ☆38 Updated 11 months ago
- Blog post ☆17 Updated last year
- Omnigrok: Grokking Beyond Algorithmic Data ☆58 Updated 2 years ago
- ☆12 Updated 3 years ago
- A centralized place for deep thinking code and experiments ☆85 Updated last year
- gzip Predicts Data-dependent Scaling Laws ☆35 Updated last year
- ☆37 Updated 3 years ago
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 Updated last year
- Laplace Redux -- Effortless Bayesian Deep Learning ☆42 Updated last month
- Official Implementation of the ICML 2023 paper: "Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally … ☆72 Updated 2 years ago