borjanG / 2023-transformers
Code for the paper "The emergence of clusters in self-attention dynamics".
☆15 · Updated last year
Alternatives and similar repositories for 2023-transformers:
Users who are interested in 2023-transformers are comparing it to the repositories listed below.
- Supplementary code for the paper "Meta-Solver for Neural Ordinary Differential Equations" https://arxiv.org/abs/2103.08561 ☆24 · Updated 4 years ago
- Transformers with doubly stochastic attention ☆45 · Updated 2 years ago
- Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks ☆10 · Updated 9 months ago
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS '22) ☆16 · Updated 2 years ago
- ☆24 · Updated 2 years ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆60 · Updated last year
- ☆10 · Updated 6 months ago
- Source code of "What can linearized neural networks actually say about generalization?" ☆20 · Updated 3 years ago
- Omnigrok: Grokking Beyond Algorithmic Data ☆53 · Updated 2 years ago
- Code for the paper: "Tensor Programs II: Neural Tangent Kernel for Any Architecture" ☆106 · Updated 4 years ago
- ☆53 · Updated 8 months ago
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆41 · Updated last year
- ☆52 · Updated 5 months ago
- Deep Learning & Information Bottleneck ☆58 · Updated last year
- ☆34 · Updated 2 years ago
- Refining continuous-in-depth neural networks ☆39 · Updated 3 years ago
- Investigate the speed of adaptation of structural causal models ☆16 · Updated 4 years ago
- ☆9 · Updated last year
- Implementation of OpenAI's "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" paper. ☆35 · Updated last year
- ☆30 · Updated 5 months ago
- Repo for the paper "Lie Point Symmetry Data Augmentation for Neural PDE Solvers" ☆49 · Updated last year
- PyTorch implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆36 · Updated 3 years ago
- ☆65 · Updated 3 months ago
- Laplace Redux -- Effortless Bayesian Deep Learning ☆43 · Updated last year
- Experiments from the paper "On Second Order Behaviour in Augmented Neural ODEs" ☆58 · Updated 6 months ago
- ☆13 · Updated 2 years ago
- Monotone operator equilibrium networks ☆51 · Updated 4 years ago
- An annotated implementation of the Hyena Hierarchy paper ☆32 · Updated last year
- ☆29 · Updated last year
- ☆66 · Updated 6 years ago