borjanG / 2023-transformers
Code for the paper "The emergence of clusters in self-attention dynamics".
☆12 Updated 9 months ago
Related projects:
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS '22) ☆16 Updated last year
- Investigate the speed of adaptation of structural causal models ☆16 Updated 3 years ago
- MDL Complexity computations and experiments from the paper "Revisiting complexity and the bias-variance tradeoff". ☆17 Updated last year
- ☆57 Updated 2 years ago
- Deep Learning & Information Bottleneck ☆45 Updated last year
- LaTeX source code for the slides ☆21 Updated 3 years ago
- Laplace Redux -- Effortless Bayesian Deep Learning ☆35 Updated last year
- Monotone operator equilibrium networks ☆51 Updated 4 years ago
- Supplementary code for the paper "Meta-Solver for Neural Ordinary Differential Equations" https://arxiv.org/abs/2103.08561 ☆25 Updated 3 years ago
- [NeurIPS 2020] Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters (AHGP) ☆20 Updated 3 years ago
- Code for the paper: "Tensor Programs II: Neural Tangent Kernel for Any Architecture" ☆93 Updated 4 years ago
- ☆25 Updated last year
- Source code of "What can linearized neural networks actually say about generalization?" ☆17 Updated 2 years ago
- This repository contains PyTorch implementations of various random feature maps for dot product kernels. ☆17 Updated 2 months ago
- ☆52 Updated last month
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆28 Updated 11 months ago
- Euclidean Wasserstein-2 optimal transportation ☆43 Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆74 Updated 7 months ago
- Code for Augment & Reduce, a scalable stochastic algorithm for large categorical distributions ☆10 Updated 6 years ago
- ☆56 Updated 3 years ago
- ☆13 Updated last year
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated last year
- Gradient Estimation with Discrete Stein Operators (NeurIPS 2022) ☆17 Updated 10 months ago
- [NeurIPS '22] Data distillation for recommender systems. Shows equivalent performance with 2-3 orders of magnitude less data. ☆22 Updated last year
- Symbolic regression ☆36 Updated 2 years ago
- Transformers with doubly stochastic attention ☆40 Updated 2 years ago
- Graphically structured diffusion model. ☆17 Updated last year
- Python implementation of smooth optimal transport. ☆55 Updated 3 years ago
- CEVAE with VampPrior ☆11 Updated 6 years ago
- Code for "The Intrinsic Dimension of Images and Its Impact on Learning" - ICLR 2021 Spotlight https://openreview.net/forum?id=XJk19XzGq2J ☆63 Updated 5 months ago