borjanG / 2023-transformers
Code for the paper "The emergence of clusters in self-attention dynamics".
☆14 Updated last year
Alternatives and similar repositories for 2023-transformers:
Users who are interested in 2023-transformers are comparing it to the repositories listed below:
- Euclidean Wasserstein-2 optimal transportation ☆44 Updated last year
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS 22') ☆16 Updated 2 years ago
- ☆29 Updated 4 months ago
- Blog post ☆16 Updated last year
- General Invertible Transformations for Flow-based Generative Models ☆17 Updated 4 years ago
- u-MPS implementation and experimentation code used in the paper Tensor Networks for Probabilistic Sequence Modeling (https://arxiv.org/ab… ☆19 Updated 4 years ago
- Supplementary code for the paper "Meta-Solver for Neural Ordinary Differential Equations" https://arxiv.org/abs/2103.08561 ☆24 Updated 3 years ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated 2 years ago
- The Energy Transformer block, in JAX ☆56 Updated last year
- ☆9 Updated last year
- Code for testing DCT plus Sparse (DCTpS) networks ☆14 Updated 3 years ago
- ☆15 Updated 2 years ago
- Stochastic Gradient Langevin Dynamics for Bayesian learning ☆30 Updated 3 years ago
- ☆18 Updated 2 years ago
- [NeurIPS 2020] Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters (AHGP) ☆21 Updated 4 years ago
- Monotone operator equilibrium networks ☆51 Updated 4 years ago
- Code for Augment & Reduce, a scalable stochastic algorithm for large categorical distributions ☆10 Updated 6 years ago
- ☆18 Updated 3 years ago
- ☆24 Updated 2 years ago
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆35 Updated last year
- ☆34 Updated 2 months ago
- ☆53 Updated 6 months ago
- PyTorch implementation for "Probabilistic Circuits for Variational Inference in Discrete Graphical Models", NeurIPS 2020 ☆16 Updated 3 years ago
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆41 Updated last year
- ☆52 Updated 4 months ago
- Transformers with doubly stochastic attention ☆45 Updated 2 years ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆58 Updated last year
- ☆11 Updated last year
- ☆11 Updated 3 years ago
- ☆31 Updated 10 months ago