borjanG / 2023-transformers
Code for the paper "The emergence of clusters in self-attention dynamics".
☆ 17 · Updated 2 years ago
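The paper above studies how tokens evolving under repeated self-attention updates coalesce into clusters. As a rough illustration of that phenomenon (not the paper's exact setup: identity query/key/value maps, a single hard-coded temperature `beta`, and renormalization to the unit sphere are all simplifying assumptions here), one can iterate a softmax-attention averaging step and watch the points contract toward a few cluster centers:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, beta = 8, 2, 4.0  # number of tokens, dimension, inverse temperature (illustrative values)

# Random tokens, projected onto the unit sphere
x = rng.standard_normal((n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)

for _ in range(200):
    logits = beta * (x @ x.T)                       # dot-product attention scores
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)               # row-wise softmax: each row sums to 1
    x = w @ x                                       # attention-weighted average of tokens
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # renormalize back onto the sphere

print(np.round(x, 3))  # inspect final token positions
```

Printing the final positions typically shows many tokens sitting at (nearly) identical coordinates, i.e. the clustering behavior the paper analyzes in continuous time.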
Alternatives and similar repositories for 2023-transformers
Users interested in 2023-transformers are comparing it to the repositories listed below.
- Omnigrok: Grokking Beyond Algorithmic Data ☆ 62 · Updated 2 years ago
- ☆ 62 · Updated last year
- ☆ 234 · Updated last year
- ☆ 33 · Updated last year
- ☆ 18 · Updated last year
- ☆ 73 · Updated last year
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆ 63 · Updated 2 years ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆ 136 · Updated last year
- Transformers with doubly stochastic attention ☆ 51 · Updated 3 years ago
- ☆ 52 · Updated 3 weeks ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆ 127 · Updated 2 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆ 92 · Updated last year
- Parallelizing non-linear sequential models over the sequence length ☆ 56 · Updated 6 months ago
- ☆ 69 · Updated 9 months ago
- Official implementation of Stochastic Taylor Derivative Estimator (STDE), NeurIPS 2024 ☆ 125 · Updated last year
- ☆ 27 · Updated 2 years ago
- Sampling with gradient-based Markov Chain Monte Carlo approaches ☆ 108 · Updated last year
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆ 81 · Updated 3 years ago
- Parameter-Free Optimizers for Pytorch ☆ 130 · Updated last year
- Code for our paper "Generative Flow Networks for Discrete Probabilistic Modeling" ☆ 87 · Updated 2 years ago
- Code accompanying our paper "Feature Learning in Infinite-Width Neural Networks" (https://arxiv.org/abs/2011.14522) ☆ 63 · Updated 4 years ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆ 188 · Updated 2 weeks ago
- The Energy Transformer block, in JAX ☆ 63 · Updated 2 years ago
- Lightning-like training API for JAX with Flax ☆ 45 · Updated last year
- nanoGPT-like codebase for LLM training ☆ 114 · Updated 2 months ago
- Euclidean Wasserstein-2 optimal transportation ☆ 47 · Updated 2 years ago
- Scalable and Stable Parallelization of Nonlinear RNNs ☆ 28 · Updated 2 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆ 181 · Updated 6 months ago
- Code for the paper "A mathematical perspective on Transformers". ☆ 39 · Updated last year
- Pytorch code for experiments on Linear Transformers ☆ 25 · Updated last year