borjanG / 2023-transformers
Codes for the paper "The emergence of clusters in self-attention dynamics".
☆17 · Updated last year
Alternatives and similar repositories for 2023-transformers
Users who are interested in 2023-transformers are comparing it to the repositories listed below.
- Omnigrok: Grokking Beyond Algorithmic Data ☆62 · Updated 2 years ago
- ☆62 · Updated last year
- ☆227 · Updated last year
- ☆33 · Updated last year
- Unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆80 · Updated 3 years ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 · Updated 11 months ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 · Updated 2 years ago
- ☆73 · Updated last year
- ☆49 · Updated last month
- Parallelizing non-linear sequential models over the sequence length ☆56 · Updated 5 months ago
- Codes for the paper "A mathematical perspective on Transformers". ☆39 · Updated last year
- ☆53 · Updated last year
- Transformers with doubly stochastic attention ☆50 · Updated 3 years ago
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 · Updated this week
- A State-Space Model with Rational Transfer Function Representation. ☆83 · Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆92 · Updated last year
- Efficient Riemannian Optimization on Stiefel Manifold via Cayley Transform ☆43 · Updated 6 years ago
- Code for the paper: Why Transformers Need Adam: A Hessian Perspective ☆63 · Updated 9 months ago
- Official implementation of Stochastic Taylor Derivative Estimator (STDE), NeurIPS 2024 ☆124 · Updated last year
- Implementations of various linear RNN layers using PyTorch and Triton ☆54 · Updated 2 years ago
- ☆34 · Updated last year
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆20 · Updated last year
- Scalable and Stable Parallelization of Nonlinear RNNs ☆27 · Updated last month
- ☆27 · Updated 2 years ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆127 · Updated 2 years ago
- ☆67 · Updated 8 months ago
- ☆31 · Updated last year
- The Energy Transformer block, in JAX ☆62 · Updated 2 years ago
- ☆23 · Updated 10 months ago
- Code for experiments on transformers using Markovian data. ☆21 · Updated last year