borjanG / 2023-transformers
Codes for the paper "The emergence of clusters in self-attention dynamics".
☆17 Updated last year
Alternatives and similar repositories for 2023-transformers
Users who are interested in 2023-transformers are comparing it to the libraries listed below.
- Omnigrok: Grokking Beyond Algorithmic Data ☆62 Updated 2 years ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆88 Updated last year
- unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆79 Updated 3 years ago
- ☆71 Updated 10 months ago
- ☆33 Updated last year
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 Updated 2 weeks ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆63 Updated 2 years ago
- Codes for the paper "A mathematical perspective on Transformers". ☆39 Updated last year
- Parallelizing non-linear sequential models over the sequence length ☆54 Updated 4 months ago
- ☆60 Updated last year
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS 22') ☆16 Updated 3 years ago
- Universal Neurons in GPT2 Language Models ☆30 Updated last year
- Non official implementation of the Linear Recurrent Unit (LRU, Orvieto et al. 2023) ☆58 Updated 2 months ago
- ☆45 Updated 2 weeks ago
- ☆27 Updated 2 years ago
- ☆220 Updated 11 months ago
- Pytorch code for experiments on Linear Transformers ☆23 Updated last year
- ☆66 Updated 7 months ago
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆19 Updated 11 months ago
- Transformers with doubly stochastic attention ☆49 Updated 3 years ago
- Official PyTorch implementation of NeuralSVD (ICML 2024) ☆20 Updated last year
- Official implementation of Stochastic Taylor Derivative Estimator (STDE) NeurIPS2024 ☆122 Updated 11 months ago
- Code associated to papers on superposition (in ML interpretability) ☆33 Updated 3 years ago
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆39 Updated 2 years ago
- ☆34 Updated last year
- ☆53 Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆132 Updated 10 months ago
- PyTorch implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆38 Updated 3 years ago
- ☆39 Updated last year
- Blog post ☆17 Updated last year