borjanG / 2023-transformers
Code for the paper "The emergence of clusters in self-attention dynamics".
☆17 Updated 2 years ago
Alternatives and similar repositories for 2023-transformers
Users who are interested in 2023-transformers are comparing it to the libraries listed below.
- ☆62 Updated last year
- Omnigrok: Grokking Beyond Algorithmic Data ☆62 Updated 2 years ago
- ☆73 Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆92 Updated 2 years ago
- ☆52 Updated last month
- Parallelizing non-linear sequential models over the sequence length ☆56 Updated 7 months ago
- ☆33 Updated last year
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆191 Updated 3 weeks ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆127 Updated 2 years ago
- DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule ☆64 Updated 2 years ago
- The Energy Transformer block, in JAX ☆63 Updated 2 years ago
- A State-Space Model with Rational Transfer Function Representation. ☆83 Updated last year
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆42 Updated 2 years ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 Updated last year
- unofficial re-implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets" ☆83 Updated 3 years ago
- ☆69 Updated 10 months ago
- Official PyTorch implementation of NeuralSVD (ICML 2024) ☆22 Updated last year
- Pytorch code for experiments on Linear Transformers ☆25 Updated 2 years ago
- ☆238 Updated last year
- nanoGPT-like codebase for LLM training ☆113 Updated 2 months ago
- Transformers with doubly stochastic attention ☆53 Updated 3 years ago
- This repository contains the official code for Energy Transformer---an efficient Energy-based Transformer variant for graph classificatio… ☆25 Updated 2 years ago
- Implementations of various linear RNN layers using pytorch and triton ☆54 Updated 2 years ago
- Euclidean Wasserstein-2 optimal transportation ☆47 Updated 2 years ago
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆40 Updated 2 years ago
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS 22') ☆17 Updated 3 years ago
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆20 Updated last year
- Parameter-Free Optimizers for Pytorch ☆130 Updated last year
- ☆53 Updated last year
- ☆76 Updated 3 years ago