andrewargatkiny / dense-attention
This is the repository for DenseAttention and DANet, a fast and conceptually simple modification of standard attention and the Transformer architecture.
☆18 · Updated this week
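The repository description above presents DenseAttention as a simple modification of standard attention. As a rough orientation only, the sketch below shows what a softmax-free ("dense") attention block might look like in PyTorch; the class name `DenseAttentionSketch`, the removal of softmax, the associative `(K^T V)`-first ordering, and the final scaling are assumptions made for illustration, not the repository's actual implementation.

```python
# A minimal, hypothetical sketch of softmax-free ("dense") attention.
# Names and design choices here are assumptions for illustration; see the
# repository itself for the actual DenseAttention / DANet implementation.
import torch
import torch.nn as nn


class DenseAttentionSketch(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Without softmax, the Q K^T V product is associative, so (K^T V)
        # can be computed first: O(N * d^2) instead of O(N^2 * d).
        kv = torch.einsum("bnd,bne->bde", k, v)    # (batch, d_model, d_model)
        out = torch.einsum("bnd,bde->bne", q, kv)  # (batch, seq_len, d_model)
        # Simple length scaling as a stand-in; the real normalization may differ.
        return out / x.shape[1]


# Usage example (shapes only):
# attn = DenseAttentionSketch(d_model=64)
# y = attn(torch.randn(2, 128, 64))  # -> (2, 128, 64)
```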
Alternatives and similar repositories for dense-attention
Users interested in dense-attention are comparing it to the libraries listed below.
- Compression scheme for activation gradients in the backward pass ☆44 · Updated 2 years ago
- ☆18 · Updated 6 months ago
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆188 · Updated 2 weeks ago
- HomebrewNLP in JAX flavour for maintainable TPU training ☆51 · Updated last year
- Some common Hugging Face transformers in maximal update parametrization (µP) ☆86 · Updated 3 years ago
- Fast, Modern, and Low Precision PyTorch Optimizers ☆116 · Updated last month
- Various transformers for FSDP research ☆38 · Updated 2 years ago
- Amos optimizer with JEstimator lib.