tonyduan / transformer-blocks
Multi-Head Attention, Transformer, Perceiver, Linear Attention.
☆11 · Updated 2 years ago
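The repository implements multi-head attention among its blocks. As an illustration of what such a block computes (not the repo's actual API; shapes, weight layout, and function names here are assumptions for the sketch), a minimal NumPy version:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Scaled dot-product attention over n_heads heads.

    x: (seq_len, d_model); each weight matrix: (d_model, d_model).
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def project(w):
        # Project, then split the feature dim into heads:
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return (x @ w).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = project(w_q), project(w_k), project(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (n_heads, seq, seq)
    out = softmax(scores) @ v                              # (n_heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model) # concatenate heads
    return out @ w_o

rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 16, 4, 5
ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
y = multi_head_attention(rng.standard_normal((seq_len, d_model)), *ws, n_heads)
print(y.shape)  # (5, 16)
```

The head split here is the standard reshape-and-transpose trick; a real implementation (e.g. in PyTorch) would also carry a batch dimension and optional masking.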
Alternatives and similar repositories for transformer-blocks
Users interested in transformer-blocks are comparing it to the libraries listed below.
- A differentiation API for PyTorch ☆30 · Updated 5 years ago
- Code for "'Hey, that's not an ODE:' Faster ODE Adjoints via Seminorms" (ICML 2021) ☆88 · Updated 3 years ago
- Relative gradient optimization of the Jacobian term in unsupervised deep learning (NeurIPS 2020) ☆21 · Updated 4 years ago
- ☆48 · Updated 2 years ago
- Continuous-time gradient flow for generative modeling and variational inference ☆33 · Updated 7 years ago
- Hierarchical variational models for physics ☆18 · Updated 5 years ago
- JAX-based MaxEnt ☆17 · Updated 5 years ago
- An example showing how to use JAX to train ResNet-50 on multiple nodes and GPUs ☆20 · Updated 3 years ago
- Code for "Learning Unitary Operators with Help From u(n)" (AAAI-17, https://arxiv.org/abs/1607.04903) ☆17 · Updated 8 years ago
- Dive into JAX, Flax, XLA, and C++ ☆32 · Updated 5 years ago
- Code for "Understanding and Mitigating Exploding Inverses in Invertible Neural Networks" (AISTATS 2021, http://arxiv.org/abs/2006.09347) ☆30 · Updated 5 years ago
- Riemannian Convex Potential Maps ☆67 · Updated 2 years ago
- ☆70 · Updated 2 years ago
- Convex potential flows ☆84 · Updated 3 years ago
- Neural Manifold Ordinary Differential Equations (NeurIPS 2020, https://arxiv.org/abs/2006.10254) ☆121 · Updated 2 years ago
- A public repository for our paper, Rao-Blackwellized Stochastic Gradients for Discrete Distributions