wiedersehne / Paramixer
Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022)
☆20 Updated 2 years ago
Alternatives and similar repositories for Paramixer:
Users who are interested in Paramixer are comparing it to the libraries listed below.
- Successfully training approximations to full-rank matrices for efficiency in deep learning. ☆17 Updated 4 years ago
- Implementation of Kronecker Attention in Pytorch ☆18 Updated 4 years ago
- ☆41 Updated 2 years ago
- ☆41 Updated 4 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆35 Updated 3 years ago
- PyTorch and Torch implementation for our accepted CVPR 2020 paper (Oral): Controllable Orthogonalization in Training DNNs ☆24 Updated 4 years ago
- Code base for SRSGD. ☆28 Updated 5 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 Updated 2 years ago
- ☆25 Updated 3 years ago
- ☆15 Updated 4 years ago
- PyTorch implementation of HashedNets ☆36 Updated 2 years ago
- Codebase for the paper "A Gradient Flow Framework for Analyzing Network Pruning" ☆21 Updated 4 years ago
- Robust Optimal Transport code ☆43 Updated 2 years ago
- Piecewise Linear Functions (PWL) implementation in PyTorch ☆51 Updated 3 years ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆21 Updated last year
- Repository for the PopulAtion Parameter Averaging (PAPA) paper ☆26 Updated last year
- diffGrad: An Optimization Method for Convolutional Neural Networks ☆55 Updated 2 years ago
- ☆20 Updated 2 years ago
- Official Implementation of Convolutional Normalization: Improving Robustness and Training for Deep Neural Networks ☆30 Updated 3 years ago
- Bootstrap Your Own Latent (BYOL) pytorch implementation using DistributedDataParallel. ☆28 Updated 2 years ago
- PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision ☆23 Updated 3 years ago
- [ICCV 2021] A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generative Modeling" ☆80 Updated 2 years ago
- Official code for NeurIPS paper "Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach". ☆16 Updated 2 years ago
- Repo reproducing experimental results in "Addressing the Topological Defects of Disentanglement" ☆22 Updated 2 years ago
- A collection of optimizers, some arcane others well known, for Flax. ☆29 Updated 3 years ago
- [ICLR 2021] "GANs Can Play Lottery Too" by Xuxi Chen, Zhenyu Zhang, Yongduo Sui, Tianlong Chen ☆26 Updated 3 years ago
- ☆23 Updated 4 years ago
- PyTorch implementation of IRMAE https://arxiv.org/abs/2010.00679 ☆46 Updated 2 years ago
- An implementation of (Induced) Set Attention Block, from the Set Transformers paper ☆56 Updated 2 years ago
- Convolutions and more as einsum for PyTorch ☆16 Updated 10 months ago