wiedersehne / Paramixer
Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022)
☆20 · Updated last year
Related projects
Alternatives and complementary repositories for Paramixer
- Implementation of LogAvgExp for PyTorch ☆32 · Updated 2 years ago
- Official code for NeurIPS paper "Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach". ☆16 · Updated 2 years ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆20 · Updated last year
- Implementation of Kronecker Attention in PyTorch ☆17 · Updated 4 years ago
- Implementation of the Remixer Block from the Remixer paper, in PyTorch ☆35 · Updated 3 years ago
- Code for BlockSwap (ICLR 2020). ☆33 · Updated 3 years ago
- Code base for SRSGD. ☆28 · Updated 4 years ago
- PyTorch implementation of HashedNets ☆36 · Updated last year
- Architecture embeddings independent from the parametrization of the search space ☆15 · Updated 3 years ago
- Efficient Neural Network Loss Landscape Generation ☆10 · Updated 5 years ago
- Implementation of ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021) ☆15 · Updated 3 years ago
- Official Implementation of Convolutional Normalization: Improving Robustness and Training for Deep Neural Networks ☆30 · Updated 2 years ago
- [CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jon… ☆68 · Updated last year
- Unofficial PyTorch implementation of ReZero in ResNet ☆23 · Updated 4 years ago
- Cyclic Differentiable Architecture Search ☆35 · Updated 2 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
- Efficient Householder Transformation in PyTorch ☆62 · Updated 3 years ago
- Code for the ICML 2021 and ICLR 2022 papers: Skew Orthogonal Convolutions, Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100 ☆18 · Updated 2 years ago
- Self-Distillation with weighted ground-truth targets; ResNet and Kernel Ridge Regression ☆17 · Updated 3 years ago
- Code for Understanding Architectures Learnt by Cell-based Neural Architecture Search ☆27 · Updated 4 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆15 · Updated 3 years ago
- [ICLR 2022] "Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, and No Retraining" by Lu Miao*, Xiaolong Luo*, T… ☆29 · Updated 2 years ago
- Code for CVPR 2021 paper: MOOD: Multi-level Out-of-distribution Detection ☆38 · Updated last year