wiedersehne / Paramixer
Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (CVPR 2022)
☆20Updated 2 years ago
Alternatives and similar repositories for Paramixer
Users who are interested in Paramixer are comparing it to the libraries listed below
- ☆41Updated 4 years ago
- Official code for NeurIPS paper "Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach".☆16Updated 2 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch☆36Updated 3 years ago
- Simple notebooks to learn diffusion models on toy datasets☆17Updated 2 years ago
- ☆41Updated 2 years ago
- Implementation of Kronecker Attention in Pytorch☆19Updated 4 years ago
- Repo reproducing experimental results in "Addressing the Topological Defects of Disentanglement"☆22Updated 2 years ago
- PyTorch implementation of IRMAE https://arxiv.org/abs/2010.00679☆47Updated 2 years ago
- Successfully training approximations to full-rank matrices for efficiency in deep learning.☆17Updated 4 years ago
- Architecture embeddings independent from the parametrization of the search space☆15Updated 4 years ago
- ImageNet-12k subset of ImageNet-21k (fall11)☆21Updated last year
- ☆20Updated 2 years ago
- Robust Optimal Transport code☆43Updated 2 years ago
- Repository for the PopulAtion Parameter Averaging (PAPA) paper☆26Updated last year
- Simple but high-performing method for learning a policy of test-time augmentation☆38Updated 2 years ago
- Implementation for ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021)☆15Updated 3 years ago
- Reproducible code for Augmentation paper☆17Updated 6 years ago
- Piecewise Linear Functions (PWL) implementation in PyTorch☆52Updated 3 years ago
- A GPT, made only of MLPs, in Jax☆58Updated 3 years ago
- Cyclic Differentiable Architecture Search☆36Updated 3 years ago
- diffGrad: An Optimization Method for Convolutional Neural Networks☆55Updated 2 years ago
- Official PyTorch implementation of LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification☆46Updated 3 years ago
- Exploiting Uncertainty of Loss Landscape for Stochastic Optimization☆15Updated 6 years ago
- Official implementation for "Minimax Active Learning" in PyTorch.☆9Updated 4 years ago
- Implementation of Spectral Leakage and Rethinking the Kernel Size in CNNs in Pytorch☆14Updated 4 years ago
- Code base for SRSGD.☆28Updated 5 years ago
- ☆15Updated 4 years ago
- SimCLR pytorch implementation using DistributedDataParallel.☆24Updated 2 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi…☆51Updated 3 years ago
- ☆24Updated 2 years ago