minhtannguyen / transformer-mgkLinks

This is the public github for our paper "Transformer with a Mixture of Gaussian Keys"

☆28

Alternatives and similar repositories for transformer-mgk

Users that are interested in transformer-mgk are comparing it to the libraries listed below

Sorting:

pkuzengqi / Skyformer
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nystr\"om Method (NeurIPS 2021)
☆62Updated 3 years ago
schwartz-lab-NLP / papa
Code for the PAPA paper
☆27Updated 2 years ago
radarFudan / Curse-of-memory
Curse-of-memory phenomenon of RNNs in sequence modelling
☆19Updated 2 months ago
lsj2408 / URPE
[NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation)
☆34Updated last year
google-deepmind / ssl_hsic
☆37Updated 11 months ago
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16Updated last month
lucidrains / tableformer-pytorch
Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch
☆39Updated 3 years ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated 10 months ago
facebookresearch / ModelRatatouille
Recycling diverse models
☆45Updated 2 years ago
VITA-Group / layerGraftedPretraining_ICLR23
[ICLR 2023] “ Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian…
☆24Updated 2 years ago
dlmacedo / distinction-maximization-loss
A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in y…
☆45Updated 2 years ago
xu-ji / information-bottleneck
Deep Learning & Information Bottleneck
☆61Updated 2 years ago
SamsungSAILMontreal / ghn3
Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023]
☆36Updated 10 months ago
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆75Updated last year
wjxts / RegularizedBN
☆21Updated 2 years ago
joshr17 / IFM
Code for paper "Can contrastive learning avoid shortcut solutions?" NeurIPS 2021.
☆47Updated 3 years ago
annosubmission / GRC-Cache
☆16Updated 2 years ago
lucidrains / rela-transformer
Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012
☆49Updated 3 years ago
JeanKaddour / LAWA
Latest Weight Averaging (NeurIPS HITY 2022)
☆30Updated 2 years ago
archinetai / difformer-pytorch
Diffusion based transformer, in PyTorch (Experimental).
☆24Updated 2 years ago
VITA-Group / Lifelong-Learning-LTH
[ICLR 2021] "Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning" by Tianlong Chen*, Zhenyu Zhang*, Sijia Liu, S…
☆25Updated 3 years ago
stanislavfort / adversaries_to_OOD_detection
☆12Updated 2 years ago
lucidrains / memformer
Implementation of Memformer, a Memory-augmented Transformer, in Pytorch
☆119Updated 4 years ago
PKU-ML / non_neg
Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning
☆45Updated last year
lucidrains / cross-transformers-pytorch
Implementation of Cross Transformer for spatially-aware few-shot transfer, in Pytorch
☆53Updated 4 years ago
LeslieTrue / CPP
This is the official implementation for Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models.
☆28Updated 2 years ago
arshadshk / Position-Prediction-Pretraining
Position Prediction as an Effective Pretraining Strategy
☆8Updated 2 years ago
alexrame / diwa
DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization
☆31Updated 2 years ago
hsouri / BayesianTransferLearning
☆109Updated 2 years ago
LukasHedegaard / structured-pruning-adapters
Structured Pruning Adapters in PyTorch
☆18Updated last year