joaomarcoscsilva / mixture-of-experts
A replication of the paper "Adaptive Mixtures of Local Experts" applied to the CIFAR-10 image classification dataset.
☆9 · Updated 3 years ago
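For context, "Adaptive Mixtures of Local Experts" (Jacobs et al., 1991) trains several expert networks alongside a gating network whose softmax output weights each expert's prediction per example. Below is a minimal PyTorch sketch of that gated forward pass for CIFAR-10-shaped inputs; the `MixtureOfExperts` module, expert architecture, and hyperparameters are illustrative assumptions, not this repository's actual code (the paper additionally trains with a per-expert competitive loss, omitted here).

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Illustrative 'Adaptive Mixtures of Local Experts' sketch, not the repo's code."""

    def __init__(self, in_dim=3 * 32 * 32, num_classes=10, num_experts=4, hidden=256):
        super().__init__()
        # Each expert is a small MLP classifier over flattened images.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))
            for _ in range(num_experts)
        )
        # The gating network assigns a softmax weight to each expert, per example.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        x = x.flatten(1)                               # (B, 3*32*32) for CIFAR-10
        weights = torch.softmax(self.gate(x), dim=-1)  # (B, E) gating weights
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # (B, C) mixture

model = MixtureOfExperts()
logits = model(torch.randn(8, 3, 32, 32))  # batch of 8 CIFAR-10-shaped images
print(logits.shape)                        # torch.Size([8, 10])
```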
Related projects
Alternatives and complementary repositories for mixture-of-experts
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆33 · Updated last year
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha…) ☆19 · Updated last year
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 · Updated last year
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences" ☆71 · Updated last year
- Implementation of GateLoop Transformer in PyTorch and JAX ☆86 · Updated 4 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆35 · Updated 11 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆61 · Updated 6 months ago
- Implementation of Multistream Transformers in PyTorch ☆53 · Updated 3 years ago
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 2 months ago
- Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax in Attention" ☆43 · Updated 3 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆59 · Updated 2 years ago
- [ICLR 2022] Code for the paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) ☆19 · Updated last year
- PyTorch implementation of FNet: Mixing Tokens with Fourier Transforms ☆25 · Updated 3 years ago
- Implementations of various linear RNN layers using PyTorch and Triton ☆45 · Updated last year
- Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Tra…" ☆27 · Updated 3 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆94 · Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆64 · Updated last year (see the routing sketch after this list)
- Code for "Understanding and Improving Layer Normalization" ☆46 · Updated 4 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- Domain Adaptation and Adapters ☆16 · Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆29 · Updated last year
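Among the entries above, Soft MoE is the closest modern descendant of the gating idea behind this repository: instead of a hard softmax mixture over expert outputs, every expert processes "slots" that are soft (softmax-weighted) combinations of the input tokens. The sketch below shows that routing step under a simplifying assumption of one slot per expert; the `SoftMoE` class, `phi` parameter, and all dimensions are illustrative guesses based on the paper, not code from the linked implementation.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Illustrative Soft MoE routing layer (one slot per expert for brevity)."""

    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        # phi: learnable per-slot parameters used to score tokens.
        self.phi = nn.Parameter(torch.randn(dim, num_experts) / dim**0.5)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                     # x: (batch, tokens, dim)
        logits = x @ self.phi                 # (B, N, E) token-slot scores
        dispatch = logits.softmax(dim=1)      # over tokens: each slot mixes tokens
        combine = logits.softmax(dim=2)       # over slots: each token mixes slot outputs
        slots = dispatch.transpose(1, 2) @ x  # (B, E, dim): one input slot per expert
        outs = torch.stack(
            [e(slots[:, i]) for i, e in enumerate(self.experts)], dim=1
        )                                     # (B, E, dim) expert outputs
        return combine @ outs                 # (B, N, dim)

layer = SoftMoE()
y = layer(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```

Because every token contributes to every slot, the layer is fully differentiable and avoids the discrete top-k routing of sparse MoEs, at the cost of dense dispatch/combine matrices.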