joaomarcoscsilva / mixture-of-experts
A replication of the paper "Adaptive Mixtures of Local Experts" applied to the CIFAR-10 image classification dataset.
☆9 · Updated 3 years ago
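For context, "Adaptive Mixtures of Local Experts" (Jacobs et al., 1991) trains several expert networks alongside a gating network whose softmax output weights each expert's prediction per example. Below is a minimal PyTorch sketch of that gated forward pass for CIFAR-10-shaped inputs; the `MixtureOfExperts` module, expert architecture, and hyperparameters are illustrative assumptions, not this repository's actual code (the paper additionally trains with a per-expert competitive loss, omitted here).

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Illustrative 'Adaptive Mixtures of Local Experts' sketch, not the repo's code."""

    def __init__(self, in_dim=3 * 32 * 32, num_classes=10, num_experts=4, hidden=256):
        super().__init__()
        # Each expert is a small MLP classifier over flattened images.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))
            for _ in range(num_experts)
        )
        # The gating network assigns a softmax weight to each expert, per example.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x):
        x = x.flatten(1)                               # (B, 3*32*32) for CIFAR-10
        weights = torch.softmax(self.gate(x), dim=-1)  # (B, E) gating weights
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # (B, C) mixture

model = MixtureOfExperts()
logits = model(torch.randn(8, 3, 32, 32))  # batch of 8 CIFAR-10-shaped images
print(logits.shape)                        # torch.Size([8, 10])
```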
Related projects
Alternatives and complementary repositories for mixture-of-experts
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆33 · Updated last year
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha…) ☆19 · Updated last year
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 · Updated last year
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences" ☆71 · Updated last year
- Implementation of GateLoop Transformer in PyTorch and JAX ☆86 · Updated 4 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆35 · Updated 11 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆61 · Updated 6 months ago
- Implementation of Multistream Transformers in PyTorch ☆53 · Updated 3 years ago
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 2 months ago
- Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax in Attention" ☆43 · Updated 3 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆59 · Updated 2 years ago
- [ICLR 2022] Code for the paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) ☆19 · Updated last year
- PyTorch implementation of FNet: Mixing Tokens with Fourier Transforms ☆25 · Updated 3 years ago
- Implementations of various linear RNN layers using PyTorch and Triton ☆45 · Updated last year
- Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Tra…" ☆27 · Updated 3 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆94 · Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆64 · Updated last year (see the routing sketch after this list)
- Code for "Understanding and Improving Layer Normalization" ☆46 · Updated 4 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 · Updated last year
- Domain Adaptation and Adapters ☆16 · Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆29 · Updated last year
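Among the entries above, Soft MoE is the closest modern descendant of the gating idea behind this repository: instead of a hard softmax mixture over expert outputs, every expert processes "slots" that are soft (softmax-weighted) combinations of the input tokens. The sketch below shows that routing step under a simplifying assumption of one slot per expert; the `SoftMoE` class, `phi` parameter, and all dimensions are illustrative guesses based on the paper, not code from the linked implementation.

```python
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    """Illustrative Soft MoE routing layer (one slot per expert for brevity)."""

    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        # phi: learnable per-slot parameters used to score tokens.
        self.phi = nn.Parameter(torch.randn(dim, num_experts) / dim**0.5)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                     # x: (batch, tokens, dim)
        logits = x @ self.phi                 # (B, N, E) token-slot scores
        dispatch = logits.softmax(dim=1)      # over tokens: each slot mixes tokens
        combine = logits.softmax(dim=2)       # over slots: each token mixes slot outputs
        slots = dispatch.transpose(1, 2) @ x  # (B, E, dim): one input slot per expert
        outs = torch.stack(
            [e(slots[:, i]) for i, e in enumerate(self.experts)], dim=1
        )                                     # (B, E, dim) expert outputs
        return combine @ outs                 # (B, N, dim)

layer = SoftMoE()
y = layer(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```

Because every token contributes to every slot, the layer is fully differentiable and avoids the discrete top-k routing of sparse MoEs, at the cost of dense dispatch/combine matrices.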