sigma-MoE layer
☆21, updated Jan 5, 2024
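For context, here is a minimal sketch of what a sigma-MoE-style feedforward layer looks like, assuming the usual recipe described in the associated paper (non-competitive sigmoid expert scores with top-k selection per token). The class name `SigmaMoE`, the shapes, and the naive per-token weight gather are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sigma-MoE-style layer: sigmoid gating + top-k expert selection.
# This is a readability-first sketch, not the moe_layer implementation.
import torch
import torch.nn.functional as F


class SigmaMoE(torch.nn.Module):
    def __init__(self, d_model: int, n_experts: int, d_expert: int, k: int):
        super().__init__()
        self.k = k
        # Per-expert two-layer feedforward weights.
        self.w_up = torch.nn.Parameter(torch.randn(n_experts, d_model, d_expert) * 0.02)
        self.w_down = torch.nn.Parameter(torch.randn(n_experts, d_expert, d_model) * 0.02)
        # Router producing one score per expert for every token.
        self.router = torch.nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = torch.sigmoid(self.router(x))      # non-competitive gating
        gate, idx = scores.topk(self.k, dim=-1)     # keep the k best experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            e = idx[..., slot]                      # (batch, seq) expert indices
            # Gather each token's expert weights and apply the expert FFN.
            h = F.relu(torch.einsum("bsd,bsdf->bsf", x, self.w_up[e]))
            out = out + gate[..., slot : slot + 1] * torch.einsum(
                "bsf,bsfd->bsd", h, self.w_down[e]
            )
        return out
```

In practice the per-token weight gather and loop over selected experts would be replaced by a grouped or fused kernel (compare the Triton-based entry in the list below); the loop form is kept here only for readability.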
Alternatives and similar repositories for moe_layer
Users who are interested in moe_layer are comparing it to the libraries listed below.
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" (☆39, updated Jun 11, 2025)
- ☆17, updated Jun 11, 2025
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization" (☆34, updated Jun 11, 2025)
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (☆101, updated Sep 30, 2024)
- Mixture of Attention Heads (☆52, updated Oct 10, 2022)
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… (☆56, updated Feb 28, 2023)
- ☆91, updated Aug 18, 2024
- Code for the ACL 2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts" (☆51, updated Jul 17, 2022)
- ☆47, updated May 20, 2025
- ☆16, updated Dec 9, 2023
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights (☆19, updated Oct 9, 2022)
- Triton-based implementation of Sparse Mixture of Experts (☆270, updated Oct 3, 2025)
- ☆22, updated Nov 9, 2024
- JAX Scalify: end-to-end scaled arithmetics