kyegomez / SwitchTransformersLinks
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
☆135Updated 2 months ago
Alternatives and similar repositories for SwitchTransformers
Users that are interested in SwitchTransformers are comparing it to the libraries listed below
Sorting:
- The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…