Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
☆139 · Apr 13, 2026 · Updated 3 weeks ago
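The following is a rough illustration of the technique this repository implements. It is a minimal pure-Python sketch, not code taken from the repository. It shows the top-1 ("switch") routing described in the Switch Transformers paper:

- Each token is sent to the single expert with the highest router probability.
- Each expert accepts at most a fixed capacity of tokens, and overflow tokens are dropped.
- An auxiliary loss of N · Σ f_i · P_i encourages balanced load across experts.

Several details are assumptions made for this example: the function names `softmax` and `switch_route`, the list-of-lists input of precomputed router logits, and rounding the capacity up.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(
    router_logits: List[List[float]], capacity_factor: float = 1.0
) -> Tuple[List[int], List[float], float]:
    """Top-1 ("switch") routing for one batch of tokens (hypothetical helper).

    router_logits[t][i] is the router score of token t for expert i,
    assumed to be precomputed (e.g. by a linear router layer).
    Returns: expert index per token (-1 if the token overflowed its
    expert's capacity), gate value per token, and the unscaled aux loss.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    # Expert capacity = tokens per batch / number of experts * capacity factor.
    # Rounding up is an assumption of this sketch.
    capacity = math.ceil(num_tokens / num_experts * capacity_factor)

    probs = [softmax(row) for row in router_logits]
    load = [0] * num_experts    # tokens actually accepted per expert
    chosen = [0] * num_experts  # tokens whose top-1 choice is expert i
    assignment: List[int] = []
    gates: List[float] = []
    for p in probs:
        expert = max(range(num_experts), key=lambda i: p[i])
        chosen[expert] += 1
        if load[expert] < capacity:
            load[expert] += 1
            assignment.append(expert)
            gates.append(p[expert])  # expert output is scaled by this probability
        else:
            assignment.append(-1)  # dropped: passes through the residual connection only
            gates.append(0.0)

    # Load-balancing loss N * sum_i f_i * P_i:
    # f_i = fraction of tokens whose argmax is expert i (before capacity),
    # P_i = mean router probability assigned to expert i.
    # Scale by a small coefficient (the paper uses 1e-2) before adding to the task loss.
    f = [c / num_tokens for c in chosen]
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]
    aux_loss = num_experts * sum(fi * Pi for fi, Pi in zip(f, P))
    return assignment, gates, aux_loss


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.0, 2.0], [3.0, 0.0], [1.0, 0.0]]
    assignment, gates, aux = switch_route(logits, capacity_factor=1.0)
    print(assignment)  # prints [0, 1, 0, -1]
```

In the usage example, 4 tokens are split over 2 experts with `capacity_factor=1.0`, so each expert accepts 2 tokens. The fourth token also prefers expert 0 and is therefore dropped (`-1`). A practical version would operate on batched PyTorch tensors; plain lists keep this sketch dependency-free.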
Alternatives and similar repositories for SwitchTransformers
Users that are interested in SwitchTransformers are comparing it to the libraries listed below.
- Community Implementation of the paper: "Multi-Head Mixture-of-Experts" in PyTorch · ☆30 · Apr 13, 2026 · Updated 3 weeks ago
- A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models · ☆859 · Sep 13, 2023 · Updated 2 years ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch · ☆383 · Jun 17, 2024 · Updated last year
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 · ☆1,242 · Apr 19, 2024 · Updated 2 years ago
- ☆17 · Mar 18, 2026 · Updated last month
- Reasoning-based Evaluation and Ranking of Translations · ☆20 · Jul 18, 2025 · Updated 9 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) · ☆83 · Oct 5, 2023 · Updated 2 years ago
- This repository implements the paper "Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations" · ☆20 · Aug 30, 2021 · Updated 4 years ago
- Deft: A Scalable Tree Index for Disaggregated Memory · ☆23 · Apr 23, 2025 · Updated last year
- PyTorch implementation of GPT-1 · ☆35 · May 28, 2022 · Updated 3 years ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" · ☆10 · Jul 19, 2024 · Updated last year
- A collection of AWESOME things about mixture-of-experts · ☆1,275 · Dec 8, 2024 · Updated last year
- ☆717 · Dec 6, 2025 · Updated 5 months ago
- ☆54 · Dec 31, 2025 · Updated 4 months ago
- Method and experience of winning the NTIRE'22 VQE challenge · ☆83 · Feb 4, 2023 · Updated 3 years ago
- Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models" · ☆22 · Apr 13, 2024 · Updated 2 years ago
- Reinforcement Learning from Text Feedback · ☆36 · Feb 17, 2026 · Updated 2 months ago
- This is a read-only mirror of the gem5 simulator. The upstream repository is stored in https://gem5.googlesource.com, code reviews shoul… · ☆13 · May 15, 2020 · Updated 5 years ago
- Official code for Cross-Domain Policy Adaptation by Capturing Representation Mismatch (ICML 2024) · ☆15 · Aug 15, 2025 · Updated 8 months ago
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … · ☆16 · Sep 18, 2025 · Updated 7 months ago
- Summary of Hyperspectral Target Tracking · ☆17 · Dec 26, 2025 · Updated 4 months ago
- A streaming Whisper server for on-prem transcription · ☆23 · Aug 15, 2024 · Updated last year
- [ICLR 2026] Meta-RL Induces Exploration in Language Agents · ☆36 · Feb 1, 2026 · Updated 3 months ago
- A comprehensive list of hyperspectral image classification resources (papers & codes & related websites) collected by Jiaqi Zou (immorta… · ☆20 · Jul 14, 2023 · Updated 2 years ago
- A simple, lightweight, and efficient solution for multi-object trajectory prediction · ☆14 · Apr 28, 2026 · Updated last week
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer · ☆15 · Sep 7, 2024 · Updated last year
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch · ☆345 · Apr 2, 2025 · Updated last year
- [CVPRW 2021] DUVE network for NTIRE 2021 Quality enhancement of heavily compressed videos - Track 3 Fixed bit-rate · ☆10 · Oct 17, 2024 · Updated last year
- ☆14 · Nov 13, 2022 · Updated 3 years ago
- Python Version of Andrew Welter's Hatebase Wrapper · ☆10 · Feb 20, 2022 · Updated 4 years ago
- Code for EMNLP 2022 main conference paper "Information-Transport-based Policy for Simultaneous Translation" · ☆13 · Nov 3, 2022 · Updated 3 years ago
- PyTorch implementation for DeepSpeech 2.0 · ☆32 · Jul 25, 2024 · Updated last year
- An automated data pipeline scaling RL to pretraining levels · ☆76 · Oct 11, 2025 · Updated 6 months ago
- Building a deep neural network (DNN) from scratch with NumPy, without using a framework · ☆12 · Dec 3, 2018 · Updated 7 years ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning · ☆35 · Dec 12, 2023 · Updated 2 years ago
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Ze… · ☆127 · Apr 13, 2026 · Updated 3 weeks ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆120 · Apr 13, 2026 · Updated 3 weeks ago
- PyTorch implementation of various token mixers (attention mechanisms, MLP, etc.) for understanding computer vision papers and other tas… · ☆17 · Mar 11, 2026 · Updated last month