Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
☆142May 11, 2026Updated last month
Alternatives and similar repositories for SwitchTransformers
Users that are interested in SwitchTransformers are comparing it to the libraries listed below.
- Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch☆31May 11, 2026Updated last month
- A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models☆862Sep 13, 2023Updated 2 years ago
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch☆385Jun 17, 2024Updated 2 years ago
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538☆1,247Apr 19, 2024Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated 2 years ago
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue…☆18Mar 21, 2025Updated last year
- ☆17Mar 18, 2026Updated 3 months ago
- Reasoning-based Evaluation and Ranking of Translations.☆20Jun 2, 2026Updated 2 weeks ago
- This repository implements the paper "Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations"☆20Aug 30, 2021Updated 4 years ago
- Testing how well gplearn works for feature engineering in the Kaggle competition Home Credit Default Risk☆10Jul 18, 2018Updated 7 years ago
- ☆22Dec 15, 2023Updated 2 years ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- A collection of AWESOME things about mixture-of-experts☆1,280Dec 8, 2024Updated last year
- This is the implementation of the paper "Pre-training Time Series Models with Stock Data Customization"☆46May 30, 2025Updated last year
- ☆725Jun 6, 2026Updated last week
- ☆55Dec 31, 2025Updated 5 months ago
- ☆69Jun 16, 2024Updated 2 years ago
- Unofficial PyTorch Implementation of OpenAI's GPT-3☆13Apr 11, 2022Updated 4 years ago
- Reinforcement Learning from Text Feedback☆45Feb 17, 2026Updated 4 months ago
- ☆30Sep 28, 2023Updated 2 years ago
- Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel Noises☆19Nov 19, 2024Updated last year
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …☆16Sep 18, 2025Updated 9 months ago
- Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S…☆14Feb 15, 2023Updated 3 years ago
- Source codes of our paper in TCSVT 2025: PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection☆31Feb 15, 2025Updated last year
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer☆15Sep 7, 2024Updated last year
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch☆347Apr 2, 2025Updated last year
- DST is a Decoder-only simultaneous machine translation model, which can conduct policy decision and translation concurrently☆11Jun 6, 2024Updated 2 years ago
- Contains the code for my Imperial College London Master's thesis on text summarization☆11Oct 25, 2022Updated 3 years ago
- This is the official repository for "Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion" (ICCV 2023).☆75Apr 30, 2024Updated 2 years ago
- A pipeline for the automatic construction of geometry problems along with step-by-step solutions.☆17Aug 27, 2025Updated 9 months ago
- [CVPRW 2021] DUVE network for NTIRE 2021 Quality enhancement of heavily compressed videos - Track 3 Fixed bit-rate☆10Oct 17, 2024Updated last year
- [ICLR 2026] Meta-RL Induces Exploration in Language Agents☆42Feb 1, 2026Updated 4 months ago
- A simple implementation of LoRA+: Efficient Low Rank Adaptation of Large Models☆10Mar 20, 2024Updated 2 years ago
- [IROS 2021] ADD: A Fine-grained Dynamic Inference Architecture for Semantic Image Segmentation☆10May 3, 2022Updated 4 years ago
- Daily arXiv papers, updated Monday through Friday.☆49Jun 5, 2026Updated 2 weeks ago
- Code for EMNLP 2022 main conference paper "Information-Transport-based Policy for Simultaneous Translation"☆13Nov 3, 2022Updated 3 years ago
- An automated data pipeline scaling RL to pretraining levels☆77Jun 2, 2026Updated 2 weeks ago
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…☆130May 12, 2026Updated last month
- ☆18Apr 16, 2026Updated 2 months ago