Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
☆138 · Mar 13, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for SwitchTransformers
Users who are interested in SwitchTransformers are comparing it to the libraries listed below.
- Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch ☆29 · Mar 22, 2026 · Updated last week
- A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ☆848 · Sep 13, 2023 · Updated 2 years ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch ☆379 · Jun 17, 2024 · Updated last year
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue… ☆17 · Mar 21, 2025 · Updated last year
- PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 ☆1,240 · Apr 19, 2024 · Updated last year
- ☆17 · Mar 18, 2026 · Updated last week
- Reasoning-based Evaluation and Ranking of Translations. ☆20 · Jul 18, 2025 · Updated 8 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆83 · Oct 5, 2023 · Updated 2 years ago
- This repository implements the paper "Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations" ☆20 · Aug 30, 2021 · Updated 4 years ago
- ☆22 · Dec 15, 2023 · Updated 2 years ago
- Deft: A Scalable Tree Index for Disaggregated Memory ☆23 · Apr 23, 2025 · Updated 11 months ago
- Code from the paper "Natural language supervision with a large and diverse dataset builds better models of human high-level visual cortex" ☆25 · Feb 6, 2024 · Updated 2 years ago
- [ICML 2024] Official repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- ☆51 · Dec 31, 2025 · Updated 3 months ago
- A collection of AWESOME things about mixture-of-experts ☆1,272 · Dec 8, 2024 · Updated last year
- This is the implementation of the paper "Pre-training Time Series Models with Stock Data Customization" ☆41 · May 30, 2025 · Updated 10 months ago
- ☆713 · Dec 6, 2025 · Updated 3 months ago
- This is a read-only mirror of the gem5 simulator. The upstream repository is stored in https://gem5.googlesource.com, code reviews shoul… ☆13 · May 15, 2020 · Updated 5 years ago
- Implementation for the CVPR 2025 paper "Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images". ☆28 · Mar 16, 2026 · Updated 2 weeks ago
- ☆14 · Nov 26, 2025 · Updated 4 months ago
- Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel Noises ☆18 · Nov 19, 2024 · Updated last year
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … ☆16 · Sep 18, 2025 · Updated 6 months ago
- Automated hyperparameter tuning for neural networks (learning rate, number of dense layers and nodes, and activation function) ☆14 · Aug 9, 2020 · Updated 5 years ago
- Counterfactual Inference by Machine Learning and Attribution Models ☆15 · Aug 24, 2023 · Updated 2 years ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆19 · May 28, 2024 · Updated last year
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆344 · Apr 2, 2025 · Updated 11 months ago
- ☆10 · Sep 17, 2020 · Updated 5 years ago
- [NeurIPS 2024 Spotlight] Code for "Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement" ☆19 · Jan 26, 2025 · Updated last year
- Implementation for the NeurIPS 2024 paper "Leveraging Tumor Heterogeneity: Heterogeneous Graph Representation Learning for Cancer Surviv… ☆12 · Jun 11, 2025 · Updated 9 months ago
- Official PyTorch implementation of "Roles and Utilization of Attention Heads in Transformer-based Neural Language Models", ACL 2020 ☆16 · Mar 21, 2025 · Updated last year
- Community detection on Hollywood actors using various models: Louvain, Clauset-Newman-Moore, GCN, GraphSage, and GAT. ☆10 · Dec 11, 2019 · Updated 6 years ago
- The official repository for "Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion" (ICCV 2023). ☆72 · Apr 30, 2024 · Updated last year
- A pipeline for the automatic construction of geometry problems along with step-by-step solutions. ☆17 · Aug 27, 2025 · Updated 7 months ago
- A simple implementation of LoRA+: Efficient Low Rank Adaptation of Large Models ☆10 · Mar 20, 2024 · Updated 2 years ago
- Artifacts for our ASPLOS'23 paper ElasticFlow ☆56 · May 10, 2024 · Updated last year
- [IROS 2021] ADD: A Fine-grained Dynamic Inference Architecture for Semantic Image Segmentation ☆10 · May 3, 2022 · Updated 3 years ago
- Daily arXiv papers, updated Monday through Friday. ☆30 · Updated this week
- Code for the EMNLP 2022 main conference paper "Information-Transport-based Policy for Simultaneous Translation" ☆13 · Nov 3, 2022 · Updated 3 years ago
- An automated data pipeline scaling RL to pretraining levels ☆74 · Oct 11, 2025 · Updated 5 months ago