Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
☆138 · Apr 13, 2026 · Updated last week
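For context, the core idea of the paper this repository implements is top-1 ("switch") expert routing: a learned router sends each token to exactly one expert, subject to a per-expert capacity limit, and scales the expert's output by the router probability. A minimal pure-Python sketch of that routing step (function and variable names are illustrative, not taken from this repo):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of router logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def switch_route(router_logits, num_experts, capacity):
    """Top-1 ("switch") routing: each token goes to its single
    highest-probability expert and is dropped if that expert is full.

    router_logits: per-token logit lists, shape [tokens][experts].
    Returns (assignments, gates): the expert index per token (None if
    dropped) and the router probability used to scale the expert output.
    """
    load = [0] * num_experts          # tokens assigned to each expert so far
    assignments, gates = [], []
    for logits in router_logits:
        probs = softmax(logits)
        expert = max(range(num_experts), key=lambda e: probs[e])
        if load[expert] < capacity:
            load[expert] += 1
            assignments.append(expert)
            gates.append(probs[expert])
        else:
            assignments.append(None)  # capacity overflow: token dropped
            gates.append(0.0)
    return assignments, gates

# Three tokens, two experts, capacity 1 per expert: the second token
# also prefers expert 0, which is already full, so it is dropped.
logits = [[2.0, 0.5], [1.8, 0.2], [0.1, 3.0]]
assignments, gates = switch_route(logits, num_experts=2, capacity=1)
print(assignments)  # [0, None, 1]
```

In the paper, the dropped-token case is what the auxiliary load-balancing loss is meant to discourage, by pushing the router toward a uniform distribution of tokens across experts.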
Alternatives and similar repositories for SwitchTransformers
Users interested in SwitchTransformers are comparing it to the libraries listed below.
- [ECCV 2024] Official PyTorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" · ☆47 · Jul 4, 2024 · Updated last year
- Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch · ☆29 · Updated this week
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch · ☆381 · Jun 17, 2024 · Updated last year
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue… · ☆17 · Mar 21, 2025 · Updated last year
- PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538) · ☆1,243 · Apr 19, 2024 · Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" · ☆18 · Mar 15, 2024 · Updated 2 years ago
- ☆17 · Mar 18, 2026 · Updated last month
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) · ☆83 · Oct 5, 2023 · Updated 2 years ago
- Implementation of the paper "Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations" · ☆20 · Aug 30, 2021 · Updated 4 years ago
- ☆22 · Dec 15, 2023 · Updated 2 years ago
- Code from the paper "Natural language supervision with a large and diverse dataset builds better models of human high-level visual cortex" · ☆25 · Feb 6, 2024 · Updated 2 years ago
- A collection of AWESOME things about mixture-of-experts · ☆1,274 · Dec 8, 2024 · Updated last year
- Implementation of the paper "Pre-training Time Series Models with Stock Data Customization" · ☆43 · May 30, 2025 · Updated 10 months ago
- ☆715 · Dec 6, 2025 · Updated 4 months ago
- Method and experience of winning the NTIRE'22 VQE challenge · ☆82 · Feb 4, 2023 · Updated 3 years ago
- ☆51 · Dec 31, 2025 · Updated 3 months ago
- Code for the MLSys 2024 paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models" · ☆22 · Apr 13, 2024 · Updated 2 years ago
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- Open-sourced code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … · ☆16 · Sep 18, 2025 · Updated 7 months ago
- Source code for our TCSVT 2025 paper "PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection" · ☆28 · Feb 15, 2025 · Updated last year
- Multiscale Score Matching Analysis · ☆11 · Jan 19, 2023 · Updated 3 years ago
- Counterfactual Inference by Machine Learning and Attribution Models · ☆15 · Aug 24, 2023 · Updated 2 years ago
- Official implementation of "Domain Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation", CVPR 202… · ☆15 · Jun 15, 2024 · Updated last year
- A simple, lightweight, and efficient solution for multi-object trajectory prediction · ☆14 · Apr 27, 2025 · Updated 11 months ago
- Implementation of the CVPR 2025 paper "Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images" · ☆32 · Mar 16, 2026 · Updated last month
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch · ☆345 · Apr 2, 2025 · Updated last year
- Implementation of the NeurIPS 2024 paper "Leveraging Tumor Heterogeneity: Heterogeneous Graph Representation Learning for Cancer Surviv… · ☆12 · Jun 11, 2025 · Updated 10 months ago
- Data manipulation and transformation for audio signal processing, powered by PyTorch · ☆11 · Sep 30, 2024 · Updated last year
- Community detection on Hollywood actors using various models: Louvain, Clauset-Newman-Moore, GCN, GraphSage, and GAT · ☆10 · Dec 11, 2019 · Updated 6 years ago
- Code for my Imperial College London master's thesis on text summarization · ☆10 · Oct 25, 2022 · Updated 3 years ago
- A pipeline for the automatic construction of geometry problems along with step-by-step solutions · ☆17 · Aug 27, 2025 · Updated 7 months ago
- [CVPRW 2021] DUVE network for NTIRE 2021 quality enhancement of heavily compressed videos, Track 3 (fixed bit-rate) · ☆10 · Oct 17, 2024 · Updated last year
- [IROS 2021] ADD: A Fine-grained Dynamic Inference Architecture for Semantic Image Segmentation · ☆10 · May 3, 2022 · Updated 3 years ago
- ☆133 · Nov 8, 2025 · Updated 5 months ago
- An automated data pipeline scaling RL to pretraining levels · ☆75 · Oct 11, 2025 · Updated 6 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning · ☆35 · Dec 12, 2023 · Updated 2 years ago
- Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Ze… · ☆126 · Mar 22, 2026 · Updated 3 weeks ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆117 · Updated this week
- Learning to Skip the Middle Layers of Transformers · ☆17 · Aug 7, 2025 · Updated 8 months ago