Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
☆136 · Jan 17, 2026 · Updated last month
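The repository's own code is not reproduced on this page. For orientation, here is a minimal pure-Python sketch of the top-1 ("switch") routing rule described in the paper: each token goes to the single expert with the highest router probability, that probability serves as the gate value, each expert accepts at most a fixed capacity of tokens per batch, and an auxiliary load-balancing loss N · Σᵢ fᵢ · Pᵢ is computed. The function names `softmax` and `switch_route` and the list-of-lists input format are hypothetical illustrations, not the listed repository's API. Scaling the loss by a coefficient α (the paper reports α = 10⁻²) is left to the caller.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def switch_route(router_logits, capacity_factor=1.0):
    """Hypothetical sketch of top-1 (switch) routing over one batch of tokens.

    router_logits: one list per token, holding one router logit per expert.
    Returns (assignments, gates, aux_loss). assignments[t] is the expert index
    for token t, or None if that expert was already at capacity.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    # Expert capacity = (tokens per batch / number of experts) * capacity factor.
    capacity = int(capacity_factor * num_tokens / num_experts)
    load = [0] * num_experts             # tokens accepted so far by each expert
    dispatch_counts = [0] * num_experts  # argmax choices, counted before capacity drops
    prob_sums = [0.0] * num_experts      # summed router probabilities per expert
    assignments, gates = [], []
    for token_logits in router_logits:
        probs = softmax(token_logits)
        for i, p in enumerate(probs):
            prob_sums[i] += p
        expert = max(range(num_experts), key=lambda i: probs[i])
        dispatch_counts[expert] += 1
        if load[expert] < capacity:
            load[expert] += 1
            assignments.append(expert)
            gates.append(probs[expert])  # gate value scales that expert's output
        else:
            # Overflowed token: it skips the expert and only the residual path carries it.
            assignments.append(None)
            gates.append(0.0)
    f = [c / num_tokens for c in dispatch_counts]  # fraction of tokens routed to expert i
    P = [s / num_tokens for s in prob_sums]        # mean router probability for expert i
    aux_loss = num_experts * sum(fi * pi for fi, pi in zip(f, P))
    return assignments, gates, aux_loss
```

In a full layer, a routed token's output would be `gates[t] * expert[assignments[t]](x_t)`. The auxiliary term is minimized when both the dispatch fractions and the mean router probabilities are uniform across experts, which is what pushes the router toward balanced expert load.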
Alternatives and similar repositories for SwitchTransformers
Users who are interested in SwitchTransformers are comparing it to the libraries listed below.
- [ECCV 2024] Official PyTorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" ☆47 · Jul 4, 2024 · Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- ☆17 · Jul 30, 2025 · Updated 7 months ago
- PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 ☆1,232 · Apr 19, 2024 · Updated last year
- Community Implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch ☆29 · Jan 31, 2026 · Updated last month
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆82 · Oct 5, 2023 · Updated 2 years ago
- ☆22 · Dec 15, 2023 · Updated 2 years ago
- Open Source Mycetoma's First Series of Molecules ☆10 · Sep 22, 2025 · Updated 5 months ago
- ☆46 · Dec 31, 2025 · Updated 2 months ago
- This repository implements the paper "Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations" ☆20 · Aug 30, 2021 · Updated 4 years ago
- Public repository for the ECCV 2024 paper "Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation". ☆26 · Aug 5, 2025 · Updated 7 months ago
- Action sequence prediction for arbitrary chemical equations ☆26 · Mar 29, 2021 · Updated 4 years ago
- Implementation of Deep evidential regression paper ☆58 · Dec 6, 2020 · Updated 5 years ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training ☆36 · Jun 20, 2025 · Updated 8 months ago
- Official Code Release for "Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection" in NeurIPS 2… ☆29 · Apr 20, 2025 · Updated 10 months ago
- A collection of AWESOME things about mixture-of-experts ☆1,269 · Dec 8, 2024 · Updated last year
- PyTorch implementation of our UniQ method, IEEE Access -- Training Multi-bit Quantized and Binarized Networks with A Learnable Symmetric … ☆11 · Apr 7, 2021 · Updated 4 years ago
- [TGRS'25] Multilevel Embedding and Alignment Network With Consistency and Invariance Learning for Cross-View Geo-Localization. ☆48 · Feb 27, 2026 · Updated last week
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts ☆38 · Sep 26, 2024 · Updated last year
- ☆707 · Dec 6, 2025 · Updated 3 months ago
- Claude Code Template with intelligent task management, specialized agents, and automated workflows for full-stack development ☆18 · Oct 20, 2025 · Updated 4 months ago
- Project exploring 3D volumetric rendering of NEXRAD radar data. ☆11 · Oct 23, 2023 · Updated 2 years ago
- Repo for "Centaur: Robust Multimodal Fusion for Human Activity Recognition" ☆10 · Jan 9, 2024 · Updated 2 years ago
- Search-Category-And-Info-Detail API ☆12 · Mar 7, 2023 · Updated 3 years ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception" ☆29 · Jan 13, 2026 · Updated last month
- Metal Activity Heuristic of Metalloprotein and Enzymatic Sites (MAHOMES) - Predicts if a protein bound metal ion is enzymatic or non-enzy… ☆11 · Apr 19, 2022 · Updated 3 years ago
- [AAAI 2024-Oral] EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder ☆35 · Apr 10, 2024 · Updated last year
- Stochastic Gradient Langevin Dynamics for Bayesian learning ☆36 · Nov 29, 2021 · Updated 4 years ago
- ChemicalTagger is a tool for semantic text-mining in chemistry. ☆45 · Jan 13, 2026 · Updated last month
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆36 · Apr 4, 2024 · Updated last year
- Official implementation of “The Source Image is the Best Attention for Infrared and Visible Image Fusion” ☆23 · Oct 16, 2025 · Updated 4 months ago
- ☆10 · Jul 8, 2021 · Updated 4 years ago
- ☆14 · Sep 23, 2024 · Updated last year
- MCP server providing tools to create MS Office documents like presentations, emails, spreadsheets and Word docs (pptx, docx, eml, xlsx) ☆14 · Feb 20, 2026 · Updated 2 weeks ago
- LaTeX template for Hunan University course papers ☆17 · Jul 14, 2024 · Updated last year
- ☆12 · Jul 4, 2024 · Updated last year
- ☆18 · Aug 16, 2025 · Updated 6 months ago
- [arXiv] Without Paired Labeled Data: End-to-End Self-Supervised Method for Drone-View Geo-Localization ☆80 · Feb 27, 2026 · Updated last week