Two implementations of a Mixture-of-Experts (MoE) architecture designed for research on large language models (LLMs) and scalable neural network designs. One targets a **single-device/NPU environment**; the other is built for multi-device distributed computing. Both showcase the core principles of MoE.
☆71 · Apr 8, 2025 · Updated last year
Alternatives and similar repositories for MoE-Mixture-of-Experts-in-PyTorch
Users that are interested in MoE-Mixture-of-Experts-in-PyTorch are comparing it to the libraries listed below.
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆24 · Oct 13, 2025 · Updated 7 months ago
- [ICCV 2025] MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network ☆34 · Dec 16, 2025 · Updated 5 months ago
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model ☆13 · Feb 11, 2025 · Updated last year
- The code of 《M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis》 ☆14 · Mar 31, 2025 · Updated last year
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. ☆16 · Apr 8, 2025 · Updated last year
- Scaling Laws for Mixture of Experts Models ☆15 · Feb 25, 2025 · Updated last year
- Easy & Pretrained SOTA Deep Learning for RNA strings ☆12 · Apr 15, 2022 · Updated 4 years ago
- 2nd Place Solution for the Google Research - Identify Contrails to Reduce Global Warming Competition ☆14 · Aug 15, 2023 · Updated 2 years ago
- Mixture-of-Experts Multimodal Variational Autoencoder ☆15 · Jul 3, 2025 · Updated 11 months ago
- Optimal-er Auctions through Attention, NeurIPS 2022 ☆23 · Dec 14, 2022 · Updated 3 years ago
- Transformer + GAT for RNA chemical reactivity prediction | Stanford Ribonanza ☆11 · Jan 28, 2026 · Updated 4 months ago
- [CVPR2025] Code Release for "FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting" ☆47 · Jun 20, 2025 · Updated 11 months ago
- ☆17 · Oct 18, 2023 · Updated 2 years ago
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions ☆14 · Mar 7, 2026 · Updated 3 months ago
- Training HuggingFace models using fastai ☆11 · Jul 22, 2021 · Updated 4 years ago
- Prototype of MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism ☆31 · Apr 4, 2025 · Updated last year
- We tackle the ill-posed inverse rendering problem with a NeRF model based on physical priors which jointly estimates scene materials, ill… ☆39 · Dec 16, 2024 · Updated last year
- Graph-based representation learning method for protein function prediction ☆24 · Aug 25, 2025 · Updated 9 months ago
- [CVPR 2025] Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts ☆23 · Jun 22, 2025 · Updated 11 months ago
- PRO Deployer - Simple and powerful SFTP/FTP deployer. Support concurrency uploading or delete files (very fast uploading and deleting fil… ☆23 · Feb 13, 2026 · Updated 3 months ago
- [ICML 24] Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space ☆16 · Aug 9, 2024 · Updated last year
- ☆44 · Mar 13, 2025 · Updated last year
- Codes for Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation (WWW2025) ☆31 · Jun 17, 2025 · Updated 11 months ago
- Implementation of Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. (Nature Genetics 20… ☆14 · Sep 5, 2019 · Updated 6 years ago
- ☆24 · May 26, 2026 · Updated 2 weeks ago
- [WWW 2025] Code for Modality Interactive Mixture-of-Experts for Fake News Detection ☆39 · Jun 25, 2025 · Updated 11 months ago
- Implementation of the ICLR 2025 paper "Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models" ☆29 · Apr 2, 2025 · Updated last year
- A Prot paper related materials ☆11 · Sep 5, 2022 · Updated 3 years ago
- Pytorch Implementation of LoG 22 [Oral] -- Transductive Linear Probing: A Novel Framework for Few-Shot Node Classification ☆17 · May 31, 2023 · Updated 3 years ago
- Official Implementation for "Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation" ☆15 · May 6, 2025 · Updated last year
- ☆17 · Jul 11, 2023 · Updated 2 years ago
- The official PyTorch code for AAAI'23 Paper "Sparse Coding in a Dual Memory System for Lifelong Learning" ☆12 · Feb 15, 2023 · Updated 3 years ago
- PyTorch implementation of Swap-VAE: A self-supervised approach for generating neural activity ☆13 · Nov 17, 2021 · Updated 4 years ago
- ☆15 · Jul 2, 2020 · Updated 5 years ago
- ☆10 · Sep 11, 2020 · Updated 5 years ago
- [Cell Patterns] Codes for paper: scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis ☆24 · Jan 31, 2026 · Updated 4 months ago
- Implementation of the "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with… ☆37 · May 12, 2026 · Updated 3 weeks ago
- DUNL - Neuron 2025 ☆26 · Jan 18, 2026 · Updated 4 months ago
- ☆18 · Dec 29, 2023 · Updated 2 years ago