Two implementations of a Mixture-of-Experts (MoE) architecture, designed for research on large language models (LLMs) and scalable neural network designs. One targets a **single-device/NPU environment**; the other is built for multi-device distributed computing. Both showcase the core principles of MoE.
☆74 · Apr 8, 2025 · Updated last year
Alternatives and similar repositories for MoE-Mixture-of-Experts-in-PyTorch
Users that are interested in MoE-Mixture-of-Experts-in-PyTorch are comparing it to the libraries listed below.
- The code of "M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis" ☆14 · Mar 31, 2025 · Updated last year
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks… ☆16 · Apr 29, 2025 · Updated last year
- Scaling Laws for Mixture of Experts Models ☆15 · Feb 25, 2025 · Updated last year
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated last year
- Rainbow Keywords - Official PyTorch Implementation ☆14 · Jun 27, 2024 · Updated 2 years ago
- Easy & Pretrained SOTA Deep Learning for RNA strings ☆12 · Apr 15, 2022 · Updated 4 years ago
- Code for Remember and Reuse: Cross-Task Blind Image Quality Assessment via Relevance-aware Incremental Learning (ACM Multimedia 2021) ☆13 · Sep 16, 2021 · Updated 4 years ago
- 2nd Place Solution for the Google Research - Identify Contrails to Reduce Global Warming Competition ☆14 · Aug 15, 2023 · Updated 2 years ago
- Mixture-of-Experts Multimodal Variational Autoencoder ☆15 · Jul 3, 2025 · Updated 11 months ago
- [ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization ☆24 · Oct 5, 2025 · Updated 8 months ago
- Transformer + GAT for RNA chemical reactivity prediction | Stanford Ribonanza ☆11 · Jan 28, 2026 · Updated 5 months ago
- Official code for "Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping" (ICLR 2025) ☆29 · Oct 25, 2025 · Updated 8 months ago
- PyTorch implementation of paper "Distillation Techniques for Pseudo-rehearsal Based Incremental Learning" ☆14 · May 5, 2026 · Updated last month
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions ☆14 · Mar 7, 2026 · Updated 3 months ago
- Prototype of MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism ☆31 · Apr 4, 2025 · Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆19 · Feb 9, 2026 · Updated 4 months ago
- Graph-based representation learning method for protein function prediction ☆24 · Aug 25, 2025 · Updated 10 months ago
- [ICML 24] Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space ☆16 · Aug 9, 2024 · Updated last year
- HAAQI-Net is a novel DNN-based non-intrusive method for assessing music audio quality in hearing aid users. ☆17 · Sep 26, 2025 · Updated 9 months ago
- ☆24 · May 26, 2026 · Updated last month
- A Prot paper related materials ☆11 · Sep 5, 2022 · Updated 3 years ago
- Official Implementation for "Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation" ☆15 · May 6, 2025 · Updated last year
- ☆17 · Jul 11, 2023 · Updated 2 years ago
- [ACL 2026 Main] Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis ☆44 · Apr 24, 2026 · Updated 2 months ago
- Library of models for Protein Function prediction (part of the 18th top solution out of 1625 teams in CAFA5) ☆20 · May 23, 2025 · Updated last year
- The official PyTorch code for AAAI'23 Paper "Sparse Coding in a Dual Memory System for Lifelong Learning" ☆12 · Feb 15, 2023 · Updated 3 years ago
- PyTorch implementation of Swap-VAE: A self-supervised approach for generating neural activity ☆13 · Nov 17, 2021 · Updated 4 years ago
- PyTorch implementation of "Seeing the forest and the tree: Building representations of both individual and collective dynamics with trans… ☆14 · Jan 4, 2023 · Updated 3 years ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper: "Multimodal Contrastive Learning with… ☆37 · Updated this week
- A collection of GPU experiments and benchmarks for my personal understanding and research. ☆31 · Jun 15, 2026 · Updated 2 weeks ago
- DUNL - Neuron 2025 ☆27 · Jan 18, 2026 · Updated 5 months ago
- ☆13 · May 6, 2024 · Updated 2 years ago
- ☆12 · Jun 22, 2024 · Updated 2 years ago
- ☆20 · May 30, 2026 · Updated 3 weeks ago
- A PyTorch implementation of Vector Quantized Variational Autoencoder (VQ-VAE) with EMA updates, pretrained encoder, and K-means initializ… ☆22 · Mar 26, 2026 · Updated 3 months ago
- Saliency Toolbox ☆21 · Sep 17, 2023 · Updated 2 years ago
- GAN paper list in text generation (2017-2020) Say it Often... ☆12 · Jul 10, 2020 · Updated 5 years ago
- Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and in… ☆18 · Nov 11, 2024 · Updated last year
- ☆20 · Jan 24, 2024 · Updated 2 years ago