Two implementations of a Mixture-of-Experts (MoE) architecture, designed for research on large language models (LLMs) and scalable neural network designs. One targets a **single-device/NPU environment**; the other is built for multi-device distributed computing. Both showcase the core principles of MoE.
☆68Apr 8, 2025Updated last year
Alternatives and similar repositories for MoE-Mixture-of-Experts-in-PyTorch
Users that are interested in MoE-Mixture-of-Experts-in-PyTorch are comparing it to the libraries listed below.
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…☆25Oct 13, 2025Updated 5 months ago
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model☆13Feb 11, 2025Updated last year
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks…☆16Apr 29, 2025Updated 11 months ago
- Scaling Laws for Mixture of Experts Models☆15Feb 25, 2025Updated last year
- Jeroen Cottaar's work for the Kaggle Geophysical Waveform Inversion competition (2nd place)☆11Aug 11, 2025Updated 7 months ago
- Rainbow Keywords - Official PyTorch Implementation☆14Jun 27, 2024Updated last year
- Multi-encoder segmentation for contrail detection in satellite imagery | Google Researc☆11Jan 28, 2026Updated 2 months ago
- Mixture-of-Experts Multimodal Variational Autoencoder☆15Jul 3, 2025Updated 9 months ago
- [ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization☆25Oct 5, 2025Updated 6 months ago
- The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking"☆19Jan 25, 2025Updated last year
- Transformer + GAT for RNA chemical reactivity prediction | Stanford Ribonanza☆11Jan 28, 2026Updated 2 months ago
- [CVPR2025] Code Release for "FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting"☆46Jun 20, 2025Updated 9 months ago
- WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights (CVPR 2024) - Official Pytorch Code☆19Mar 31, 2026Updated last week
- Official code for "Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping" (ICLR 2025)☆29Oct 25, 2025Updated 5 months ago
- A repo to contain the design notes and architectural information about the scalable multiplayer game for OpenShift☆10Sep 29, 2022Updated 3 years ago
- Flutter Movie 📱 app built with Riverpod, GoRouter, Dio, and Freezed based on Clean Architecture. It offers a clean, scalable, and mainta…☆24Feb 16, 2026Updated last month
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions☆14Mar 7, 2026Updated last month
- Prototype of MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism☆27Apr 4, 2025Updated last year
- We tackle the ill-posed inverse rendering problem with a NeRF model based on physical priors which jointly estimates scene materials, ill…☆39Dec 16, 2024Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 2 months ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 8 months ago
- A Scalable Chat Application, built using reactjs and express, and the architecture consists of 5 services. Used Redis and kafka for event…☆11Feb 20, 2024Updated 2 years ago
- Structured for a project based in ITCSS: Scalable and Maintainable CSS Architecture and BEMIT (BEM) Methodology.☆11Aug 29, 2018Updated 7 years ago
- Graph-based representation learning method for protein function prediction☆24Aug 25, 2025Updated 7 months ago
- ☆14Mar 15, 2025Updated last year
- [ICML 24] Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space☆16Aug 9, 2024Updated last year
- Official implementation of Neuronal Time-Invariant Representations (NeuPRINT), NeurIPS 2023☆10Mar 10, 2026Updated 3 weeks ago
- ☆10Jun 26, 2015Updated 10 years ago
- ☆23Apr 2, 2026Updated last week
- My scalable microservice architecture☆15Jul 10, 2021Updated 4 years ago
- A Prot paper related materials☆11Sep 5, 2022Updated 3 years ago
- A Software-defined Sensor Architecture for Large-scale Wideband Spectrum Monitoring☆14Feb 23, 2015Updated 11 years ago
- Pytorch Implementation of LoG 22 [Oral] -- Transductive Linear Probing: A Novel Framework for Few-Shot Node Classification☆17May 31, 2023Updated 2 years ago
- Official Implementation for "Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation"☆14May 6, 2025Updated 11 months ago
- Library of models for Protein Function prediction (part of the 18th top solution out of 1625 teams in CAFA5)☆20May 23, 2025Updated 10 months ago
- PyTorch implementation of Swap-VAE: A self-supervised approach for generating neural activity☆13Nov 17, 2021Updated 4 years ago
- Try to implement and test CVPR 2019 paper "Res2Net: A New Multi-scale Backbone Architecture" in PyTorch.☆18Jun 7, 2020Updated 5 years ago
- [Cell Patterns] Codes for paper: scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis☆21Jan 31, 2026Updated 2 months ago
- A collection of GPU experiments and benchmarks for my personal understanding and research.☆28Mar 18, 2026Updated 3 weeks ago