Two implementations of a Mixture-of-Experts (MoE) architecture designed for research on large language models (LLMs) and scalable neural network designs. One implementation targets a **single-device/NPU environment**, while the other is built for multi-device distributed computing. Both versions showcase the core principles of MoE.
☆68 · Apr 8, 2025 · Updated last year
Alternatives and similar repositories for MoE-Mixture-of-Experts-in-PyTorch
Users who are interested in MoE-Mixture-of-Experts-in-PyTorch are comparing it to the repositories listed below.
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆25 · Oct 13, 2025 · Updated 6 months ago
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks… ☆16 · Apr 29, 2025 · Updated last year
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. ☆16 · Apr 8, 2025 · Updated last year
- Scaling Laws for Mixture of Experts Models ☆15 · Feb 25, 2025 · Updated last year
- Easy & Pretrained SOTA Deep Learning for RNA strings ☆12 · Apr 15, 2022 · Updated 4 years ago
- Multi-encoder segmentation for contrail detection in satellite imagery | Google Researc ☆12 · Jan 28, 2026 · Updated 3 months ago
- Transformer + GAT for RNA chemical reactivity prediction | Stanford Ribonanza ☆11 · Jan 28, 2026 · Updated 3 months ago
- ☆12 · Apr 18, 2025 · Updated last year
- ☆17 · Oct 18, 2023 · Updated 2 years ago
- Official code for "Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping" (ICLR 2025) ☆28 · Oct 25, 2025 · Updated 6 months ago
- Data simulation scripts for paper "Target Sound Extraction with Variable Cross-modality Clues" ☆17 · May 19, 2023 · Updated 2 years ago
- Extract face from image using face++ ☆17 · May 25, 2015 · Updated 10 years ago
- Luck2x is a free online casino platform offering a wide variety of engaging games, including Slots, Crash, Mines, Tower, Dice, PvP, Roule… ☆22 · Jan 11, 2026 · Updated 3 months ago
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions ☆14 · Mar 7, 2026 · Updated last month
- Training HuggingFace models using fastai ☆11 · Jul 22, 2021 · Updated 4 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆19 · Feb 9, 2026 · Updated 2 months ago
- Graph-based representation learning method for protein function prediction ☆24 · Aug 25, 2025 · Updated 8 months ago
- [CVPR 2025] Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts ☆23 · Jun 22, 2025 · Updated 10 months ago
- ☆14 · Mar 15, 2025 · Updated last year
- HAAQI-Net is a novel DNN-based non-intrusive method for assessing music audio quality in hearing aid users. ☆17 · Sep 26, 2025 · Updated 7 months ago
- Codes for Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation (WWW2025) ☆30 · Jun 17, 2025 · Updated 10 months ago
- Official implementation of Neuronal Time-Invariant Representations (NeuPRINT), NeurIPS 2023 ☆10 · Mar 10, 2026 · Updated last month
- Spatial Audio Metrics (SAM) is a toolbox to analyse spatial audio and spatial audio perceptual experiments ☆34 · Jan 8, 2026 · Updated 3 months ago
- Keras cropping layer implementation ☆13 · Aug 23, 2016 · Updated 9 years ago
- ☆10 · Jun 26, 2015 · Updated 10 years ago
- ☆23 · Apr 7, 2026 · Updated 3 weeks ago
- A modular, scalable, and maintainable Spring Boot microservices application demonstrating a HexaLayered Architecture for phone directory … ☆13 · Feb 3, 2025 · Updated last year
- Researching next-gen blockchain architecture (as of 2026) to achieve ultimate scalability in permissionless setting and fully resolve Blo… ☆24 · Apr 23, 2026 · Updated last week
- A package to study complex networks based on the temporal evolution of their Dynamic Communicability and Flow. ☆11 · Jan 30, 2026 · Updated 2 months ago
- [ACL 2026 Main] Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis ☆38 · Updated this week
- Ewha Womans University lecture materials ☆29 · Feb 8, 2024 · Updated 2 years ago
- Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs (AAAI 2024) ☆15 · Jul 30, 2024 · Updated last year
- ☆17 · Jul 11, 2023 · Updated 2 years ago
- ☆20 · Apr 30, 2025 · Updated 11 months ago
- A simple, scalable, and powerful architecture for building production ready React applications. ☆24 · Mar 16, 2026 · Updated last month
- The official PyTorch code for AAAI'23 Paper "Sparse Coding in a Dual Memory System for Lifelong Learning" ☆12 · Feb 15, 2023 · Updated 3 years ago
- PyTorch implementation of Swap-VAE: A self-supervised approach for generating neural activity ☆13 · Nov 17, 2021 · Updated 4 years ago
- ☆15 · Jul 2, 2020 · Updated 5 years ago
- PyTorch implementation of "Seeing the forest and the tree: Building representations of both individual and collective dynamics with trans… ☆14 · Jan 4, 2023 · Updated 3 years ago