[ICLR 2025] Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
☆25 · Oct 5, 2025 · Updated 5 months ago
Alternatives and similar repositories for Drop-Upcycling
Users interested in Drop-Upcycling are comparing it to the libraries listed below.
- Official PyTorch implementation of CD-MOE ☆12 · Mar 13, 2026 · Updated last week
- ☆24 · Jan 27, 2025 · Updated last year
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model ☆13 · Feb 11, 2025 · Updated last year
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks… ☆16 · Apr 29, 2025 · Updated 10 months ago
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. ☆16 · Apr 8, 2025 · Updated 11 months ago
- [ICML 2025 Oral] Mixture of Lookup Experts ☆72 · Dec 3, 2025 · Updated 3 months ago
- Scaling Laws for Mixture of Experts Models ☆15 · Feb 25, 2025 · Updated last year
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 · Mar 10, 2025 · Updated last year
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆25 · Oct 13, 2025 · Updated 5 months ago
- Official implementation of "Mixture of Experts Meets Prompt-Based Continual Learning" (NeurIPS 2024) ☆44 · Aug 1, 2025 · Updated 7 months ago
- Mixture-of-Experts Multimodal Variational Autoencoder ☆15 · Jul 3, 2025 · Updated 8 months ago
- The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking" ☆19 · Jan 25, 2025 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆61 · Feb 7, 2025 · Updated last year
- ☆16 · Jun 10, 2024 · Updated last year
- Prototype MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism ☆27 · Apr 4, 2025 · Updated 11 months ago
- [CVPR 2025] Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts ☆23 · Jun 22, 2025 · Updated 8 months ago
- Code for Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation (WWW 2025) ☆27 · Jun 17, 2025 · Updated 9 months ago
- Randomized algorithm class at CU ☆15 · Jul 8, 2025 · Updated 8 months ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better ☆16 · Feb 15, 2025 · Updated last year
- Implementation for the paper "CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference" ☆35 · Mar 6, 2025 · Updated last year
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral) ☆35 · Jan 18, 2025 · Updated last year
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆93 · Dec 3, 2024 · Updated last year
- Implementation of the ICLR 2025 paper "Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models" ☆26 · Apr 2, 2025 · Updated 11 months ago
- ☆18 · Aug 19, 2024 · Updated last year
- ☆15 · Jul 13, 2025 · Updated 8 months ago
- ☆12 · Jul 6, 2022 · Updated 3 years ago
- (ICLR 2026) Unveiling Super Experts in Mixture-of-Experts Large Language Models ☆39 · Sep 25, 2025 · Updated 5 months ago
- The implementation for the MLSys 2023 paper "Cuttlefish: Low-Rank Model Training Without All the Tuning" ☆45 · May 10, 2023 · Updated 2 years ago
- The website of the Oscar Project ☆11 · Mar 27, 2025 · Updated 11 months ago
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ☆48 · Oct 21, 2022 · Updated 3 years ago
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer ☆16 · Sep 7, 2024 · Updated last year
- [WIP] AI that "reads" live TV and writes it as a movie script in real-time. ☆23 · Jun 3, 2025 · Updated 9 months ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆109 · Dec 20, 2024 · Updated last year
- Repository for the Kaggle Shopee competition ☆11 · Jun 7, 2021 · Updated 4 years ago
- Code for the paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts" ☆31 · Jun 7, 2024 · Updated last year
- Implementations of a Mixture-of-Experts (MoE) architecture designed for research on large language models (LLMs) and scalable neural netw… ☆62 · Apr 8, 2025 · Updated 11 months ago
- The official implementation of "FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts" ☆45 · Mar 17, 2025 · Updated last year
- Reference implementation of models from Nyonic Model Factory ☆12 · May 13, 2024 · Updated last year
- [ICLR 2023] Eva: Practical Second-Order Optimization with Kronecker-Vectorized Approximation ☆12 · Jul 31, 2023 · Updated 2 years ago