Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
☆23 · Updated Nov 11, 2025
Alternatives and similar repositories for EEP
Users interested in EEP are comparing it to the repositories listed below.
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better ☆16 · Updated Feb 15, 2025
- Official PyTorch implementation of CD-MOE ☆12 · Updated Mar 29, 2025
- The official code of the paper "OMS-DPM: Optimizing Model Schedule for Diffusion Probabilistic Model", accepted at ICML 2023 ☆24 · Updated Oct 11, 2023
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆15 · Updated Feb 4, 2025
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆59 · Updated Nov 24, 2024
- [NeurIPS'25] The official code implementation of the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… ☆78 · Updated Feb 10, 2026
- [WSDM'24 Oral] The official implementation of the paper <DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting> ☆23 · Updated Mar 11, 2024
- A collection of papers on Private Evolution ☆18 · Updated Oct 7, 2025
- Code repository of "Evaluating Quantized Large Language Models" ☆136 · Updated Sep 8, 2024
- [DATE'23] The official code for the paper <CLAP: Locality Aware and Parallel Triangle Counting with Content Addressable Memory> ☆23 · Updated Jan 19, 2026
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models ☆28 · Updated Aug 5, 2025
- [ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation ☆183 · Updated Mar 1, 2024
- [NeurIPS 2025] Latent Zoning Networks ☆58 · Updated Feb 21, 2026
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆49 · Updated Nov 27, 2024
- [CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression> ☆155 · Updated Jan 14, 2026
- Code repository for the NeurIPS 2024 paper "Toward Efficient Inference for Mixture of Experts" ☆19 · Updated Oct 30, 2024
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More ☆66 · Updated Feb 12, 2025
- Code to reproduce the experiments of the ICLR'24 paper "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging" ☆12 · Updated Oct 14, 2025
- [TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆37 · Updated Aug 20, 2024
- ☆14 · Updated Aug 9, 2024
- Code for the paper "CausalCite: A Causal Formulation of Paper Citations" (2023) ☆16 · Updated Jan 11, 2024
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆72 · Updated Mar 25, 2025
- [ICLR 2023] "Revisiting Pruning At Initialization Through The Lens of Ramanujan Graph" by Duc Hoang, Shiwei Liu, Radu Marculescu, Atlas W… ☆14 · Updated Aug 4, 2023
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆14 · Updated Nov 27, 2024
- AIMO2 2nd place solution ☆73 · Updated May 28, 2025
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆114 · Updated May 24, 2024
- Spatial Spectral Machine Learning ☆14 · Updated Oct 15, 2025
- Explore Inter-layer Expert Affinity in MoE Model Inference ☆16 · Updated May 6, 2024
- [ICLR 2023] NTK-SAP: Improving neural network pruning by aligning training dynamics ☆20 · Updated May 1, 2023
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆81 · Updated Jul 7, 2025
- [ICLR 2025] Official implementation of the paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models" ☆23 · Updated Mar 16, 2025
- ASIC simulation of a multi-ported memory module; offers an SRAM-based dual-port basic building block to support multiple read/write … ☆22 · Updated May 30, 2016
- ☆23 · Updated Nov 26, 2024
- Official implementation of LaCo (EMNLP 2024 Findings) ☆21 · Updated Oct 3, 2024
- Official implementation of the paper "Understanding Hyperdimensional Computing for Parallel Single-Pass Learning" ☆23 · Updated Jun 10, 2023
- Port of the C++ munkres implementation to a TensorFlow interface ☆16 · Updated Oct 24, 2017
- Low-Rank Llama Custom Training ☆23 · Updated Mar 27, 2024
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆98 · Updated Nov 25, 2024
- [ICML 2021 Oral] "CATE: Computation-aware Neural Architecture Encoding with Transformers" by Shen Yan, Kaiqiang Song, Fei Liu, Mi Zhang ☆19 · Updated Jun 23, 2021