[NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design"
☆15 · Feb 4, 2025 · Updated last year
Alternatives and similar repositories for C2R-MoE
Users interested in C2R-MoE are comparing it to the repositories listed below.
- [ICML'25] Official code for the paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an… ☆13 · Apr 17, 2025 · Updated 10 months ago
- Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy" ☆15 · Mar 6, 2025 · Updated 11 months ago
- ☆19 · Nov 5, 2024 · Updated last year
- Kinetics: Rethinking Test-Time Scaling Laws ☆85 · Jul 11, 2025 · Updated 7 months ago
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆23 · Nov 11, 2025 · Updated 3 months ago
- [ICML 2024] Code for the paper "MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts" ☆10 · Jul 1, 2024 · Updated last year
- The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking" ☆19 · Jan 25, 2025 · Updated last year
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference ☆20 · Jan 24, 2025 · Updated last year
- ☆22 · Jun 10, 2025 · Updated 8 months ago
- Low-Rank Llama Custom Training ☆23 · Mar 27, 2024 · Updated last year
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models ☆28 · Aug 5, 2025 · Updated 6 months ago
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆155 · Jan 14, 2026 · Updated last month
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆102 · Jun 20, 2025 · Updated 8 months ago
- ☆25 · Apr 10, 2025 · Updated 10 months ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆29 · Jun 30, 2025 · Updated 8 months ago
- [arXiv:2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies ☆59 · Feb 6, 2026 · Updated 3 weeks ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Updated this week
- Implementation for the paper "CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference" ☆35 · Mar 6, 2025 · Updated 11 months ago
- A website for accessing many models through an API (DeepSeek, Qwen, Hunyuan, etc.) ☆17 · Jul 12, 2025 · Updated 7 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- LLM Inference with Microscaling Format ☆34 · Nov 12, 2024 · Updated last year
- [ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts ☆265 · Oct 16, 2024 · Updated last year
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- Code release for AdapMoE, accepted by ICCAD 2024 ☆35 · Apr 28, 2025 · Updated 10 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… ☆82 · Jun 30, 2024 · Updated last year
- [NeurIPS'25] The official code implementation for the paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok… ☆78 · Feb 10, 2026 · Updated 2 weeks ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 9 months ago
- Official implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆40 · Feb 13, 2025 · Updated last year
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation ☆34 · May 28, 2025 · Updated 9 months ago
- Official implementation of the paper "VMoBA: Mixture-of-Block Attention for Video Diffusion Models" ☆61 · Jul 1, 2025 · Updated 8 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆139 · Jun 12, 2024 · Updated last year
- The official implementation of the paper "DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents" ☆29 · Oct 23, 2025 · Updated 4 months ago
- Symphony: A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… ☆30 · Oct 30, 2025 · Updated 4 months ago
- ☆11 · Jun 22, 2025 · Updated 8 months ago
- [ICML 2024] Official repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- A Framework for Evaluating AI Agent Safety in Realistic Environments ☆30 · Oct 2, 2025 · Updated 5 months ago
- [ICLR 2026] ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs ☆30 · Updated this week