Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
★35 · Mar 6, 2025 · updated last year
Alternatives and similar repositories for CMoE
Users interested in CMoE are comparing it to the repositories listed below.
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. (★16 · Apr 8, 2025 · updated 10 months ago)
- [NAACL'25 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…" (★15 · Feb 4, 2025 · updated last year)
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs (★33 · Dec 9, 2025 · updated 2 months ago)
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". (★66 · Aug 15, 2025 · updated 6 months ago)
- [NeurIPS 2024] Search for Efficient LLMs (★16 · Jan 16, 2025 · updated last year)
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) (★29 · Aug 4, 2024 · updated last year)
- (★19 · Jan 8, 2025 · updated last year)
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models (★154 · Jul 9, 2025 · updated 7 months ago)
- LLM-I: Transform LLMs into natural interleaved multimodal creators! Tool-use framework supporting image search, generation, code ex… (★41 · Oct 20, 2025 · updated 4 months ago)
- Code for the paper "Long cOntext aliGnment via efficient preference Optimization" (★24 · Oct 10, 2025 · updated 4 months ago)
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference (★283 · May 1, 2025 · updated 10 months ago)
- Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration (★29 · Nov 22, 2025 · updated 3 months ago)
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely (★24 · Jun 26, 2024 · updated last year)
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main conferβ¦β27Mar 29, 2024Updated last year
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs (★23 · Nov 11, 2025 · updated 3 months ago)
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models (★60 · Feb 7, 2025 · updated last year)
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple. (★27 · Apr 21, 2025 · updated 10 months ago)
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… (★60 · Oct 31, 2024 · updated last year)
- Kaggle AIMO2 solution with token-efficient reasoning LLM recipes (★43 · Aug 7, 2025 · updated 6 months ago)
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated (★33 · Aug 14, 2024 · updated last year)
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" (★34 · Jun 13, 2025 · updated 8 months ago)
- LLM Inference with Microscaling Format (★34 · Nov 12, 2024 · updated last year)
- (★32 · Nov 11, 2024 · updated last year)
- This is the official repo for the paper "LLM-FE" (★59 · Feb 25, 2026 · updated last week)
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation (★34 · May 28, 2025 · updated 9 months ago)
- Agentic Learning Powered by AWorld (★90 · Feb 13, 2026 · updated 3 weeks ago)
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. (★224 · Jul 25, 2025 · updated 7 months ago)
- Information relating to topics on Data Engineering, Data Infrastructure, Data Storing, Data Warehouses and Business Analysis. For those i… (★10 · Aug 8, 2021 · updated 4 years ago)
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" (★30 · Mar 28, 2024 · updated last year)
- Repository of IPBench (★19 · Jan 4, 2026 · updated 2 months ago)
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks… (★16 · Apr 29, 2025 · updated 10 months ago)
- LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training (★91 · Dec 3, 2024 · updated last year)
- (★34 · Mar 12, 2025 · updated 11 months ago)
- Implementation of SmoothCache, a project aimed at speeding up Diffusion Transformer (DiT)-based GenAI models with error-guided caching.