JarvisPei / CMoE
Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
☆34 · Updated Mar 6, 2025
Alternatives and similar repositories for CMoE
Users interested in CMoE are comparing it to the repositories listed below
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. ☆16 · Updated Apr 8, 2025
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs ☆33 · Updated Dec 9, 2025
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆14 · Updated Feb 4, 2025
- [NeurIPS 2024] Search for Efficient LLMs ☆16 · Updated Jan 16, 2025
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆153 · Updated Jul 9, 2025
- ☆19 · Updated Jan 8, 2025
- 🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex… ☆41 · Updated Oct 20, 2025
- Code for paper: Long cOntext aliGnment via efficient preference Optimization ☆24 · Updated Oct 10, 2025
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch… ☆27 · Updated Jul 15, 2025
- ☆22 · Updated Jun 10, 2025
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆283 · Updated May 1, 2025
- Low-Rank Llama Custom Training ☆23 · Updated Mar 27, 2024
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆23 · Updated Nov 11, 2025
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…☆27Mar 29, 2024Updated last year
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆24 · Updated Jun 26, 2024
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆60 · Updated Feb 7, 2025
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The truth is rarely pure and never simple. ☆27 · Updated Apr 21, 2025
- A repository aimed at pruning DeepSeek V3, R1 and R1-zero to a usable size ☆83 · Updated Sep 5, 2025
- Kaggle AIMO2 solution with token-efficient reasoning LLM recipes ☆42 · Updated Aug 7, 2025
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated Aug 14, 2024
- LLM Inference with Microscaling Format ☆34 · Updated Nov 12, 2024
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆18 · Updated Nov 18, 2024
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆35 · Updated Jun 13, 2025
- ☆31 · Updated Nov 11, 2024
- Agentic Learning Powered by AWorld ☆88 · Updated this week
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆223 · Updated Jul 25, 2025
- Repository of IPBench ☆19 · Updated Jan 4, 2026
- CRAI is a multimodal large language model based on the Mixture of Experts (MoE) architecture, supporting text and image cross-modal tasks… ☆16 · Updated Apr 29, 2025
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆91 · Updated Dec 3, 2024
- ☆36 · Updated Mar 12, 2025
- [CVPR'25] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization ☆47 · Updated Jul 22, 2025
- Implementation of SmoothCache, a project aimed at speeding up Diffusion Transformer (DiT) based GenAI models with error-guided caching. ☆47 · Updated Jul 17, 2025
- DPO, but faster 🚀 ☆47 · Updated Dec 6, 2024
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆172 · Updated Nov 26, 2025
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆98 · Updated Nov 25, 2024
- GBM implementation on Legate ☆14 · Updated Jan 28, 2026
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models ☆26 · Updated Feb 4, 2026
- [NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms" ☆23 · Updated Oct 23, 2025
- ☆11 · Updated Jul 17, 2023