UNITES-Lab / HEXA-MoE
Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"
☆15 · Updated 10 months ago
Alternatives and similar repositories for HEXA-MoE
Users interested in HEXA-MoE are comparing it to the repositories listed below.
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity · ☆65 · Updated 6 months ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…" · ☆13 · Updated 11 months ago
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference · ☆47 · Updated last year
- ☆39 · Updated 5 months ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter · ☆121 · Updated last month
- [Archived] For the latest updates and community contributions, please visit: https://gitcode.com/Ascend/TransferQueue · ☆12 · Updated this week
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models · ☆25 · Updated last year
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding