PKU-SEC-Lab / AdapMoE
Code release for AdapMoE, accepted at ICCAD 2024
☆11 · Updated 3 months ago
Alternatives and similar repositories for AdapMoE:
Users who are interested in AdapMoE are comparing it to the repositories listed below:
- LLM Inference with Microscaling Format ☆19 · Updated 3 months ago
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆27 · Updated this week
- PyTorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆32 · Updated 8 months ago
- ☆23 · Updated 3 months ago
- Quantized Attention on GPU ☆34 · Updated 2 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆26 · Updated 2 months ago
- ☆52 · Updated 10 months ago
- SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models