MoE-Inf / awesome-moe-inference
A curated collection of papers on MoE model inference
☆220 · Updated this week
Alternatives and similar repositories for awesome-moe-inference
Users interested in awesome-moe-inference are comparing it to the repositories listed below.
- Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of pap… ☆264 · Updated 4 months ago
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆166 · Updated last week
- Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes. ☆343 · Updated 4 months ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆147 · Updated last year
- A summary of awesome work on optimizing LLM inference ☆88 · Updated last month
- This repository stores personal notes and annotated papers from daily research. ☆138 · Updated this week
- Analyzes the inference of Large Language Models (LLMs) across aspects such as computation, storage, transmission, and hardware roofline mod… ☆517 · Updated 10 months ago
- Papers and code for AI systems ☆318 · Updated 3 months ago
- ☆23 · Updated last year
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆495 · Updated last week
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆238 · Updated last month
- ☆65 · Updated last year
- ☆123 · Updated 2 weeks ago
- ☆109 · Updated 8 months ago
- ☆54 · Updated last year
- Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o… ☆115 · Updated this week
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA'25) ☆47 · Updated 3 months ago
- Curated collection of papers in machine learning systems ☆388 · Updated last month
- LLM serving cluster simulator ☆108 · Updated last year
- LLM inference analyzer for different hardware platforms ☆80 · Updated 3 weeks ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆27 · Updated last year
- High-performance Transformer implementation in C++. ☆128 · Updated 6 months ago
- ☆42 · Updated 11 months ago
- ☆74 · Updated 3 years ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆307 · Updated 3 weeks ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24) ☆42 · Updated 7 months ago
- A low-latency & high-throughput serving engine for LLMs ☆397 · Updated 2 months ago
- PyTorch library for cost-effective, fast and easy serving of MoE models. ☆215 · Updated 3 weeks ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆216 · Updated last year
- ☆150 · Updated last year
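
To ground what these repositories optimize, here is a minimal sketch of the top-k expert routing at the core of MoE inference. This is an illustrative assumption, not code from any repository above; the function name `moe_forward`, the tensor shapes, and the toy setup are all hypothetical.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:       (num_tokens, d_model) token activations
    gate_w:  (d_model, num_experts) router weights
    experts: list of callables, each mapping (n, d_model) -> (n, d_model)
    """
    logits = x @ gate_w                               # (tokens, experts)
    probs = F.softmax(logits, dim=-1)
    topk_p, topk_i = probs.topk(k, dim=-1)            # (tokens, k)
    topk_p = topk_p / topk_p.sum(-1, keepdim=True)    # renormalize over chosen experts

    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        # gather the tokens that routed to expert e this batch
        token_idx, slot = (topk_i == e).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue  # expert idle this batch: the load-imbalance problem
        out[token_idx] += topk_p[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
    return out

# toy usage: 8 experts, top-2 routing (hypothetical sizes)
torch.manual_seed(0)
d, n_exp = 64, 8
experts = [torch.nn.Linear(d, d) for _ in range(n_exp)]
gate_w = torch.randn(d, n_exp)
y = moe_forward(torch.randn(16, d), gate_w, experts)
```

Sparse dispatch like this is what makes MoE models hard to serve efficiently: expert loads vary from batch to batch, and the systems collected above study batching, expert placement, offloading, and caching strategies around exactly this loop.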