xlite-dev / LLM-Infra

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

☆4,142

Alternatives and similar repositories for LLM-Infra

Users who are interested in LLM-Infra are comparing it to the libraries listed below.

zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes/codes for ML SYS.
☆2,498 Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆3,211 Updated this week
horseee / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆1,736 Updated this week
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Asy…
☆7,145 Updated this week
HuangOwen / Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
☆1,567 Updated last week
AIoT-MLSys-Lab / Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
☆1,172 Updated last week
xlite-dev / LeetCUDA
📚LeetCUDA: 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.
☆4,789 Updated this week
ModelTC / lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili…
☆3,317 Updated this week
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,081 Updated last week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆3,436 Updated this week
SafeAILab / EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
☆1,325 Updated last week
AmberLJC / LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
☆1,318 Updated this week
InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
☆6,563 Updated this week
volcengine / verl
verl: Volcano Engine Reinforcement Learning for LLMs
☆9,710 Updated this week
Xnhyacinth / Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
☆1,531 Updated last week
hemingkx / SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆800 Updated this week
FasterDecoding / Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,549 Updated 11 months ago
LMCache / LMCache
Redis for LLMs
☆1,560 Updated this week
BBuf / how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
☆2,269 Updated this week
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
☆2,193 Updated last month
deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,088 Updated 2 months ago
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆5,541 Updated this week
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆15,276 Updated this week
GeeeekExplorer / nano-vllm
Nano vLLM
☆1,659 Updated this week
AmadeusChan / Awesome-LLM-System-Papers
☆595 Updated last month
feifeibear / LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
☆762 Updated 10 months ago
alibaba / Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
☆1,151 Updated last week
vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆1,518 Updated this week
openreasoner / openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
☆1,788 Updated 5 months ago
fla-org / flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
☆2,753 Updated this week