GeeeekExplorer / nano-vllm
Nano vLLM
☆9,694 · Updated last month
Alternatives and similar repositories for nano-vllm
Users interested in nano-vllm are comparing it to the libraries listed below.
- Supercharge Your LLM with the Fastest KV Cache Layer ☆6,383 · Updated this week
- FlashInfer: Kernel Library for LLM Serving ☆4,285 · Updated this week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,436 · Updated last week
- My learning notes for ML SYS. ☆4,632 · Updated last week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels ☆4,289 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆21,569 · Updated last week
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization ☆2,050 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆5,673 · Updated this week
- verl: Volcano Engine Reinforcement Learning for LLMs ☆17,582 · Updated this week
- slime is an LLM post-training framework for RL Scaling. ☆2,911 · Updated this week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,447 · Updated this week
- Democratizing Reinforcement Learning for LLMs ☆4,896 · Updated this week
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉 ☆4,852 · Updated 3 weeks ago
- An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling…) ☆8,625 · Updated this week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability… ☆3,795 · Updated last week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,004 · Updated this week
- NanoGPT (124M) in 3 minutes ☆3,974 · Updated this week
- A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen. ☆3,562 · Updated this week
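
For context on what these alternatives are compared against: nano-vllm's README describes an offline-inference API that mirrors vLLM's `LLM`/`SamplingParams` interface, with minor differences in `LLM.generate`. A minimal sketch, assuming the `nanovllm` package layout from that README; the model path and sampling values here are placeholders, not recommendations:

```python
from nanovllm import LLM, SamplingParams  # assumes nano-vllm's vLLM-mirroring API

# Load a model from a local path (placeholder); tensor_parallel_size=1 uses a single GPU.
llm = LLM("/path/to/Qwen3-0.6B", enforce_eager=True, tensor_parallel_size=1)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Explain KV caching in one sentence."]

# generate() returns one result per prompt; per the README, each result is a
# dict carrying the decoded text (vLLM proper returns RequestOutput objects instead).
outputs = llm.generate(prompts, sampling_params)
print(outputs[0]["text"])
```

Most serving-focused entries above (SGLang, LightLLM, vLLM production-stack) expose richer server deployments; nano-vllm's draw is that this same offline loop is implemented in a deliberately small codebase meant for reading.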