LMCache / LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
☆6,337 · Updated this week
Alternatives and similar repositories for LMCache
Users interested in LMCache are comparing it to the libraries listed below.
- A Datacenter Scale Distributed Inference Serving Framework ☆5,617 · Updated this week
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization ☆2,018 · Updated 2 weeks ago
- Nano vLLM ☆9,565 · Updated last month
- FlashInfer: Kernel Library for LLM Serving ☆4,195 · Updated this week
- Cost-efficient and pluggable infrastructure components for GenAI inference ☆4,458 · Updated last week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,348 · Updated last week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,399 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,566 · Updated 6 months ago
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels ☆4,113 · Updated last week
- slime is an LLM post-training framework for RL Scaling. ☆2,816 · Updated this week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability… ☆3,784 · Updated this week
- RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal… ☆4,995 · Updated this week
- Achieve state-of-the-art inference performance with modern accelerators on Kubernetes ☆2,158 · Updated this week
- Post-training with Tinker ☆2,357 · Updated this week
- Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement… ☆8,014 · Updated this week
- Fast, Flexible and Portable Structured Generation ☆1,418 · Updated this week
- Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs ☆2,081 · Updated 2 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆4,077 · Updated last month
- ☆2,213 · Updated 2 weeks ago
- The absolute trainer to light up AI agents. ☆9,602 · Updated this week
- A lightweight data processing framework built on DuckDB and 3FS. ☆4,860 · Updated 9 months ago
- Large Language Model (LLM) Systems Paper List ☆1,674 · Updated this week
- My learning notes/codes for ML SYS. ☆4,374 · Updated this week
- Expert Parallelism Load Balancer ☆1,316 · Updated 8 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,013 · Updated 8 months ago
- 📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉 ☆4,791 · Updated 2 weeks ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training. ☆2,884 · Updated 9 months ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆1,981 · Updated last week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,198 · Updated 3 months ago
- 📑 PageIndex: Document Index for Reasoning-based RAG ☆4,248 · Updated last week