Persist and reuse KV Cache to speed up your LLM.
☆288 · Jun 9, 2026 · Updated last week
Alternatives and similar repositories for unified-cache-management
Users that are interested in unified-cache-management are comparing it to the libraries listed below.
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training ☆32 · May 2, 2025 · Updated last year
- ONCache: A Cache-Based Low-Overhead Container Overlay Network ☆21 · Jun 7, 2025 · Updated last year
- SocksDirect code repository ☆20 · May 6, 2026 · Updated last month
- [ICML 2026] A framework to compare low-bit integer and floating-point formats ☆80 · May 6, 2026 · Updated last month
- StoneNeedle is a tool that runs in the Linux kernel environment (later than v3.13) and collects statistics on I/O workload profiling data. It w… ☆21 · Apr 7, 2023 · Updated 3 years ago
- LeapIO: Efficient and Portable Virtual NVMe Storage on ARM SoCs (ASPLOS'20) ☆30 · Oct 3, 2021 · Updated 4 years ago
- ☆282 · Jun 9, 2026 · Updated last week
- China's first enterprise-grade multi-agent automation platform for IT operations: an intelligent operations solution based on large language models. ITOps Agent Platform is an enterprise-grade full-stack operations automation platform that uses visual workflow orchestration to combine multiple AI agents into intelligent operations automation pipelines, enabling server management, alert handling, fault diagnosis, … ☆192 · Updated this week
- AI Cluster Observability & Troubleshooting Toolkit. Powered by SII & Infrawaves. ☆36 · Apr 29, 2026 · Updated last month
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆109 · Dec 2, 2025 · Updated 6 months ago
- multi-streamed F2FS: An NVMe ZNS SSD optimized F2FS File System with concurrently writable hot/warm/cold data streams and application-gui… ☆25 · Mar 16, 2023 · Updated 3 years ago
- Compact and Agent-Native MoE Training System ☆189 · Jun 9, 2026 · Updated last week
- ☆19 · Sep 30, 2022 · Updated 3 years ago
- PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development. ☆207 · Dec 24, 2025 · Updated 5 months ago
- KV cache store for distributed LLM inference ☆421 · Nov 13, 2025 · Updated 7 months ago
- High Performance KV Cache Store for LLM ☆56 · May 20, 2026 · Updated 3 weeks ago
- Chinese translation of Distributed systems for fun and profit ☆17 · Jul 12, 2020 · Updated 5 years ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆1,070 · Updated this week
- ☆32 · Mar 5, 2025 · Updated last year
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆172 · Dec 12, 2023 · Updated 2 years ago
- A reliable, robust, real-time memory allocator that supports memory redundancy and resists single-event upsets. ☆19 · Apr 24, 2023 · Updated 3 years ago
- ☆13 · Jan 17, 2024 · Updated 2 years ago
- Autonomous Agent for Kubernetes ☆15 · Feb 14, 2025 · Updated last year
- ☆25 · Jan 10, 2023 · Updated 3 years ago
- ☆16 · Jul 12, 2024 · Updated last year
- BH hackathon ☆14 · Apr 4, 2024 · Updated 2 years ago
- ☆31 · Jun 4, 2026 · Updated last week
- ☆13 · Apr 9, 2025 · Updated last year
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆5,569 · Updated this week
- ☆14 · Jan 20, 2025 · Updated last year
- A curated list of awesome tools, frameworks, platforms, and resources for building scalable and efficient AI infrastructure, including di… ☆59 · May 11, 2026 · Updated last month
- SC 2021, "LogECMem: Coupling Erasure-Coded In-Memory Key-Value Stores with Parity Logging" ☆12 · Jul 12, 2021 · Updated 4 years ago
- Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation (HiSi… ☆189 · Updated this week
- ☆15 · Jan 27, 2026 · Updated 4 months ago
- Parallel Prefix Sum (Scan) with CUDA ☆29 · Jun 22, 2024 · Updated last year
- Storm Elastic Search Bolt ☆63 · Dec 17, 2023 · Updated 2 years ago
- ☆84 · Sep 15, 2025 · Updated 9 months ago
- ☆37 · May 19, 2026 · Updated 3 weeks ago
- A collection of skills for vllm-omni ☆76 · Updated this week