Persist and reuse the KV cache to speed up your LLM.
☆277May 15, 2026Updated this week
Alternatives and similar repositories for unified-cache-management
Users that are interested in unified-cache-management are comparing it to the libraries listed below.
- [ICML 2026] Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning☆33Sep 12, 2025Updated 8 months ago
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training☆31May 2, 2025Updated last year
- SocksDirect code repository☆20May 6, 2026Updated 2 weeks ago
- LeapIO: Efficient and Portable Virtual NVMe Storage on ARM SoCs (ASPLOS'20)☆30Oct 3, 2021Updated 4 years ago
- ☆255Updated this week
- AI Cluster Observability & Troubleshooting Toolkit. Powered by SII & Infrawaves.☆36Apr 29, 2026Updated 3 weeks ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding☆102Dec 2, 2025Updated 5 months ago
- PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.☆193Dec 24, 2025Updated 4 months ago
- KV cache store for distributed LLM inference☆419Nov 13, 2025Updated 6 months ago
- High Performance KV Cache Store for LLM☆53Apr 6, 2026Updated last month
- Chinese translation of Distributed systems for fun and profit☆17Jul 12, 2020Updated 5 years ago
- LMCache on Ascend☆70May 11, 2026Updated last week
- High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S…☆74Updated this week
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond☆1,039May 13, 2026Updated last week
- ☆32Mar 5, 2025Updated last year
- Hooked CUDA-related dynamic libraries by using automated code generation tools.☆172Dec 12, 2023Updated 2 years ago
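- A reliable, robust, real-time memory allocator that supports memory redundancy and resists single-event upsets.☆19Apr 24, 2023Updated 3 years ago
- A basic development framework for large AI models, aimed at ordinary backend programmers, with features similar to coze, including: fastapi backend endpoints, search, document parsing and vectorization, RPA and web crawling, custom agents, integration with third-party data APIs, mongodb database, controlled JSON responses, multimodal understanding and generation, and more☆13Jul 18, 2024Updated last year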
- ☆12Jan 17, 2024Updated 2 years ago
- ☆25Jan 10, 2023Updated 3 years ago
- ☆16Jul 12, 2024Updated last year
- [Toutiao] Text author identification competition☆10Aug 20, 2018Updated 7 years ago
- Diagnostic tool used by 42on to gather information from a Ceph cluster☆30Aug 6, 2025Updated 9 months ago
- ☆19May 26, 2023Updated 2 years ago
- HMS - Harmful Brain Activity Classification☆13May 8, 2024Updated 2 years ago
- ☆28May 12, 2026Updated last week
- Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation(HiSi…☆173Updated this week
- Pico is a numpy-based "pico" neural network framework with a torch-like coding style, an auto-grad implementation, and an MNIST example.☆11Mar 11, 2022Updated 4 years ago
- ☆12Apr 9, 2025Updated last year
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆5,339Updated this week
- Non-blocking (Asynchronous) MySQL Connector☆17May 20, 2014Updated 11 years ago
- ☆14Jan 20, 2025Updated last year
- A curated list of awesome tools, frameworks, platforms, and resources for building scalable and efficient AI infrastructure, including di…☆58May 11, 2026Updated last week
- ☆15Jan 27, 2026Updated 3 months ago
- Parallel Prefix Sum (Scan) with CUDA☆29Jun 22, 2024Updated last year
- ECIR 2024: Sparse lexical representation for image-text retrieval☆13Jul 8, 2024Updated last year
- a collection of skills for vllm-omni☆64Updated this week
- ☆35May 4, 2026Updated 2 weeks ago
- ☆11Nov 6, 2022Updated 3 years ago