Persist and reuse KV Cache to speed up your LLM.
☆271 · Apr 20, 2026 · Updated this week
Alternatives and similar repositories for unified-cache-management
Users who are interested in unified-cache-management are comparing it to the libraries listed below.
- Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning ☆31 · Sep 12, 2025 · Updated 7 months ago
- ONCache: A Cache-Based Low-Overhead Container Overlay Network ☆21 · Jun 7, 2025 · Updated 10 months ago
- StoneNeedle is a tool that runs in the Linux kernel environment (later than v3.13) and collects I/O workload profiling statistics. It w… ☆20 · Apr 7, 2023 · Updated 3 years ago
- ☆227 · Updated this week
- LeapIO: Efficient and Portable Virtual NVMe Storage on ARM SoCs (ASPLOS'20) ☆30 · Oct 3, 2021 · Updated 4 years ago
- AI Cluster Observability & Troubleshooting Toolkit. Powered by SII & Infrawaves. ☆33 · Apr 13, 2026 · Updated last week
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆98 · Dec 2, 2025 · Updated 4 months ago
- High-performance KV cache storage for LLM inference: GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S… ☆45 · Updated this week
- PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development. ☆173 · Dec 24, 2025 · Updated 3 months ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆859 · Apr 7, 2026 · Updated last week
- KV cache store for distributed LLM inference ☆405 · Nov 13, 2025 · Updated 5 months ago
- High Performance KV Cache Store for LLM ☆53 · Apr 6, 2026 · Updated 2 weeks ago
- LMCache on Ascend ☆61 · Apr 9, 2026 · Updated last week
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆172 · Dec 12, 2023 · Updated 2 years ago
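- A reliable, robust, real-time memory allocator that supports memory redundancy and resists single-event upsets. ☆19 · Apr 24, 2023 · Updated 2 years ago
- A basic development framework for large AI models, suited to ordinary backend programmers. Its features are similar to coze and include: FastAPI backend endpoints, search, document parsing and vectorization, RPA and web crawling, custom agents, integration with third-party data APIs, a MongoDB database, controlled JSON output, multimodal understanding and generation, and more. ☆13 · Jul 18, 2024 · Updated last year
- Learn how to create impactful AI Agents using Agno AI Python Package ☆13 · Jul 31, 2025 · Updated 8 months ago
- Autonomous Agent for Kubernetes ☆14 · Feb 14, 2025 · Updated last year
- ANOLISA - Agentic Nexus Operating Layer & Interface System Architecture ☆159 · Updated this week
- ☆23 · Jan 10, 2023 · Updated 3 years ago
- ☆16 · Jul 12, 2024 · Updated last year
- ☆17 · May 26, 2023 · Updated 2 years ago
- [Toutiao] Text author identification competition ☆10 · Aug 20, 2018 · Updated 7 years ago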
- Implement FlashAttention v2 with minimal code to learn. ☆16 · Jun 12, 2024 · Updated last year
- A curated list of awesome tools, frameworks, platforms, and resources for building scalable and efficient AI infrastructure, including di… ☆50 · Mar 3, 2026 · Updated last month
- HMS - Harmful Brain Activity Classification ☆13 · May 8, 2024 · Updated last year
- Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation (HiSi… ☆137 · Apr 14, 2026 · Updated last week
- a collection of skills for vllm-omni ☆53 · Apr 14, 2026 · Updated last week
- eTran: Extensible Kernel Transport with eBPF ☆41 · Apr 28, 2025 · Updated 11 months ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆5,122 · Updated this week
- ☆12 · Apr 9, 2025 · Updated last year
- ☆14 · Jan 20, 2025 · Updated last year
- SC 2021, "LogECMem: Coupling Erasure-Coded In-Memory Key-Value Stores with Parity Logging" ☆12 · Jul 12, 2021 · Updated 4 years ago
- ☆27 · Apr 8, 2026 · Updated last week
- ☆15 · Jan 27, 2026 · Updated 2 months ago
- Storm Elastic Search Bolt ☆63 · Dec 17, 2023 · Updated 2 years ago
- Parallel Prefix Sum (Scan) with CUDA ☆29 · Jun 22, 2024 · Updated last year
- ECIR 2024: Sparse lexical representation for image-text retrieval ☆13 · Jul 8, 2024 · Updated last year
- A vision-based RL environment for the Franka Panda arm using NVIDIA Isaac Sim ☆18 · Jan 3, 2025 · Updated last year