☆227 · Apr 17, 2026 · Updated this week
Alternatives and similar repositories for FlexKV
Users who are interested in FlexKV are comparing it to the repositories listed below.
- Open-source implementation of Google's TurboQuant (ICLR 2026): KV cache compression to 2.5–4 bits with near-zero quality loss. 3.8–5.7x … ☆48 · Mar 29, 2026 · Updated 3 weeks ago
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow" ☆82 · Oct 15, 2025 · Updated 6 months ago
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆12 · Nov 8, 2024 · Updated last year
- ☆27 · Jun 22, 2025 · Updated 9 months ago
- Cross-GPU KV Cache Marketplace ☆22 · Nov 12, 2025 · Updated 5 months ago
- A simple demo for using Sentinel with Spring Cloud Alibaba ☆16 · Nov 8, 2018 · Updated 7 years ago
- An ultra-fast, distributed Safetensors loader ☆40 · Apr 8, 2026 · Updated last week
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆177 · Feb 11, 2026 · Updated 2 months ago
- AI Cluster Observability & Troubleshooting Toolkit. Powered by SII & Infrawaves. ☆33 · Apr 13, 2026 · Updated last week
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆105 · Dec 17, 2025 · Updated 4 months ago
- Prefix-Aware Attention for LLM Decoding ☆35 · Mar 31, 2026 · Updated 3 weeks ago
- Important experiments on memory management, file access, network transfer, job scheduler, and so on. ☆15 · Apr 27, 2022 · Updated 3 years ago
- A minimal content focused markdown sveltekit template. ☆16 · Jul 15, 2025 · Updated 9 months ago
- UBio-MolFM is a foundation model suite for molecular modeling, developed by the UBio-MolFM team. ☆22 · Apr 13, 2026 · Updated last week
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs ☆14 · Apr 3, 2025 · Updated last year
- perf-script and (Linux, QEMU, SeaBIOS) patches to measure the boot time of a Linux VM with QEMU ☆40 · Apr 3, 2020 · Updated 6 years ago
- 🍨 Gelato: From Data Curation to Reinforcement Learning: Building a Strong Grounding Model for Computer-Use Agents ☆44 · Dec 22, 2025 · Updated 3 months ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆16 · Aug 31, 2023 · Updated 2 years ago
- alibaba/Sentinel zuul integration sample ☆11 · Oct 20, 2018 · Updated 7 years ago
- ☆33 · Nov 18, 2025 · Updated 5 months ago
- transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP) ☆18 · Apr 7, 2022 · Updated 4 years ago
- Official Implementation of "The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thou… ☆14 · Jul 2, 2025 · Updated 9 months ago
- High Performance KV Cache Store for LLM ☆53 · Apr 6, 2026 · Updated 2 weeks ago
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training ☆31 · May 2, 2025 · Updated 11 months ago
- ☆36 · Dec 9, 2025 · Updated 4 months ago
- Heartland Payment Systems Java SDK ☆10 · Feb 27, 2025 · Updated last year
- COSCon Workshop on ECharts ☆18 · Oct 18, 2018 · Updated 7 years ago
- Code for "AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction", ACL 2023 ☆13 · May 19, 2023 · Updated 2 years ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning ☆10 · Nov 3, 2024 · Updated last year
- Persist and reuse KV Cache to speed up your LLM. ☆271 · Updated this week
- A platform for formalizing OEIS sequences in Lean 4 ☆19 · Updated this week
- A "standard library" of Triton kernels. ☆22 · Oct 2, 2025 · Updated 6 months ago
- Terraform module which creates Redis ElastiCache resources on AWS. ☆12 · Dec 9, 2022 · Updated 3 years ago
- eTran: Extensible Kernel Transport with eBPF ☆41 · Apr 28, 2025 · Updated 11 months ago
- ☆24 · Mar 3, 2026 · Updated last month
- a libp2p-backed daemon wrapping the functionalities of go-libp2p for use in other languages ☆11 · Feb 9, 2025 · Updated last year
- ☆34 · Mar 23, 2026 · Updated 3 weeks ago
- Spring Cloud Alibaba, Dubbo, Alibaba Cloud, and more. ☆33 · Nov 16, 2018 · Updated 7 years ago
- ☆38 · Updated this week