☆255May 15, 2026Updated this week
Alternatives and similar repositories for FlexKV
Users that are interested in FlexKV are comparing it to the libraries listed below.
- BZOJ Helper☆10May 29, 2019Updated 6 years ago
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"☆88Oct 15, 2025Updated 7 months ago
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆12Nov 8, 2024Updated last year
- Cross-GPU KV Cache Marketplace☆22Nov 12, 2025Updated 6 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆190Feb 11, 2026Updated 3 months ago
- AI Cluster Observability & Troubleshooting Toolkit. Powered by SII & Infrawaves.☆36Apr 29, 2026Updated 3 weeks ago
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.☆106Dec 17, 2025Updated 5 months ago
- Prefix-Aware Attention for LLM Decoding☆37Mar 31, 2026Updated last month
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs☆14Apr 3, 2025Updated last year
- perf-script and (Linux, QEMU, SeaBIOS) patches to measure the boot time of a Linux VM with QEMU☆41Apr 3, 2020Updated 6 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.☆16Aug 31, 2023Updated 2 years ago
- alibaba/Sentinel zuul integration sample☆11Oct 20, 2018Updated 7 years ago
- UBio-MolFM is a foundation model suite for molecular modeling, developed by the UBio-MolFM team.☆29Apr 13, 2026Updated last month
- An ultra-fast, distributed Safetensors loader☆51May 5, 2026Updated 2 weeks ago
- transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP)☆18Apr 7, 2022Updated 4 years ago
- SocksDirect code repository☆20May 6, 2026Updated last week
- High Performance KV Cache Store for LLM☆53Apr 6, 2026Updated last month
- llama2 inference engine in Rust☆13Apr 12, 2024Updated 2 years ago
- [NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training☆31May 2, 2025Updated last year
- ☆37Dec 9, 2025Updated 5 months ago
- Code for "AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction", ACL 2023☆13May 19, 2023Updated 3 years ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning☆10Nov 3, 2024Updated last year
- Persist and reuse KV Cache to speed up your LLM.☆277Updated this week
- Integrated Training Platform (ITP) traces used in ElasticFlow paper.☆31Dec 23, 2022Updated 3 years ago
- ☆13Jan 30, 2023Updated 3 years ago
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆45May 13, 2025Updated last year
- ☆14Sep 29, 2025Updated 7 months ago
- Terraform module which creates Redis ElastiCache resources on AWS.☆12Dec 9, 2022Updated 3 years ago
- An interactive visualization app that aims to help non-experts learn about Recurrent Neural Networks (RNNs)☆12Dec 6, 2022Updated 3 years ago
- Postgres protocol support for finagle☆36Sep 4, 2013Updated 12 years ago
- ☆15Aug 10, 2017Updated 8 years ago
- ☆40Apr 16, 2026Updated last month
- ☆35May 4, 2026Updated 2 weeks ago
- ☆81Sep 15, 2025Updated 8 months ago
- Code for "HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking"☆96Nov 18, 2025Updated 6 months ago
- Collection of technical essays☆34Feb 27, 2026Updated 2 months ago
- Implementation of the Generic Cell Rate Algorithm in C as a Redis Module☆19Jul 15, 2022Updated 3 years ago
- Luojia's asynchronous kernel lab, second edition☆13Jul 16, 2021Updated 4 years ago
- ☆11Nov 14, 2023Updated 2 years ago