taco-project / FlexKV
☆94 Updated this week
Alternatives and similar repositories for FlexKV
Users who are interested in FlexKV are comparing it to the libraries listed below.
- GLake: optimizing GPU memory management and IO transmission. ☆486 Updated 7 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆709 Updated 6 months ago
- Efficient and easy multi-instance LLM serving ☆504 Updated last month
- NVIDIA Inference Xfer Library (NIXL) ☆688 Updated this week
- A PyTorch Native LLM Training Framework ☆879 Updated last month
- KV cache store for distributed LLM inference ☆346 Updated last month
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆903 Updated last week
- Microsoft Collective Communication Library ☆368 Updated 2 years ago
- ☆309 Updated last month
- ☆508 Updated last month
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,153 Updated 2 months ago
- Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation ☆37 Updated last month
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER. ☆38 Updated last year
- NCCL Tests ☆1,313 Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆170 Updated 7 months ago
- High performance Transformer implementation in C++. ☆139 Updated 9 months ago
- Distributed Compiler based on Triton for Parallel Systems ☆1,206 Updated 2 weeks ago
- Zero Bubble Pipeline Parallelism ☆433 Updated 5 months ago
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training. ☆269 Updated 2 years ago
- Curated collection of papers in machine learning systems ☆441 Updated 3 weeks ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆568 Updated last year
- A tool for bandwidth measurements on NVIDIA GPUs. ☆553 Updated 6 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆427 Updated this week
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆745 Updated this week
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆432 Updated 5 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆451 Updated this week
- RDMA and SHARP plugins for nccl library ☆211 Updated last week
- Materials for learning SGLang ☆626 Updated last week
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆66 Updated last year
- This repository is established to store personal notes and annotated papers during daily research. ☆155 Updated last month