bd-iaas-us / InfiniStore
A distributed KV store for disaggregated LLM inference
☆31 · Updated this week
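For context: in disaggregated LLM inference, prefill and decode can run on different workers, so attention KV-cache blocks must move through a shared store instead of staying in one process's GPU memory. The sketch below is a minimal, hypothetical Python illustration of that idea — an in-memory stand-in keyed by (request, layer) — and is not InfiniStore's actual API; all names here are invented for illustration.

```python
# Hypothetical sketch of the disaggregated-KV-cache idea: a prefill worker
# publishes per-layer KV blocks for a request, and a decode worker fetches
# them. The in-memory dict stands in for a networked store like InfiniStore;
# the class and method names are illustrative, not InfiniStore's API.
from typing import Dict, Tuple

Key = Tuple[str, int]  # (request_id, layer_index)

class KVCacheStore:
    def __init__(self) -> None:
        self._blocks: Dict[Key, bytes] = {}

    def put(self, request_id: str, layer: int, kv_block: bytes) -> None:
        # A real store would write this to a remote memory pool (e.g. via RDMA).
        self._blocks[(request_id, layer)] = kv_block

    def get(self, request_id: str, layer: int) -> bytes:
        # A real store would read the block from wherever it lives in the pool.
        return self._blocks[(request_id, layer)]

# Prefill side: compute KV for every layer, then hand off.
store = KVCacheStore()
for layer in range(2):
    store.put("req-42", layer, b"serialized KV tensor for this layer")

# Decode side (possibly another machine): pull the KV cache and resume.
first_layer_kv = store.get("req-42", 0)
print(len(first_layer_kv), "bytes fetched for layer 0")
```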
Alternatives and similar repositories for InfiniStore:
Users interested in InfiniStore are comparing it to the libraries listed below.
- Stateful LLM Serving ☆46 · Updated 6 months ago
- ☆36 · Updated 2 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆58 · Updated 9 months ago
- High performance Transformer implementation in C++. ☆103 · Updated last month
- Vector search with bounded performance. ☆34 · Updated last year
- An interference-aware scheduler for fine-grained GPU sharing ☆123 · Updated 3 weeks ago
- NCCL Profiling Kit ☆127 · Updated 7 months ago
- Artifact of the OSDI '24 paper “Llumnix: Dynamic Scheduling for Large Language Model Serving” ☆60 · Updated 8 months ago
- Fast OS-level support for GPU checkpoint and restore ☆153 · Updated this week
- Microsoft Collective Communication Library ☆62 · Updated 2 months ago
- ☆26 · Updated last month
- ☆43 · Updated 7 months ago
- Efficient and easy multi-instance LLM serving ☆298 · Updated this week
- A resilient distributed training framework ☆88 · Updated 10 months ago
- High-performance RDMA-based distributed feature collection component for training GNN models on extremely large graphs ☆50 · Updated 2 years ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆112 · Updated 11 months ago
- ☆50 · Updated 8 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆78 · Updated 3 months ago
- [OSDI '24] Serving LLM-based Applications Efficiently with Semantic Variable ☆145 · Updated 5 months ago
- Ultra | Ultimate | Unified CCL ☆32 · Updated last week
- ☆77 · Updated last month
- ☆11 · Updated 8 months ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆17 · Updated this week
- ☆83 · Updated 3 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆297 · Updated this week
- The driver for LMCache core to run in vLLM ☆29 · Updated 2 weeks ago
- Thunder Research Group's Collective Communication Library ☆33 · Updated 9 months ago
- ☆43 · Updated 3 years ago
- ☆75 · Updated 2 years ago
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance ☆52 · Updated 2 weeks ago