llm-d / llm-d-inference-sim
A lightweight vLLM simulator for mocking out replicas.
☆40 · Updated last week
Alternatives and similar repositories for llm-d-inference-sim
Users interested in llm-d-inference-sim are comparing it to the repositories listed below.
- Systematic and comprehensive benchmarks for LLM systems. ☆30 · Updated last month
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆261 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆41 · Updated last month
- A tool to detect infrastructure issues on cloud-native AI systems ☆47 · Updated last month
- Inference scheduler for llm-d ☆86 · Updated last week
- Artifacts for our NSDI '23 paper TGS ☆84 · Updated last year
- NVIDIA NCCL Tests for Distributed Training ☆110 · Updated last week
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆85 · Updated last year
- Hooks CUDA-related dynamic libraries using automated code generation tools. ☆165 · Updated last year
- Fast OS-level support for GPU checkpoint and restore ☆233 · Updated 3 weeks ago
- An interference-aware scheduler for fine-grained GPU sharing ☆145 · Updated 7 months ago
- Ultra and Unified CCL ☆530 · Updated this week
- Repository for the MLCommons Chakra schema and tools ☆125 · Updated last month
- Distributed KV cache coordinator ☆66 · Updated this week
- Personal paper-reading notes (covering cloud computing, resource management, systems, machine learning, deep learning, and o…) ☆120 · Updated 3 weeks ago
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes ☆128 · Updated 5 months ago
- Serverless Paper Reading and Discussion ☆37 · Updated 2 years ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆62 · Updated last year
- NVIDIA Inference Xfer Library (NIXL) ☆603 · Updated this week
- Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆62 · Updated last year
- Stateful LLM Serving ☆81 · Updated 6 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆128 · Updated last year
- DeepSeek-V3/R1 inference performance simulator ☆165 · Updated 5 months ago
- NCCL Profiling Kit ☆143 · Updated last year
- rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations. ☆53 · Updated 2 months ago
- Microsoft Collective Communication Library ☆66 · Updated 9 months ago
- KV cache store for distributed LLM inference ☆326 · Updated 3 months ago
- Efficient and easy multi-instance LLM serving ☆480 · Updated last week