llm-d / llm-d-inference-sim
A lightweight vLLM simulator for mocking out replicas.
☆30 · Updated this week
Alternatives and similar repositories for llm-d-inference-sim
Users interested in llm-d-inference-sim are comparing it to the repositories listed below.
- Systematic and comprehensive benchmarks for LLM systems. ☆19 · Updated last week
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). ☆156 · Updated this week
- Artifacts for our NSDI '23 paper TGS. ☆81 · Updated last year
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆158 · Updated last year
- Cloud Native Benchmarking of Foundation Models. ☆38 · Updated last month
- 🧯 Kubernetes coverage for fault awareness and recovery; works for any LLMOps, MLOps, AI workloads. ☆30 · Updated last week
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆82 · Updated last year
- NVIDIA NCCL Tests for Distributed Training. ☆97 · Updated 2 weeks ago
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes clusters (IC2E '23). ☆35 · Updated last year
- A tool to detect infrastructure issues on cloud native AI systems. ☆41 · Updated last month
- Inference scheduler for llm-d. ☆61 · Updated last week
- Serverless Paper Reading and Discussion. ☆37 · Updated 2 years ago
- An interference-aware scheduler for fine-grained GPU sharing. ☆141 · Updated 5 months ago
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies. ☆114 · Updated last week
- Fast OS-level support for GPU checkpoint and restore. ☆206 · Updated 3 weeks ago
- Stateful LLM Serving. ☆73 · Updated 4 months ago
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes. ☆113 · Updated 3 months ago
- Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving". ☆61 · Updated last year
- Repository for MLCommons Chakra schema and tools. ☆113 · Updated 3 weeks ago
- NVIDIA Inference Xfer Library (NIXL). ☆459 · Updated this week
- GPU-scheduler-for-deep-learning. ☆208 · Updated 4 years ago
- cricket is a virtualization solution for GPUs. ☆205 · Updated last month
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling. ☆59 · Updated last year
- MeshInsight: Dissecting Overheads of Service Mesh Sidecars. ☆47 · Updated last year
- Health checks for Azure N- and H-series VMs. ☆46 · Updated last week
- Intercepting CUDA runtime calls with LD_PRELOAD. ☆40 · Updated 11 years ago
- Fine-grained GPU sharing primitives. ☆142 · Updated 5 years ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs. ☆385 · Updated this week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances. ☆123 · Updated last year
- ☆276 · Updated last week