llm-d / llm-d-inference-sim
A lightweight vLLM simulator for mocking out replicas.
☆24 · Updated this week
Alternatives and similar repositories for llm-d-inference-sim
Users who are interested in llm-d-inference-sim compare it to the libraries listed below.
- Inference scheduler for llm-d ☆56 · Updated this week
- Systematic and comprehensive benchmarks for LLM systems. ☆15 · Updated 2 weeks ago
- A tool for coordinated checkpoint/restore of distributed applications with CRIU ☆23 · Updated 3 weeks ago
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆30 · Updated 5 months ago
- Distributed KV cache coordinator ☆35 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆36 · Updated last week
- A tool to detect infrastructure issues on cloud native AI systems ☆39 · Updated last month
- ☆30 · Updated last month
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes ☆98 · Updated 2 months ago
- ☆26 · Updated 3 months ago
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆67 · Updated last month
- Serverless Paper Reading and Discussion ☆37 · Updated 2 years ago
- Simplified model deployment on llm-d ☆24 · Updated 2 weeks ago
- Example DRA driver that developers can fork and modify to get them started writing their own. ☆74 · Updated last month
- An efficient GPU resource sharing system with fine-grained control for Linux platforms. ☆83 · Updated last year
- Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime) ☆58 · Updated 3 weeks ago
- An OS kernel module for fast **remote** fork using advanced datacenter networking (RDMA). ☆63 · Updated 4 months ago
- Holistic job manager on Kubernetes ☆116 · Updated last year
- FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute (USENIX ATC'21) ☆55 · Updated 3 years ago
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes cluster (IC2E'23) ☆34 · Updated last year
- Enabling Kubernetes to make pod placement decisions with platform intelligence. ☆175 · Updated 4 months ago
- GenAI inference performance benchmarking tool ☆58 · Updated this week
- ☆11 · Updated 2 months ago
- Intelligent platform for AI workloads ☆37 · Updated 2 years ago
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies ☆113 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆97 · Updated this week
- The schedule of the seminar ☆25 · Updated 3 years ago
- Artifacts for our NSDI'23 paper TGS ☆76 · Updated last year
- ☆28 · Updated 3 years ago
- Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆61 · Updated last year