llm-d / llm-d-inference-simLinks

A light weight vLLM simulator, for mocking out replicas.

☆30

Alternatives and similar repositories for llm-d-inference-sim

Users that are interested in llm-d-inference-sim are comparing it to the libraries listed below

Sorting:

sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆192Updated last week
LMCache / LMBenchmark
Systematic and comprehensive benchmarks for LLM systems.
☆21Updated last month
fmperf-project / fmperf
Cloud Native Benchmarking of Foundation Models
☆39Updated last month
Jeffwan / serverless-research
Serverless Paper Reading and Discussion
☆37Updated 2 years ago
heyfey / vodascheduler
GPU scheduler for elastic/distributed deep learning workloads in Kubernetes cluster (IC2E'23)
☆35Updated last year
BaizeAI / kcover
🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.
☆31Updated last week
IBM / autopilot
A tool to detect infrastructure issues on cloud native AI systems
☆44Updated last week
llm-d / llm-d-inference-scheduler
Inference scheduler for llm-d
☆68Updated this week
SJTU-IPADS / PhoenixOS
Fast OS-level support for GPU checkpoint and restore
☆224Updated this week
NTHU-LSALAB / Gemini
An efficient GPU resource sharing system with fine-grained control for Linux platforms.
☆84Updated last year
coreweave / nccl-tests
NVIDIA NCCL Tests for Distributed Training
☆100Updated last week
Bruce-Lee-LY / cuda_hook
Hooked CUDA-related dynamic libraries by using automated code generation tools.
☆160Updated last year
pkusys / TGS
Artifacts for our NSDI'23 paper TGS
☆81Updated last year
intel / memtierd
☆31Updated 3 months ago
GPUprobe / gpuprobe-daemon
Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes
☆117Updated 4 months ago
ds2-lab / FaaSNet
FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute (USENIX ATC'21)
☆55Updated 3 years ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆142Updated 6 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆491Updated this week
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆115Updated this week
NVIDIA / cuda-checkpoint
CUDA checkpoint and restore utility
☆353Updated 6 months ago
NVIDIA / go-gpuallocator
Go Abstraction for Allocating NVIDIA GPUs with Custom Policies
☆116Updated last month
NVIDIA / topograph
A toolkit for discovering cluster network topology.
☆59Updated last week
UWNetworksLab / meshinsight
MeshInsight: Dissecting Overheads of Service Mesh Sidecars
☆47Updated last year
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆124Updated last year
spcl / rFaaS
rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.
☆52Updated 3 weeks ago
All-less / faas-scheduling-benchmark
A benchmark suite for evaluating FaaS scheduler.
☆23Updated 2 years ago
WukLab / preble
Stateful LLM Serving
☆77Updated 4 months ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
☆62Updated last year
ProjectMitosisOS / mitosis-core
An OS kernel module for fast **remote** fork using advanced datacenter networking (RDMA).
☆63Updated 5 months ago
llm-d / llm-d-kv-cache-manager
Distributed KV cache coordinator
☆43Updated this week