coreweave / ml-containersLinks

☆42

Alternatives and similar repositories for ml-containers

Users that are interested in ml-containers are comparing it to the libraries listed below

Sorting:

IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆62Updated 2 months ago
coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆276Updated 3 months ago
run-ai / runai-model-streamer
☆267Updated last week
foundation-model-stack / fastsafetensors
High-performance safetensors model loader
☆76Updated 2 weeks ago
NVIDIA / ais-k8s
Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes.
☆113Updated last week
simon-mo / vLLM-Benchmark
☆31Updated 7 months ago
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆38Updated 7 months ago
llm-d / llm-d-deployer
Helm charts for llm-d
☆50Updated 4 months ago
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆130Updated 2 months ago
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆45Updated last week
run-ai / rntop
A top-like tool for monitoring GPUs in a cluster
☆85Updated last year
kserve / open-inference-protocol
Repository for open inference protocol specification
☆60Updated 6 months ago
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆58Updated 10 months ago
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆114Updated 2 years ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84Updated last week
tensorchord / inference-benchmark
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
☆28Updated 2 years ago
NVIDIA / cuda-checkpoint
CUDA checkpoint and restore utility
☆393Updated 2 months ago
ray-project / mlflow-ray-serve
MLFlow Deployment Plugin for Ray Serve
☆46Updated 3 years ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆234Updated last week
bentoml / simple_di
Simple dependency injection framework for Python
☆21Updated last year
sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆322Updated this week
imbue-ai / cluster-health
☆316Updated last year
AI-Hypercomputer / xpk
xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help Cloud developers to orchestrate training jobs on accelerat…
☆154Updated this week
mlcommons / logging
MLPerf™ logging library
☆37Updated last month
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆17Updated 3 years ago
huggingface / xet-core
xet client tech, used in huggingface_hub
☆340Updated this week
ailzhang / minPP
Pipeline parallelism for the minimalist
☆37Updated 3 months ago
NVIDIA / gpu-driver-container
The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers.
☆145Updated this week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆140Updated this week
coreweave / nccl-tests
NVIDIA NCCL Tests for Distributed Training
☆126Updated 3 weeks ago