coreweave / ml-containers
☆30 · Updated 2 weeks ago
Alternatives and similar repositories for ml-containers:
Users interested in ml-containers are comparing it to the libraries listed below.
- Module, Model, and Tensor Serialization/Deserialization ☆221 · Updated last month
- vLLM adapter for a TGIS-compatible gRPC server. ☆25 · Updated this week
- ☆190 · Updated last week
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 · Updated last year
- The driver for LMCache core to run in vLLM ☆36 · Updated 2 months ago
- NVIDIA NCCL Tests for Distributed Training ☆88 · Updated this week
- High-performance safetensors model loader ☆19 · Updated this week
- ☆49 · Updated 4 months ago
- A top-like tool for monitoring GPUs in a cluster ☆86 · Updated last year
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆124 · Updated last week
- ☆301 · Updated 7 months ago
- Pygloo provides Python bindings for Gloo. ☆21 · Updated last month
- ☆54 · Updated 6 months ago
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆330 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆30 · Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated 2 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 3 months ago
- Simple dependency injection framework for Python ☆20 · Updated 10 months ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆92 · Updated this week
- Efficient and easy multi-instance LLM serving ☆367 · Updated this week
- CUDA checkpoint and restore utility ☆322 · Updated 2 months ago
- A low-latency and high-throughput serving engine for LLMs ☆337 · Updated 2 months ago
- ☆66 · Updated 2 weeks ago
- ☆12 · Updated last year
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆92 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆22 · Updated 2 months ago
- Benchmarking some transformer deployments ☆26 · Updated 2 years ago
- WIP. Veloce is a low-code, Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 · Updated 2 years ago
- The NVIDIA GPU driver container allows provisioning of the NVIDIA driver through the use of containers. ☆104 · Updated this week
- Holistic job manager on Kubernetes ☆114 · Updated last year