coreweave / ml-containers
☆30 · Updated this week

Alternatives and similar repositories for ml-containers:
Users interested in ml-containers are comparing it to the libraries listed below.
- Module, Model, and Tensor Serialization/Deserialization ☆217 · Updated 3 weeks ago
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes ☆89 · Updated last week
- ☆169 · Updated this week
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆120 · Updated 2 weeks ago
- A top-like tool for monitoring GPUs in a cluster ☆85 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 2 months ago
- Benchmark for machine learning model online serving (LLM, embedding, Stable Diffusion, Whisper) ☆28 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server ☆23 · Updated this week
- The driver for LMCache core to run in vLLM ☆32 · Updated last month
- NVIDIA NCCL Tests for Distributed Training ☆82 · Updated this week
- CUDA checkpoint and restore utility ☆305 · Updated last month
- Large Language Model Text Generation Inference on Habana Gaudi ☆33 · Updated this week
- Holistic job manager on Kubernetes ☆112 · Updated last year
- OpenVINO backend for Triton ☆31 · Updated this week
- MLflow Deployment Plugin for Ray Serve ☆44 · Updated 2 years ago
- ☆93 · Updated last month
- ☆54 · Updated 5 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆69 · Updated last month
- Example ML projects that use the Determined library ☆29 · Updated 6 months ago
- The Triton backend for PyTorch TorchScript models ☆144 · Updated last week
- The Triton backend for ONNX Runtime ☆139 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast, scalable LLM serving ☆63 · Updated 11 months ago
- An operator for deploying and maintaining NVIDIA NIMs and NeMo microservices in a Kubernetes environment ☆87 · Updated this week
- Controller for ModelMesh ☆224 · Updated 2 weeks ago
- Repository for the open inference protocol specification ☆48 · Updated 7 months ago
- ☆48 · Updated 3 months ago
- PyTorch code examples for measuring the performance of collective communication calls in AI workloads ☆15 · Updated 4 months ago
- WIP. Veloce is a low-code Ray-based parallelization library for efficient, heterogeneous machine learning computation ☆18 · Updated 2 years ago
- MLPerf™ logging library ☆33 · Updated this week
- MIG Partition Editor for NVIDIA GPUs ☆189 · Updated this week