clearml / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆73 · Updated 7 months ago
Alternatives and similar repositories for clearml-fractional-gpu:
Users interested in clearml-fractional-gpu are comparing it to the libraries listed below.
- ☆169 · Updated this week
- A top-like tool for monitoring GPUs in a cluster ☆85 · Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 · Updated 7 months ago
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆196 · Updated last month
- vLLM adapter for a TGIS-compatible gRPC server. ☆23 · Updated this week
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆58 · Updated 2 months ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆59 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆200 · Updated 7 months ago
- Self-host LLMs with vLLM and BentoML ☆92 · Updated this week
- The Triton backend for PyTorch TorchScript models. ☆144 · Updated this week
- Google TPU optimizations for transformers models ☆102 · Updated last month
- Repository for the open inference protocol specification ☆48 · Updated 7 months ago
- ☆54 · Updated 2 months ago
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆51 · Updated last month
- FIL backend for the Triton Inference Server ☆76 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆260 · Updated 5 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆216 · Updated 2 weeks ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 2 months ago
- ☆235 · Updated this week
- The backend behind the LLM-Perf Leaderboard ☆10 · Updated 10 months ago
- ☆295 · Updated 6 months ago
- Benchmarking the serving capabilities of vLLM ☆33 · Updated 6 months ago
- Experiments with inference on LLaMA ☆104 · Updated 9 months ago
- ClearML - Model-Serving Orchestration and Repository Solution ☆146 · Updated last month
- A tool to configure, launch, and manage your machine learning experiments. ☆123 · Updated this week
- Distributed Model Serving Framework ☆158 · Updated 2 weeks ago
- The Triton backend for TensorRT. ☆70 · Updated 3 weeks ago
- User documentation for KServe. ☆104 · Updated this week
- The Triton backend for the ONNX Runtime. ☆139 · Updated this week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆63 · Updated 11 months ago