clearml / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆77 · Updated 9 months ago
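The core idea above - several containers sharing one physical GPU, each seeing only its memory slice - is driven through plain `docker run`. A minimal sketch; the image tag below is an assumption for illustration (the repository publishes tags per base OS, CUDA version, and memory fraction), not a confirmed published tag:

```shell
# Launch two containers that share GPU 0; the driver-level limiter in the
# image caps each container at its fraction of GPU memory.
# NOTE: the tag "u22-cu12.3-8gb" is illustrative - check the repo's README
# for the actual published image tags.
docker run --gpus 0 --ipc=host -d clearml/fractional-gpu:u22-cu12.3-8gb sleep infinity
docker run --gpus 0 --ipc=host -d clearml/fractional-gpu:u22-cu12.3-8gb sleep infinity

# Inside either container, nvidia-smi should report only the capped
# memory slice rather than the full card.
```

This is a usage fragment rather than a runnable script: it needs a Docker host with an NVIDIA GPU and the NVIDIA container runtime installed.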
Alternatives and similar repositories for clearml-fractional-gpu:
Users interested in clearml-fractional-gpu are comparing it to the libraries listed below.
- A top-like tool for monitoring GPUs in a cluster ☆86 · Updated last year
- ☆205 · Updated last month
- GPU environment and cluster management with LLM support ☆604 · Updated 11 months ago
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆57 · Updated 3 months ago
- vLLM adapter for a TGIS-compatible gRPC server ☆26 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆225 · Updated 2 months ago
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models ☆136 · Updated 9 months ago
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs ☆199 · Updated 2 weeks ago
- Self-host LLMs with vLLM and BentoML ☆107 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆205 · Updated 9 months ago
- Machine learning serving focused on GenAI, with simplicity as the top priority ☆58 · Updated 3 weeks ago
- Repository for the open inference protocol specification ☆54 · Updated 9 months ago
- ☆30 · Updated last week
- 👷 Build compute kernels ☆37 · Updated this week
- Easy and efficient quantization for Transformers ☆197 · Updated 2 months ago
- A tool to configure, launch, and manage your machine learning experiments ☆144 · Updated this week
- Inference server benchmarking tool ☆56 · Updated last week
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated 2 months ago
- Triton CLI is an open-source command-line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆62 · Updated last week
- ☆115 · Updated 3 weeks ago
- Transformer GPU VRAM estimator ☆59 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 4 months ago
- Helm charts for the KubeRay project ☆43 · Updated 3 weeks ago
- ClearML - Model-serving orchestration and repository solution ☆150 · Updated 3 months ago
- ☆304 · Updated 8 months ago
- Cray-LM unified training and inference stack ☆22 · Updated 3 months ago
- ☆209 · Updated 3 months ago
- A stable, fast, and easy-to-use inference library with a focus on a sync-to-async API ☆45 · Updated 7 months ago
- ☆28 · Updated 5 months ago
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated this week
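Sizing GPU memory fractions goes hand in hand with estimating how much VRAM a model actually needs, which is what a tool like the Transformer GPU VRAM estimator above computes. As a rough illustration of the core arithmetic, a minimal sketch (assuming fp16/bf16 weights and ignoring activations, KV cache, and optimizer state, which real estimators also account for):

```python
def estimate_weights_vram_gib(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM (in GiB) needed just to hold model weights.

    Defaults to 2 bytes per parameter (fp16/bf16). This deliberately
    ignores activations, KV cache, and CUDA context overhead, so treat
    the result as a lower bound when picking a GPU memory fraction.
    """
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# A 7B-parameter model in fp16 needs roughly 13 GiB for weights alone,
# so it would not fit in an 8 GB fractional-GPU slice without quantization.
print(round(estimate_weights_vram_gib(7), 1))  # → 13.0
```

Dropping to 4-bit quantization (`bytes_per_param=0.5`) brings the same model down to roughly 3.3 GiB, which is the usual reason quantization tools appear alongside fractional-GPU schedulers in lists like this one.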