clearml / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆70 · Updated 5 months ago
Alternatives and similar repositories for clearml-fractional-gpu:
Users interested in clearml-fractional-gpu are comparing it to the libraries listed below:
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models, with a focus on NVIDIA GPUs. ☆193 · Updated 2 weeks ago
- Distributed model serving framework. ☆157 · Updated 3 months ago
- ☆154 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆58 · Updated last month
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models. ☆137 · Updated 6 months ago
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆88 · Updated last week
- GPU environment and cluster management with LLM support. ☆564 · Updated 8 months ago
- Deep learning energy measurement and optimization. ☆229 · Updated this week
- Controller for ModelMesh. ☆215 · Updated 3 weeks ago
- ☆192 · Updated last week
- Module, model, and tensor serialization/deserialization. ☆210 · Updated 2 months ago
- Google TPU optimizations for Transformers models. ☆90 · Updated last week
- A collection of all available inference solutions for LLMs. ☆76 · Updated 4 months ago
- Self-host LLMs with vLLM and BentoML. ☆79 · Updated 2 weeks ago
- A top-like tool for monitoring GPUs in a cluster. ☆84 · Updated 11 months ago
- ☆52 · Updated 3 weeks ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆17 · Updated this week
- ClearML - model-serving orchestration and repository solution. ☆141 · Updated 2 weeks ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray. ☆110 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆257 · Updated 3 months ago
- Ray - a curated list of resources: https://github.com/ray-project/ray ☆48 · Updated this week
- TensorRT-LLM server with structured outputs (JSON), built with Rust. ☆25 · Updated 2 weeks ago
- PyTorch per-step fault tolerance (actively under development). ☆226 · Updated this week
- The Triton backend for PyTorch TorchScript models. ☆141 · Updated last week
- aim-mlflow integration. ☆197 · Updated last year
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs. ☆88 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆187 · Updated 5 months ago
- ☆218 · Updated this week
- Easy and efficient quantization for Transformers. ☆192 · Updated last month