clearml / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆78 · Updated 9 months ago
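For context on what "driver-level memory limitation" means in practice: the fractional-GPU containers cap device memory below the driver, so standard tooling inside the container simply sees a smaller GPU. Below is a minimal sketch of verifying the cap from Python; it assumes a container started from one of the project's prebuilt images (the `clearml/fractional-gpu:u22-cu12.3-8gb` tag in the comment is illustrative and exact tags may differ) with PyTorch installed.

```python
# Run inside a fractional-GPU container, started e.g. with (tag illustrative):
#   docker run -it --gpus 0 --ipc=host --pid=host \
#       clearml/fractional-gpu:u22-cu12.3-8gb bash
import torch

# With a driver-level limit in place, the device should report only the
# fraction allocated to this container, not the full physical card.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB visible")

# Allocations beyond the cap fail with the usual CUDA out-of-memory error,
# exactly as they would on a physically smaller GPU.
```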
Alternatives and similar repositories for clearml-fractional-gpu
Users interested in clearml-fractional-gpu are comparing it to the libraries listed below
- A top-like tool for monitoring GPUs in a cluster ☆85 · Updated last year
- ☆215 · Updated this week
- Inference server benchmarking tool ☆67 · Updated last month
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 · Updated 10 months ago
- GPU environment and cluster management with LLM support ☆607 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆30 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆234 · Updated this week
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆202 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 3 weeks ago
- ☆60 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆263 · Updated 7 months ago
- ☆34 · Updated last week
- Distributed Model Serving Framework ☆168 · Updated 3 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ☆75 · Updated 2 weeks ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆57 · Updated last month
- Self-host LLMs with vLLM and BentoML ☆114 · Updated this week
- Controller for ModelMesh ☆230 · Updated 3 weeks ago
- The Triton backend for the PyTorch TorchScript models. ☆149 · Updated 2 weeks ago
- Run Slurm on Kubernetes. A Slinky project. ☆108 · Updated this week
- Cray-LM unified training and inference stack. ☆22 · Updated 4 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆127 · Updated last month
- GPU prices aggregator for cloud providers ☆37 · Updated last week
- ☆18 · Updated 9 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆67 · Updated last year
- ☆260 · Updated 2 weeks ago
- CUDA checkpoint and restore utility ☆341 · Updated 4 months ago
- OpenAI compatible API for TensorRT LLM triton backend ☆208 · Updated 10 months ago
- ClearML - Model-Serving Orchestration and Repository Solution ☆150 · Updated 4 months ago
- Easy and Efficient Quantization for Transformers ☆198 · Updated 3 months ago
- Home for OctoML PyTorch Profiler ☆113 · Updated 2 years ago