allegroai / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆62 · Updated 3 months ago
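The description above captures the core idea: each container receives a hard, driver-enforced slice of GPU memory, so a workload inside the container sees only its fraction of the card. A minimal sketch of how one might verify that from inside such a container, assuming PyTorch is installed there (the check is generic PyTorch, nothing below is ClearML-specific API):

```python
# Minimal sketch, not ClearML-specific: report the GPU memory visible
# in the current environment. Run inside a fractional-GPU container to
# inspect the driver-level cap (e.g. an 8 GB slice of a larger card).
import torch

assert torch.cuda.is_available(), "no CUDA device visible"
props = torch.cuda.get_device_properties(0)
print(f"Device: {props.name}")
# With the driver-level limit in place, this should report the
# container's slice rather than the physical card's full capacity.
print(f"Visible GPU memory: {props.total_memory / 2**30:.1f} GiB")
```

Allocations past that ceiling fail with an ordinary CUDA out-of-memory error, the same behavior you would see on a physically smaller GPU.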
Related projects
Alternatives and complementary repositories for clearml-fractional-gpu
- A top-like tool for monitoring GPUs in a cluster ☆81 · Updated 9 months ago
- ☆120 · Updated this week
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆185 · Updated 2 months ago
- Helm charts for the KubeRay project ☆33 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆57 · Updated last month
- Module, Model, and Tensor Serialization/Deserialization ☆188 · Updated last month
- GPU environment and cluster management with LLM support ☆495 · Updated 6 months ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆78 · Updated this week
- The Triton backend for the PyTorch TorchScript models. ☆127 · Updated this week
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆42 · Updated last year
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆331 · Updated this week
- Helm chart repository for the new unified way to deploy ClearML on Kubernetes. ClearML - Auto-Magical CI/CD to streamline your AI workloa… ☆37 · Updated last month
- ClearML - Model-Serving Orchestration and Repository Solution ☆137 · Updated 3 months ago
- The Triton backend for the ONNX Runtime. ☆132 · Updated this week
- Google TPU optimizations for transformers models ☆75 · Updated this week
- some common Huggingface transformers in maximal update parametrization (µP) ☆76 · Updated 2 years ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆57 · Updated this week
- ☆25 · Updated this week
- Distributed Model Serving Framework ☆154 · Updated last month
- Self-host LLMs with vLLM and BentoML ☆74 · Updated this week
- ☆200 · Updated 9 months ago
- In-depth code associated with my Medium blog post, "How to Load PyTorch Models 340 Times Faster with Ray" ☆26 · Updated 2 years ago
- ☆18 · Updated 2 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆57 · Updated 4 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆54 · Updated 7 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆103 · Updated last week
- ☆35 · Updated 2 weeks ago
- Easy and Efficient Quantization for Transformers ☆180 · Updated 4 months ago
- Proxy server for triton gRPC server that inferences embedding model in Rust ☆17 · Updated 3 months ago
- ☆192 · Updated this week