allegroai / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆62 · Updated 3 months ago
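The description above captures the core idea: each container receives a hard, driver-enforced slice of GPU memory, so a workload inside the container sees only its fraction of the card. A minimal sketch of how one might verify that from inside such a container, assuming PyTorch is installed there (the check is generic PyTorch, nothing below is ClearML-specific API):

```python
# Minimal sketch, not ClearML-specific: report the GPU memory visible
# in the current environment. Run inside a fractional-GPU container to
# inspect the driver-level cap (e.g. an 8 GB slice of a larger card).
import torch

assert torch.cuda.is_available(), "no CUDA device visible"
props = torch.cuda.get_device_properties(0)
print(f"Device: {props.name}")
# With the driver-level limit in place, this should report the
# container's slice rather than the physical card's full capacity.
print(f"Visible GPU memory: {props.total_memory / 2**30:.1f} GiB")
```

Allocations past that ceiling fail with an ordinary CUDA out-of-memory error, the same behavior you would see on a physically smaller GPU.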
Related projects
Alternatives and complementary repositories for clearml-fractional-gpu
- A top-like tool for monitoring GPUs in a cluster ☆81 · Updated 9 months ago
- ☆120 · Updated this week
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆185 · Updated 2 months ago
- Helm charts for the KubeRay project ☆33 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆57 · Updated last month
- Module, Model, and Tensor Serialization/Deserialization ☆188 · Updated last month
- GPU environment and cluster management with LLM support ☆495 · Updated 6 months ago
- Kubernetes Operator, ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆78 · Updated this week
- The Triton backend for the PyTorch TorchScript models. ☆127 · Updated this week
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆42 · Updated last year
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆331 · Updated this week
- Helm chart repository for the new unified way to deploy ClearML on Kubernetes. ClearML - Auto-Magical CI/CD to streamline your AI workloa… ☆37 · Updated last month
- ClearML - Model-Serving Orchestration and Repository Solution ☆137 · Updated 3 months ago
- The Triton backend for the ONNX Runtime. ☆132 · Updated this week
- Google TPU optimizations for transformers models ☆75 · Updated this week
- some common Huggingface transformers in maximal update parametrization (µP) ☆76 · Updated 2 years ago
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆57 · Updated this week
- ☆25 · Updated this week
- Distributed Model Serving Framework ☆154 · Updated last month
- Self-host LLMs with vLLM and BentoML ☆74 · Updated this week
- ☆200 · Updated 9 months ago
- In-depth code associated with my Medium blog post, "How to Load PyTorch Models 340 Times Faster with Ray" ☆26 · Updated 2 years ago
- ☆18 · Updated 2 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆57 · Updated 4 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆54 · Updated 7 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆103 · Updated last week
- ☆35 · Updated 2 weeks ago
- Easy and Efficient Quantization for Transformers ☆180 · Updated 4 months ago
- Proxy server for triton gRPC server that inferences embedding model in Rust ☆17 · Updated 3 months ago
- ☆192 · Updated this week