clearml / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆70 · Updated 5 months ago
Alternatives and similar repositories for clearml-fractional-gpu:
Users interested in clearml-fractional-gpu are comparing it to the libraries listed below:
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models, with a focus on NVIDIA GPUs. ☆193 · Updated 2 weeks ago
- Distributed model serving framework. ☆157 · Updated 3 months ago
- ☆154 · Updated last week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆58 · Updated last month
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models. ☆137 · Updated 6 months ago
- Kubernetes Operator, Ansible playbooks, and production scripts for large-scale AIStore deployments on Kubernetes. ☆88 · Updated last week
- GPU environment and cluster management with LLM support. ☆564 · Updated 8 months ago
- Deep learning energy measurement and optimization. ☆229 · Updated this week
- Controller for ModelMesh. ☆215 · Updated 3 weeks ago
- ☆192 · Updated last week
- Module, model, and tensor serialization/deserialization. ☆210 · Updated 2 months ago
- Google TPU optimizations for Transformers models. ☆90 · Updated last week
- A collection of all available inference solutions for LLMs. ☆76 · Updated 4 months ago
- Self-host LLMs with vLLM and BentoML. ☆79 · Updated 2 weeks ago
- A top-like tool for monitoring GPUs in a cluster. ☆84 · Updated 11 months ago
- ☆52 · Updated 3 weeks ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆17 · Updated this week
- ClearML - model-serving orchestration and repository solution. ☆141 · Updated 2 weeks ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray. ☆110 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆257 · Updated 3 months ago
- Ray - a curated list of resources: https://github.com/ray-project/ray ☆48 · Updated this week
- TensorRT-LLM server with structured outputs (JSON), built with Rust. ☆25 · Updated 2 weeks ago
- PyTorch per-step fault tolerance (actively under development). ☆226 · Updated this week
- The Triton backend for PyTorch TorchScript models. ☆141 · Updated last week
- aim-mlflow integration. ☆197 · Updated last year
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs. ☆88 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆187 · Updated 5 months ago
- ☆218 · Updated this week
- Easy and efficient quantization for Transformers. ☆192 · Updated last month