clearml / clearml-fractional-gpu
ClearML Fractional GPU - Run multiple containers on the same GPU with driver-level memory limitation ✨ and compute time-slicing
☆80 · Updated last year
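Since the project enforces its memory limit at the driver level inside pre-built container images, using it amounts to a plain `docker run`. A minimal sketch; the image tag below is illustrative (the repository publishes tags per Ubuntu/CUDA version and memory cap, so check its README for the actual names):

```shell
# Launch a container whose view of GPU 0 is capped at a fixed memory slice.
# --gpus 0 exposes only the first GPU; --ipc=host is commonly required for
# CUDA workloads sharing a device. The image tag is a hypothetical example.
docker run -it --gpus 0 --ipc=host \
  clearml/fractional-gpu:u22-cu12.3-8gb \
  nvidia-smi
```

Inside the container, `nvidia-smi` should report the capped memory size rather than the card's full capacity, which is how the driver-level limit differs from purely scheduler-based sharing.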
Alternatives and similar repositories for clearml-fractional-gpu
Users interested in clearml-fractional-gpu are comparing it to the libraries listed below.
- ☆249 · Updated this week
- Self-host LLMs with vLLM and BentoML ☆149 · Updated last week
- A top-like tool for monitoring GPUs in a cluster ☆85 · Updated last year
- GPU environment and cluster management with LLM support ☆639 · Updated last year
- Module, Model, and Tensor Serialization/Deserialization ☆265 · Updated last month
- ☆64 · Updated 5 months ago
- Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs. ☆211 · Updated 5 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆138 · Updated last year
- The backend behind the LLM-Perf Leaderboard ☆10 · Updated last year
- Inference server benchmarking tool ☆100 · Updated 4 months ago
- Repository for open inference protocol specification ☆59 · Updated 4 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated 2 weeks ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆89 · Updated this week
- ☆39 · Updated this week
- ☆296 · Updated last week
- Where GPUs get cooked 👩🍳🔥 ☆282 · Updated 2 weeks ago
- Benchmarking the serving capabilities of vLLM ☆51 · Updated last year
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆132 · Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated 11 months ago
- A collection of all available inference solutions for the LLMs ☆91 · Updated 6 months ago
- ☆199 · Updated last year
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆59 · Updated 2 months ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆40 · Updated this week
- Cray-LM unified training and inference stack. ☆22 · Updated 7 months ago
- Route LLM requests to the best model for the task at hand. ☆107 · Updated last week
- experiments with inference on llama ☆104 · Updated last year
- Google TPU optimizations for transformers models ☆120 · Updated 8 months ago
- ☆58 · Updated 3 months ago
- OpenAI compatible API for TensorRT LLM triton backend ☆214 · Updated last year